Features

LLMariner features

Inference with Open Models

Users can run chat completions with open models such as Google Gemma, Llama, and Mistral. Chat completions can be invoked through the OpenAI Python library, the llma CLI, or the API endpoint directly.

Model Loading

This page describes how to load models in LLMariner.

Retrieval-Augmented Generation (RAG)

This page describes how to use RAG with LLMariner.

Model Fine-tuning

This page describes how to fine-tune models with LLMariner.

General-purpose Training

LLMariner allows users to run general-purpose training jobs in their Kubernetes clusters.

Jupyter Notebook

LLMariner allows users to run a Jupyter Notebook in a Kubernetes cluster. This functionality is useful when users want to run ad-hoc Python scripts that require GPUs.

API and GPU Usage Optimization

GPU Federation

Users can create a global pool of GPUs across multiple clusters and efficiently utilize them.

User Management

This page describes how to manage users.

Access Control with Organizations and Projects

This page describes how to configure access control using organizations and projects.