Integration

Integrate with other projects

1 - Open WebUI

Integrate with Open WebUI and get the web UI for the AI assistant.

Open WebUI provides a web UI that works with OpenAI-compatible APIs. You can run Open WebUI locally or run it in a Kubernetes cluster.
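For a quick local trial, you can start Open WebUI with Docker and point it at your LLMariner endpoint. This is a minimal sketch; the container name, port mapping, and volume are arbitrary choices, and the endpoint URL and API key are placeholders for your own values.

# Run Open WebUI locally against the LLMariner OpenAI-compatible API.
# If LLMariner runs on the host itself, replace "localhost" with host.docker.internal
# (on Linux, also add --add-host=host.docker.internal:host-gateway).
docker run -d \
  --name open-webui \
  -p 3000:8080 \
  -v open-webui:/app/backend/data \
  -e OPENAI_API_BASE_URL=<LLMariner API endpoint (e.g., http://localhost:8080/v1)> \
  -e OPENAI_API_KEY=<LLMariner API key> \
  ghcr.io/open-webui/open-webui:main

Open WebUI is then available at http://localhost:3000.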

Here are instructions for running Open WebUI in a Kubernetes cluster.

OPENAI_API_KEY=<LLMariner API key>
OPEN_API_BASE_URL=<LLMariner API endpoint>

kubectl create namespace open-webui
kubectl create secret generic -n open-webui llmariner-api-key --from-literal=key=${OPENAI_API_KEY}

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: open-webui
  namespace: open-webui
spec:
  selector:
    matchLabels:
      name: open-webui
  template:
    metadata:
      labels:
        name: open-webui
    spec:
      containers:
      - name: open-webui
        image: ghcr.io/open-webui/open-webui:main
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        env:
        - name: OPENAI_API_BASE_URLS
          value: ${OPEN_API_BASE_URL}
        - name: WEBUI_AUTH
          value: "false"
        - name: OPENAI_API_KEYS
          valueFrom:
            secretKeyRef:
              name: llmariner-api-key
              key: key
---
apiVersion: v1
kind: Service
metadata:
  name: open-webui
  namespace: open-webui
spec:
  type: ClusterIP
  selector:
    name: open-webui
  ports:
  - port: 8080
    name: http
    targetPort: http
    protocol: TCP
EOF

You can then access Open WebUI with port forwarding and open http://localhost:8080 in your browser:

kubectl port-forward -n open-webui service/open-webui 8080

2 - Continue

Integrate with Continue and provide an open source AI code assistant.

Continue provides an open source AI code assistant. You can use LLMariner as a backend endpoint for Continue.

As LLMariner provides an OpenAI-compatible API, you can set the provider to "openai". Set apiKey to an API key generated by LLMariner and apiBase to the LLMariner endpoint URL (e.g., http://localhost:8080/v1).

Here is an example configuration that you can put in ~/.continue/config.json.

{
  "models": [
    {
      "title": "Meta-Llama-3.1-8B-Instruct-q4",
      "provider": "openai",
      "model": "meta-llama-Meta-Llama-3.1-8B-Instruct-q4",
      "apiKey": "<LLMariner API key>",
      "apiBase": "<LLMariner endpoint>"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Auto complete",
    "provider": "openai",
    "model": "deepseek-ai-deepseek-coder-6.7b-base-q4",
    "apiKey": "<LLMariner API key>",
    "apiBase": "<LLMariner endpoint>",
    "completionOptions": {
      "presencePenalty": 1.1,
      "frequencyPenalty": 1.1
    }
  },
  "allowAnonymousTelemetry": false
}
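The model values must match model IDs that are registered in your LLMariner installation. Assuming LLMariner exposes the standard OpenAI-compatible model listing endpoint, you can check the available IDs with curl, where <LLMariner endpoint> is the same base URL used for apiBase (e.g., http://localhost:8080/v1):

curl -s -H "Authorization: Bearer <LLMariner API key>" <LLMariner endpoint>/models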

The following is a demo video that shows the Continue integration enabling a coding assistant with Llama-3.1-Nemotron-70B-Instruct.

3 - Aider

Integrate with Aider for AI pair programming.

Aider is an AI pair programming tool that runs in your terminal or browser.

Aider supports the OpenAI-compatible API, and you can configure the endpoint and the API key with environment variables.

Here is an example installation and configuration procedure.

python -m pip install -U aider-chat

export OPENAI_API_BASE=<Base URL (e.g., http://localhost:8080/v1)>
export OPENAI_API_KEY=<API key>

You can then run Aider in your terminal or browser. Here is an example command that launches Aider in your browser with Llama 3.1 70B.

# Move to your Git repository directory

aider --model openai/meta-llama-Meta-Llama-3.1-70B-Instruct-awq --browser

Please note that the model name requires the openai/ prefix.

https://aider.chat/examples/README.html has example chat transcripts for building applications (e.g., “make a flask app with a /hello endpoint that returns hello world”).
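Aider can also be driven non-interactively. Assuming your installed Aider version supports the --message flag, a one-shot run against the same LLMariner-served model looks roughly like this (the prompt is just an example):

aider --model openai/meta-llama-Meta-Llama-3.1-70B-Instruct-awq \
  --message "make a flask app with a /hello endpoint that returns hello world"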

4 - AI Shell

Integrate with AI Shell to power your shell with the AI assistant.

AI Shell is an open source tool that converts natural language to shell commands.

npm install -g @builder.io/ai-shell
ai config set OPENAI_API_ENDPOINT=<Base URL (e.g., http://localhost:8080/v1)>
ai config set OPENAI_KEY=<API key>
ai config set MODEL=<model name>

Then you can run the ai command, ask for what you want in plain English, and get a generated shell command together with a human-readable explanation of it.

ai what is my ip address

5 - k8sgpt

Integrate with k8sgpt to diagnose and triage issues in your Kubernetes clusters.

k8sgpt is a tool for scanning your Kubernetes clusters, diagnosing, and triaging issues in simple English.

You can use LLMariner as a backend of k8sgpt by running the following command:

k8sgpt auth add \
  --backend openai \
  --baseurl <LLMariner base URL (e.g., http://localhost:8080/v1/)> \
  --password <LLMariner API Key> \
  --model <Model ID>

Then you can run a command like k8sgpt analyze to inspect your Kubernetes cluster.

k8sgpt analyze --explain
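You can also narrow the scan to particular resource types or namespaces and explicitly select the backend registered above; the filter and namespace values here are only examples:

k8sgpt analyze --explain --backend openai --filter Pod --namespace default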

6 - Dify

Integrate with Dify for LLM application development.

Dify is an open-source LLM app development platform. It can orchestrate LLM apps, from agents to complex AI workflows, with a RAG engine.

You can add LLMariner as one of Dify’s model providers with the following steps:

  1. Click the user profile icon.
  2. Click “Settings”.
  3. Click “Model Provider”.
  4. Search for “OpenAI-API-compatible” and click “Add model”.
  5. Configure a model name, API key, and API endpoint URL.

You can then use the registered model from your LLM applications. For example, you can create a new application with “Create from Template” and replace the use of an OpenAI model with the configured model.

If you want to deploy Dify in your Kubernetes clusters, follow README.md in the Dify GitHub repository.

7 - Slackbot

Build a Slackbot that integrates with LLMariner.

You can build a Slackbot that is integrated with LLMariner. The bot can provide a chat UI with Slack and answer questions from end users.

An example implementation can be found at https://github.com/llmariner/slackbot. You can deploy it in your Kubernetes clusters and build a Slack app with the following configuration:

  • Create an app-level token whose scope is connections:write.
  • Enable socket mode, and enable event subscription with the app_mentions:read scope.
  • Add the following scopes in “OAuth & Permissions”: app_mentions:read, chat:write, chat:write.customize, and links:write.

You can then install the Slack application in your workspace and start interacting with the bot.

8 - MLflow

Integrate with MLflow.

MLflow is an open-source tool for managing the machine learning lifecycle. It has various features for LLMs and an integration with OpenAI. We can apply these MLflow features to the LLM endpoints provided by LLMariner.

For example, you can deploy an MLflow Deployments Server for LLMs and use the Prompt Engineering UI.

Deploying MLflow Tracking Server

Bitnami provides a Helm chart for MLflow.

helm upgrade \
  --install \
  --create-namespace \
  -n mlflow \
  mlflow oci://registry-1.docker.io/bitnamicharts/mlflow \
  -f values.yaml

An example values.yaml follows:

tracking:
  extraEnvVars:
  - name: MLFLOW_DEPLOYMENTS_TARGET
    value: http://deployment-server:7000

We set MLFLOW_DEPLOYMENTS_TARGET to the address of an MLflow Deployments Server that we will deploy in the next section.

Once deployed, you can set up port-forwarding and access http://localhost:9000.

kubectl port-forward -n mlflow service/mlflow-tracking 9000:80

The login credentials are obtained by the following commands:

# User
kubectl get secret --namespace mlflow mlflow-tracking -o jsonpath="{.data.admin-user}" | base64 -d
# Password
kubectl get secret --namespace mlflow mlflow-tracking -o jsonpath="{.data.admin-password}" | base64 -d

Deploying MLflow Deployments Server for LLMs

We have an example K8s YAML for deploying an MLflow Deployments Server here.

You can save it locally as deployment-server.yaml and update openai_api_base in the ConfigMap definition based on your ingress controller address.
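If you prefer to script that edit instead of changing the file by hand, a sed one-liner along these lines can rewrite openai_api_base; the address is a placeholder for your ingress controller:

sed -i.bak 's|openai_api_base:.*|openai_api_base: http://<your ingress controller address>/v1|' deployment-server.yaml

Then create the API key secret and apply the manifest: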

kubectl create secret generic -n mlflow llmariner-api-key \
  --from-literal=secret=<Your API key>

kubectl apply -n mlflow -f deployment-server.yaml

You can then access the MLflow Tracking Server, click "New run", and choose "using Prompt Engineering".

Other Features

Please visit the MLflow page for more information on other LLM-related features provided by MLflow.

9 - Langfuse

Integrate with Langfuse for LLM engineering.

Langfuse is an open source LLM engineering platform. You can integrate Langfuse with LLMariner as Langfuse provides an SDK for the OpenAI API.

Here is an example procedure for running Langfuse locally:

git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker compose up -d

You can sign up and create your account. Then you can generate API keys and put them in environment variables.

export LANGFUSE_SECRET_KEY=...
export LANGFUSE_PUBLIC_KEY=...
export LANGFUSE_HOST="http://localhost:3000"

You can then use langfuse.openai instead of openai in your Python scripts to record traces in Langfuse.

from langfuse.openai import openai

client = openai.OpenAI(
  base_url="<Base URL (e.g., http://localhost:8080/v1)>",
  api_key="<API key secret>"
)

completion = client.chat.completions.create(
  model="google-gemma-2b-it-q4_0",
  messages=[
    {"role": "user", "content": "What is k8s?"}
  ],
  stream=True
)
for response in completion:
  # The final streamed chunk may have no content, so fall back to an empty string.
  print(response.choices[0].delta.content or "", end="")
print("\n")

Here is an example screenshot.

10 - Weights & Biases (W&B)

Integrate with W&B and see the progress of your fine-tuning jobs.

Weights & Biases (W&B) is an AI developer platform. LLMariner provides an integration with W&B so that metrics for fine-tuning jobs are reported to W&B. With the integration, you can easily see the progress of your fine-tuning jobs, such as training epoch and loss.

Please take the following steps to enable the integration.

First, obtain the API key of W&B and create a Kubernetes secret.

kubectl create secret generic wandb \
  -n <fine-tuning job namespace> \
  --from-literal=apiKey=${WANDB_API_KEY}

The secret needs to be created in a namespace where fine-tuning jobs run. Individual projects specify namespaces for fine-tuning jobs, and the default project runs fine-tuning jobs in the "default" namespace.

Then you can enable the integration by adding the following to your Helm values.yaml and re-deploying LLMariner.

job-manager-dispatcher:
  job:
    wandbApiKeySecret:
      name: wandb
      key: apiKey

A fine-tuning job will report to W&B when the integrations parameter is specified.

from openai import OpenAI

# Create an OpenAI client that points at the LLMariner endpoint.
client = OpenAI(base_url="<LLMariner endpoint>", api_key="<LLMariner API key>")

# tfile and vfile are the training and validation files previously uploaded
# with client.files.create().
job = client.fine_tuning.jobs.create(
  model="google-gemma-2b-it",
  suffix="fine-tuning",
  training_file=tfile.id,
  validation_file=vfile.id,
  integrations=[
    {
      "type": "wandb",
      "wandb": {
        "project": "my-test-project",
      },
    },
  ],
)

Here is an example screenshot. You can see metrics like train/loss in the W&B dashboard.