Integration
- 1: Open WebUI
- 2: Continue
- 3: MLflow
- 4: Weights & Biases (W&B)
1 - Open WebUI
Open WebUI provides a web UI that works with OpenAI-compatible APIs. You can run Open WebUI locally or in a Kubernetes cluster.
Here are instructions for running Open WebUI in a Kubernetes cluster.
OPENAI_API_KEY=<LLMariner API key>
OPEN_API_BASE_URL=<LLMariner API endpoint>
kubectl create namespace open-webui
kubectl create secret generic -n open-webui llmariner-api-key --from-literal=key=${OPENAI_API_KEY}
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: open-webui
  namespace: open-webui
spec:
  selector:
    matchLabels:
      name: open-webui
  template:
    metadata:
      labels:
        name: open-webui
    spec:
      containers:
      - name: open-webui
        image: ghcr.io/open-webui/open-webui:main
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        env:
        - name: OPENAI_API_BASE_URLS
          value: ${OPEN_API_BASE_URL}
        - name: WEBUI_AUTH
          value: "false"
        - name: OPENAI_API_KEYS
          valueFrom:
            secretKeyRef:
              name: llmariner-api-key
              key: key
---
apiVersion: v1
kind: Service
metadata:
  name: open-webui
  namespace: open-webui
spec:
  type: ClusterIP
  selector:
    name: open-webui
  ports:
  - port: 8080
    name: http
    targetPort: http
    protocol: TCP
EOF
You can then access Open WebUI with port forwarding:
kubectl port-forward -n open-webui service/open-webui 8080
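Before deploying, you may want to confirm that the endpoint and API key you set above actually work. Here is a minimal sketch using the openai Python package (an assumption, not part of the Open WebUI setup); it lists the models that Open WebUI will show in its model selector:
from openai import OpenAI

# Placeholders: use the same values as OPENAI_API_KEY and OPEN_API_BASE_URL above.
client = OpenAI(
    base_url="<LLMariner API endpoint>",
    api_key="<LLMariner API key>",
)

# List the models served by LLMariner; these are what Open WebUI will display.
for model in client.models.list():
    print(model.id)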
2 - Continue
Continue provides an open-source AI code assistant. You can use LLMariner as a backend endpoint for Continue.
As LLMariner provides an OpenAI-compatible API, you can set the provider to "openai", set apiKey to an API key generated by LLMariner, and set apiBase to the endpoint URL of LLMariner (e.g., http://localhost:8080/v1).
Here is an example configuration that you can put at ~/.continue/config.json:
{
  "models": [
    {
      "title": "Meta-Llama-3.1-8B-Instruct-q4",
      "provider": "openai",
      "model": "meta-llama-Meta-Llama-3.1-8B-Instruct-q4",
      "apiKey": "<LLMariner API key>",
      "apiBase": "<LLMariner endpoint>"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Auto complete",
    "provider": "openai",
    "model": "deepseek-ai-deepseek-coder-6.7b-base-q4",
    "apiKey": "<LLMariner API key>",
    "apiBase": "<LLMariner endpoint>",
    "completionOptions": {
      "presencePenalty": 1.1,
      "frequencyPenalty": 1.1
    }
  },
  "allowAnonymousTelemetry": false
}
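To verify that the model names in the config match models actually served by LLMariner, you can send a test request with the same credentials. Here is a minimal sketch assuming the openai Python package; the model name is the one from the config above:
from openai import OpenAI

client = OpenAI(
    base_url="<LLMariner endpoint>",
    api_key="<LLMariner API key>",
)

# Smoke test against the chat model configured for Continue.
completion = client.chat.completions.create(
    model="meta-llama-Meta-Llama-3.1-8B-Instruct-q4",
    messages=[{"role": "user", "content": "Write a hello-world function in Python."}],
)
print(completion.choices[0].message.content)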
A demo video shows the Continue integration, with Llama-3.1-Nemotron-70B-Instruct powering the coding assistant.
3 - MLflow
MLflow is an open-source tool for managing the machine learning lifecycle. It has various features for LLMs and an integration with OpenAI. We can apply these MLflow features to the LLM endpoints provided by LLMariner.
For example, you can deploy an MLflow Deployments Server for LLMs and use the Prompt Engineering UI.
Deploying MLflow Tracking Server
Bitnami provides a Helm chart for MLflow.
helm upgrade \
--install \
--create-namespace \
-n mlflow \
mlflow oci://registry-1.docker.io/bitnamicharts/mlflow \
-f values.yaml
Here is an example values.yaml:
tracking:
  extraEnvVars:
    - name: MLFLOW_DEPLOYMENTS_TARGET
      value: http://deployment-server:7000
We set MLFLOW_DEPLOYMENTS_TARGET to the address of the MLflow Deployments Server that we will deploy in the next section.
Once deployed, you can set up port-forwarding and access http://localhost:9000.
kubectl port-forward -n mlflow service/mlflow-tracking 9000:80
The login credentials are obtained by the following commands:
# User
kubectl get secret --namespace mlflow mlflow-tracking -o jsonpath="{.data.admin-user}" | base64 -d
# Password
kubectl get secret --namespace mlflow mlflow-tracking -o jsonpath="{.data.admin-password}" | base64 -d
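If you prefer scripting over the browser, you can point an MLflow client at the tracking server with the same credentials. Here is a minimal sketch assuming the mlflow Python package and the port-forward above:
import os
import mlflow

# Credentials obtained with the kubectl commands above.
os.environ["MLFLOW_TRACKING_USERNAME"] = "<admin user>"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "<admin password>"

mlflow.set_tracking_uri("http://localhost:9000")

# Sanity check: list experiments visible on the tracking server.
for experiment in mlflow.search_experiments():
    print(experiment.name)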
Deploying MLflow Deployments Server for LLMs
We have an example K8s YAML for deploying an MLflow Deployments Server here. You can save it locally, update openai_api_base in the ConfigMap definition based on your ingress controller address, and then run:
kubectl create secret generic -n mlflow llmariner-api-key \
--from-literal=secret=<Your API key>
kubectl apply -n mlflow -f deployment-server.yaml
You can then access the MLflow Tracking Server, click "New run", and choose "using Prompt Engineering".
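You can also query the deployments server programmatically rather than through the UI. Here is a minimal sketch assuming the mlflow Python package and a local port-forward of the deployment-server service; the endpoint name "chat" is hypothetical and should match whatever the ConfigMap defines:
from mlflow.deployments import get_deploy_client

# Assumes: kubectl port-forward -n mlflow service/deployment-server 7000
client = get_deploy_client("http://localhost:7000")

# "chat" is a hypothetical endpoint name; use the one defined in your ConfigMap.
response = client.predict(
    endpoint="chat",
    inputs={"messages": [{"role": "user", "content": "Hello!"}]},
)
print(response)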
Other Features
Please visit the MLflow page for more information on other LLM-related features provided by MLflow.
4 - Weights & Biases (W&B)
Weights & Biases (W&B) is an AI developer platform. LLMariner provides an integration with W&B so that metrics for fine-tuning jobs are reported to W&B. With the integration, you can easily track the progress of your fine-tuning jobs, such as training epochs and loss.
Please take the following steps to enable the integration.
First, obtain the API key of W&B and create a Kubernetes secret.
kubectl create secret generic wandb \
  -n <fine-tuning job namespace> \
  --from-literal=apiKey=${WANDB_API_KEY}
The secret needs to be created in a namespace where fine-tuning jobs run. Individual projects specify namespaces for fine-tuning jobs, and the default project runs fine-tuning jobs in the "default" namespace.
Then you can enable the integration by adding the following to your Helm values.yaml and re-deploying LLMariner.
job-manager-dispatcher:
  job:
    wandbApiKeySecret:
      name: wandb
      key: apiKey
A fine-tuning job will report to W&B when the integrations parameter is specified:
job = client.fine_tuning.jobs.create(
    model="google-gemma-2b-it",
    suffix="fine-tuning",
    training_file=tfile.id,
    validation_file=vfile.id,
    integrations=[
        {
            "type": "wandb",
            "wandb": {
                "project": "my-test-project",
            },
        },
    ],
)
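Once the job is created, you can poll its status while metrics stream to the W&B project. Here is a minimal sketch reusing the client and job objects from above:
import time

# Poll until the job reaches a terminal state; metrics appear in the
# "my-test-project" W&B project while the job runs.
while job.status not in ("succeeded", "failed", "cancelled"):
    time.sleep(30)
    job = client.fine_tuning.jobs.retrieve(job.id)
    print(f"status: {job.status}")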
You can see metrics such as train/loss in the W&B dashboard.