Install in a Single On-premise Cluster
This page goes through the concrete steps to install LLMariner on a on-premise K8s cluster (or a local K8s cluster). You can skip some of the steps if you have already made necessary installation/setup.
Note
Installation of Postgres, MinIO, SeaweedFS, and Milvus are just example purposes, and they are not intended for the production usage. Please configure based on your requirements if you want to use LLMariner for your production environment.Step 1. Install Nvidia GPU Operator
Nvidia GPU Operator is required to install the device plugin and make GPU resources visible in the K8s cluster. Run:
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm upgrade --install --wait \
--namespace nvidia \
--create-namespace \
gpu-operator nvidia/gpu-operator \
--set cdi.enabled=true \
--set driver.enabled=false \
--set toolkit.enabled=false
Step 2. Install an ingress controller
An ingress controller is required to route HTTP/HTTPS requests to the LLMariner components. Any ingress controller works, and you can skip this step if your EKS cluster already has an ingress controller.
Here is an example that installs Kong and make the ingress controller:
helm repo add kong https://charts.konghq.com
helm repo update
cat <<EOF > kong-values.yaml
proxy:
type: NodePort
http:
hostPort: 80
tls:
hostPort: 443
annotations:
service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "300"
nodeSelector:
ingress-ready: "true"
tolerations:
- key: node-role.kubernetes.io/control-plane
operator: Equal
effect: NoSchedule
- key: node-role.kubernetes.io/master
operator: Equal
effect: NoSchedule
fullnameOverride: kong
EOF
helm upgrade --install --wait \
--namespace kong \
--create-namespace \
kong-proxy kong/kong \
-f kong-values.yaml
Step 3. Install a Postgres database
Run the following to deploy an Postgres deployment:
export POSTGRES_USER="admin_user"
export POSTGRES_PASSWORD="secret_password"
helm upgrade --install --wait \
--namespace postgres \
--create-namespace \
postgres oci://registry-1.docker.io/bitnamicharts/postgresql \
--set nameOverride=postgres \
--set auth.database=ps_db \
--set auth.username="${POSTGRES_USER}" \
--set auth.password="${POSTGRES_PASSWORD}"
Set the environmental variables so that LLMariner can later access the Postgres database.
export POSTGRES_ADDR=postgres.postgres
export POSTGRES_PORT=5432
Step 4. Install an S3-compatible object store
LLMariner requires an S3-compatible object store such as MinIO or SeaweedFS.
First set environmental variables to specify installation configuration:
# Bucket name and the dummy region.
export S3_BUCKET_NAME=llmariner
export S3_REGION=dummy
# Credentials for accessing the S3 bucket.
export AWS_ACCESS_KEY_ID=llmariner-key
export AWS_SECRET_ACCESS_KEY=llmariner-secret
Then install an object store. Here are the example installation commands for MinIO and SeaweedFS.
helm upgrade --install --wait \
--namespace minio \
--create-namespace \
minio oci://registry-1.docker.io/bitnamicharts/minio \
--set auth.rootUser=minioadmin \
--set auth.rootPassword=minioadmin \
--set defaultBuckets="${S3_BUCKET_NAME}"
kubectl port-forward -n minio service/minio 9001 &
# Wait until the port-forwarding connection is established.
sleep 5
# Obtain the cookie and store in cookies.txt.
curl \
http://localhost:9001/api/v1/login \
--cookie-jar cookies.txt \
--request POST \
--header 'Content-Type: application/json' \
--data @- << EOF
{
"accessKey": "minioadmin",
"secretKey": "minioadmin"
}
EOF
# Create a new API key.
curl \
http://localhost:9001/api/v1/service-account-credentials \
--cookie cookies.txt \
--request POST \
--header "Content-Type: application/json" \
--data @- << EOF >/dev/null
{
"name": "LLMariner",
"accessKey": "$AWS_ACCESS_KEY_ID",
"secretKey": "$AWS_SECRET_ACCESS_KEY",
"description": "",
"comment": "",
"policy": "",
"expiry": null
}
EOF
rm cookies.txt
kill %1
kubectl create namespace seaweedfs
# Create a secret.
# See https://github.com/seaweedfs/seaweedfs/wiki/Amazon-S3-API#public-access-with-anonymous-download for details.
cat <<EOF > s3-config.json
{
"identities": [
{
"name": "me",
"credentials": [
{
"accessKey": "${AWS_ACCESS_KEY_ID}",
"secretKey": "${AWS_SECRET_ACCESS_KEY}"
}
],
"actions": [
"Admin",
"Read",
"ReadAcp",
"List",
"Tagging",
"Write",
"WriteAcp"
]
}
]
}
EOF
kubectl create secret generic -n seaweedfs seaweedfs --from-file=s3-config.json
rm s3-config.json
# deploy seaweedfs
cat << EOF | kubectl apply -n seaweedfs -f -
apiVersion: v1
kind: PersistentVolume
metadata:
name: seaweedfs-volume
labels:
type: local
app: seaweedfs
spec:
storageClassName: manual
capacity:
storage: 500Mi
accessModes:
- ReadWriteMany
hostPath:
path: /data/seaweedfs
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: seaweedfs-volume-claim
labels:
app: seaweedfs
spec:
storageClassName: manual
accessModes:
- ReadWriteMany
resources:
requests:
storage: 500Mi
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: seaweedfs
spec:
replicas: 1
selector:
matchLabels:
app: seaweedfs
template:
metadata:
labels:
app: seaweedfs
spec:
containers:
- name: seaweedfs
image: chrislusf/seaweedfs
args:
- server -s3 -s3.config=/etc/config/s3-config.json -dir=/data
ports:
- name: master
containerPort: 9333
protocol: TCP
- name: s3
containerPort: 8333
protocol: TCP
volumeMounts:
- name: seaweedfsdata
mountPath: /data
- name: config
mountPath: /etc/config
volumes:
- name: seaweedfsdata
persistentVolumeClaim:
claimName: seaweedfs-volume-claim
- name: config
secret:
secretName: seaweedfs
---
apiVersion: v1
kind: Service
metadata:
name: seaweedfs
labels:
app: seaweedfs
spec:
type: NodePort
ports:
- port: 9333
targetPort: master
protocol: TCP
name: master
nodePort: 31238
- port: 8333
targetPort: s3
protocol: TCP
name: s3
nodePort: 31239
selector:
app: seaweedfs
EOF
kubectl wait --timeout=60s --for=condition=ready pod -n seaweedfs -l app=seaweedfs
kubectl port-forward -n seaweedfs service/seaweedfs 8333 &
# Wait until the port-forwarding connection is established.
sleep 5
# Create the bucket.
aws --endpoint-url http://localhost:8333 s3 mb s3://${S3_BUCKET_NAME}
kill %1
Then set environmental variable S3_ENDPOINT_URL
to the URL of the object store. The URL should be accessible from LLMariner pods that will run on the same cluster.
export S3_ENDPOINT_URL=http://minio.minio:9000
export S3_ENDPOINT_URL=http://seaweedfs.seaweedfs:8333
Step 5. Install Milvus
Install Milvus as it is used a backend vector database for RAG.
cat << EOF > milvus-values.yaml
cluster:
enabled: false
etcd:
enabled: false
pulsar:
enabled: false
minio:
enabled: false
tls:
enabled: false
extraConfigFiles:
user.yaml: |+
etcd:
use:
embed: true
data:
dir: /var/lib/milvus/etcd
common:
storageType: local
EOF
helm repo add zilliztech https://zilliztech.github.io/milvus-helm/
helm repo update
helm upgrade --install --wait \
--namespace milvus \
--create-namespace \
milvus zilliztech/milvus \
-f milvus-values.yaml
Set the environmental variables so that LLMariner can later access the Postgres database.
export MILVUS_ADDR=milvus.milvus
Step 6. Install LLMariner
Run the following command to set up a values.yaml
and install LLMariner with Helm.
# Set the endpoint URL of LLMariner. Please change if you are using a different ingress controller.
export INGRESS_CONTROLLER_URL=http://localhost:8080
cat << EOF | envsubst > llmariner-values.yaml
global:
# This is an ingress configuration with Kong. Please change if you are using a different ingress controller.
ingress:
ingressClassName: kong
# The URL of the ingress controller. this can be a port-forwarding URL (e.g., http://localhost:8080) if there is
# no URL that is reachable from the outside of the EKS cluster.
controllerUrl: "${INGRESS_CONTROLLER_URL}"
annotations:
# To remove the buffering from the streaming output of chat completion.
konghq.com/response-buffering: "false"
database:
host: "${POSTGRES_ADDR}"
port: ${POSTGRES_PORT}
username: "${POSTGRES_USER}"
ssl:
mode: disable
createDatabase: true
databaseSecret:
name: postgres
key: password
objectStore:
s3:
endpointUrl: "${S3_ENDPOINT_URL}"
bucket: "${S3_BUCKET_NAME}"
region: "${S3_REGION}"
awsSecret:
name: aws
accessKeyIdKey: accessKeyId
secretAccessKeyKey: secretAccessKey
prepare:
database:
createSecret: true
secret:
password: "${POSTGRES_PASSWORD}"
objectStore:
createSecret: true
secret:
accessKeyId: "${AWS_ACCESS_KEY_ID}"
secretAccessKey: "${AWS_SECRET_ACCESS_KEY}"
dex-server:
staticPasswords:
- email: admin@example.com
# bcrypt hash of the string: $(echo password | htpasswd -BinC 10 admin | cut -d: -f2)
hash: "\$2a\$10\$2b2cU8CPhOTaGrs1HRQuAueS7JTT5ZHsHSzYiFPm1leZck7Mc8T4W"
username: admin-user
userID: admin-id
inference-manager-engine:
model:
default:
runtimeName: vllm
preloaded: true
resources:
limits:
nvidia.com/gpu: 1
overrides:
meta-llama/Meta-Llama-3.1-8B-Instruct-q4_0:
contextLength: 16384
google/gemma-2b-it-q4_0:
runtimeName: ollama
resources:
limits:
nvidia.com/gpu: 0
sentence-transformers/all-MiniLM-L6-v2-f16:
runtimeName: ollama
resources:
limits:
nvidia.com/gpu: 0
inference-manager-server:
service:
annotations:
# These annotations are only meaningful for Kong ingress controller to extend the timeout.
konghq.com/connect-timeout: "360000"
konghq.com/read-timeout: "360000"
konghq.com/write-timeout: "360000"
job-manager-dispatcher:
serviceAccount:
create: false
name: "${LLMARINER_SERVICE_ACCOUNT_NAME}"
notebook:
# Used to set the base URL of the API endpoint. This can be same as global.ingress.controllerUrl
# if the URL is reachable from the inside cluster. Otherwise you can change this to the
# to the URL of the ingress controller that is reachable inside the K8s cluster.
llmarinerBaseUrl: "${INGRESS_CONTROLLER_URL}/v1"
model-manager-loader:
baseModels:
- meta-llama/Meta-Llama-3.1-8B-Instruct-q4_0
- google/gemma-2b-it-q4_0
- sentence-transformers/all-MiniLM-L6-v2-f16
# Required when RAG is used.
vector-store-manager-server:
vectorDatabase:
host: "${MILVUS_ADDR}"
llmEngineAddr: ollama-sentence-transformers-all-minilm-l6-v2-f16:11434
EOF
helm upgrade --install \
--namespace llmariner \
--create-namespace \
llmariner oci://public.ecr.aws/cloudnatix/llmariner-charts/llmariner \
-f llmariner-values.yaml
Note
Starting from Helm v3.8.0, the OCI registry is supported by default. If you are using an older version, please upgrade to v3.8.0 or later. For more details, please refer to Helm OCI-based registries.Note
If you are getting a 403 forbidden error, please trydocker logout public.ecr.aws
. Please see AWS document for more details.If you would like to install only the control-plane components or the worker-plane components, please see multi_cluster_deployment
{.interpreted-text role=“doc”}.
Step 7. Verify the installation
You can verify the installation by sending sample chat completion requests.
Note, if you have used LLMariner in other cases before you may need to delete the previous config by running rm -rf ~/.config/llmariner
The default login user name is admin@example.com
and the password is
password
. You can change this by updating the Dex configuration
(link).
echo "This is your endpoint URL: ${INGRESS_CONTROLLER_URL}/v1"
llma auth login
# Type the above endpoint URL.
llma models list
llma chat completions create --model google-gemma-2b-it-q4_0 --role user --completion "what is k8s?"
llma chat completions create --model meta-llama-Meta-Llama-3.1-8B-Instruct-q4_0 --role user --completion "hello"
Optional: Monitor GPU utilization
If you would like to install Prometheus and Grafana to see GPU utilization, run:
# Add Prometheus
cat <<EOF > prom-scrape-configs.yaml
- job_name: nvidia-dcgm
scrape_interval: 5s
static_configs:
- targets: ['nvidia-dcgm-exporter.nvidia.svc:9400']
- job_name: inference-manager-engine-metrics
scrape_interval: 5s
static_configs:
- targets: ['inference-manager-server-http.llmariner.svc:8083']
EOF
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm upgrade --install --wait \
--namespace monitoring \
--create-namespace \
--set-file extraScrapeConfigs=prom-scrape-configs.yaml \
prometheus prometheus-community/prometheus
# Add Grafana with DCGM dashboard
cat <<EOF > grafana-values.yaml
datasources:
datasources.yaml:
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
url: http://prometheus-server
isDefault: true
dashboardProviders:
dashboardproviders.yaml:
apiVersion: 1
providers:
- name: 'default'
orgId: 1
folder: 'default'
type: file
disableDeletion: true
editable: true
options:
path: /var/lib/grafana/dashboards/standard
dashboards:
default:
nvidia-dcgm-exporter:
gnetId: 12239
datasource: Prometheus
EOF
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm upgrade --install --wait \
--namespace monitoring \
--create-namespace \
-f grafana-values.yaml \
grafana grafana/grafana
Optional: Enable TLS
First follow the cert-manager installation document and install cert-manager to your K8s cluster if you don’t have one. Then create a ClusterIssuer
for your domain. Here is an example manifest that uses Let's Encrypt.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt
spec:
acme:
server: https://acme-v02.api.letsencrypt.org/directory
email: user@mydomain.com
privateKeySecretRef:
name: letsencrypt
solvers:
- http01:
ingress:
ingressClassName: kong
- selector:
dnsZones:
- llm.mydomain.com
dns01:
...
Then you can add the following to values.yaml
of LLMariner to enable TLS.
global:
ingress:
annotations:
cert-manager.io/cluster-issuer: letsencrypt
tls:
hosts:
- api.llm.mydomain.com
secretName: api-tls
The ingresses created from the Helm chart will have the following annotation and spec:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
annotations:
cert-manager.io/cluster-issuer: letsencrypt
...
spec:
tls:
- hosts:
- api.llm.mydomain.com
secretName: api-tls
...