Install in a Single EKS Cluster

Install LLMariner in an EKS cluster in standalone mode.

This page goes through the concrete steps to create an EKS cluster, create the necessary resources, and install LLMariner. You can skip some of the steps if you have already completed the corresponding installation or setup.

Step 1. Provision an EKS cluster

Step 1.1. Create a new cluster with Karpenter

Either follow the Karpenter getting started guide and create an EKS cluster with Karpenter, or run the following simplified installation steps.

export CLUSTER_NAME="llmariner-demo"
export AWS_DEFAULT_REGION="us-east-1"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
export KARPENTER_NAMESPACE="kube-system"
export KARPENTER_VERSION="1.0.1"
export K8S_VERSION="1.30"
export TEMPOUT="$(mktemp)"

curl -fsSL https://raw.githubusercontent.com/aws/karpenter-provider-aws/v"${KARPENTER_VERSION}"/website/content/en/preview/getting-started/getting-started-with-karpenter/cloudformation.yaml > "${TEMPOUT}" \
  && aws cloudformation deploy \
  --stack-name "Karpenter-${CLUSTER_NAME}" \
  --template-file "${TEMPOUT}" \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameter-overrides "ClusterName=${CLUSTER_NAME}"

eksctl create cluster -f - <<EOF
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${CLUSTER_NAME}
  region: ${AWS_DEFAULT_REGION}
  version: "${K8S_VERSION}"
  tags:
    karpenter.sh/discovery: ${CLUSTER_NAME}

iam:
  withOIDC: true
  podIdentityAssociations:
  - namespace: "${KARPENTER_NAMESPACE}"
    serviceAccountName: karpenter
    roleName: ${CLUSTER_NAME}-karpenter
    permissionPolicyARNs:
    - arn:aws:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}

iamIdentityMappings:
- arn: "arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}"
  username: system:node:{{EC2PrivateDNSName}}
  groups:
  - system:bootstrappers
  - system:nodes

managedNodeGroups:
- instanceType: m5.large
  amiFamily: AmazonLinux2
  name: ${CLUSTER_NAME}-ng
  desiredCapacity: 2
  minSize: 1
  maxSize: 10

addons:
- name: eks-pod-identity-agent
EOF

# Create the service linked role if it does not exist. Ignore an already-exists error.
aws iam create-service-linked-role --aws-service-name spot.amazonaws.com || true

# Logout of helm registry to perform an unauthenticated pull against the public ECR.
helm registry logout public.ecr.aws

# Deploy Karpenter.
helm upgrade --install --wait \
  --namespace "${KARPENTER_NAMESPACE}" \
  --create-namespace \
  karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "${KARPENTER_VERSION}" \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi
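Before moving on, you can do a quick sanity check that the cluster and Karpenter are healthy. The label selector below assumes the default labels set by the Karpenter Helm chart:

# Confirm the managed node group nodes are Ready.
kubectl get nodes

# Confirm the Karpenter controller pods are Running (assumes the chart's default labels).
kubectl get pods -n "${KARPENTER_NAMESPACE}" -l app.kubernetes.io/name=karpenter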

Step 1.2. Provision GPU nodes

Once Karpenter is installed, we need to create an EC2NodeClass and a NodePool so that GPU nodes are provisioned. We configure blockDeviceMappings in the EC2NodeClass definition so that nodes have sufficient local storage to store model files.

export GPU_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2-gpu/recommended/image_id --query Parameter.Value --output text)"

cat << EOF | envsubst | kubectl apply -f -
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      - key: kubernetes.io/os
        operator: In
        values: ["linux"]
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand"]
      - key: karpenter.k8s.aws/instance-family
        operator: In
        values: ["g5"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      expireAfter: 720h
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: "${CLUSTER_NAME}"
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: "${CLUSTER_NAME}"
  amiSelectorTerms:
  - id: "${GPU_AMI_ID}"
  blockDeviceMappings:
  - deviceName: /dev/xvda
    ebs:
      deleteOnTermination: true
      encrypted: true
      volumeSize: 256Gi
      volumeType: gp3
EOF
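You can confirm that both resources were registered. Note that Karpenter provisions GPU nodes on demand, so no g5 instance appears until GPU pods are scheduled:

kubectl get nodepools.karpenter.sh default
kubectl get ec2nodeclasses.karpenter.k8s.aws default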

Step 1.3. Install Nvidia GPU Operator

The NVIDIA GPU Operator is required to install the device plugin and make GPU resources visible in the K8s cluster. Run:

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm upgrade --install --wait \
  --namespace nvidia \
  --create-namespace \
  gpu-operator nvidia/gpu-operator \
  --set cdi.enabled=true \
  --set driver.enabled=false \
  --set toolkit.enabled=false
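Once a GPU node has been provisioned by Karpenter, you can check that the device plugin exposes the GPU as an allocatable resource. This is only a sanity check, using jq as in the other steps:

# Check the GPU Operator pods.
kubectl get pods -n nvidia

# Show the allocatable GPU count per node (null until a GPU node joins the cluster).
kubectl get nodes -o json | jq '.items[] | {name: .metadata.name, gpu: .status.allocatable["nvidia.com/gpu"]}'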

Step 1.4. Install an ingress controller

An ingress controller is required to route HTTP/HTTPS requests to the LLMariner components. Any ingress controller works, and you can skip this step if your EKS cluster already has an ingress controller.

Here is an example that installs Kong and makes the ingress controller reachable via an AWS load balancer:

helm repo add kong https://charts.konghq.com
helm repo update
helm upgrade --install --wait \
  --namespace kong \
  --create-namespace \
  kong-proxy kong/kong \
  --set proxy.annotations.service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout=300 \
  --set ingressController.installCRDs=false \
  --set fullnameOverride=false
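It can take a few minutes for AWS to provision the load balancer. You can check that the Kong proxy service has received an external hostname:

kubectl get service -n kong kong-proxy-kong-proxy -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'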

Step 2. Create an RDS instance

We will create an RDS instance in the same VPC as the EKS cluster so that it is reachable from the LLMariner components. Here are example commands for creating a DB subnet group:

export DB_SUBNET_GROUP_NAME="llmariner-demo-db-subnet"

export EKS_SUBNET_IDS=$(aws eks describe-cluster --name "${CLUSTER_NAME}" | jq '.cluster.resourcesVpcConfig.subnetIds | join(" ")' --raw-output)
export EKS_SUBNET_ID0=$(echo ${EKS_SUBNET_IDS} | cut -d' ' -f1)
export EKS_SUBNET_ID1=$(echo ${EKS_SUBNET_IDS} | cut -d' ' -f2)

aws rds create-db-subnet-group \
  --db-subnet-group-name "${DB_SUBNET_GROUP_NAME}" \
  --db-subnet-group-description "LLMariner Demo" \
  --subnet-ids "${EKS_SUBNET_ID0}" "${EKS_SUBNET_ID1}"
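If you want to double-check the result, describe the subnet group:

aws rds describe-db-subnet-groups --db-subnet-group-name "${DB_SUBNET_GROUP_NAME}"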

and an RDS instance:

export DB_INSTANCE_ID="llmariner-demo"
export POSTGRES_USER="admin_user"
export POSTGRES_PASSWORD="secret_password"

export EKS_SECURITY_GROUP_ID=$(aws eks describe-cluster --name "${CLUSTER_NAME}" | jq '.cluster.resourcesVpcConfig.clusterSecurityGroupId' --raw-output)

aws rds create-db-instance \
  --db-instance-identifier "${DB_INSTANCE_ID}" \
  --db-instance-class db.t3.small \
  --engine postgres \
  --allocated-storage 10 \
  --storage-encrypted \
  --master-username "${POSTGRES_USER}" \
  --master-user-password "${POSTGRES_PASSWORD}" \
  --vpc-security-group-ids "${EKS_SECURITY_GROUP_ID}" \
  --db-subnet-group-name "${DB_SUBNET_GROUP_NAME}"

You can run the following command to check the provisioning status.

aws rds describe-db-instances --db-instance-identifier "${DB_INSTANCE_ID}" | jq '.DBInstances[].DBInstanceStatus'
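Alternatively, you can block until the instance becomes available by using the AWS CLI waiter:

aws rds wait db-instance-available --db-instance-identifier "${DB_INSTANCE_ID}"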

Once the RDS instance is fully provisioned and its status becomes available, obtain the endpoint information for later use.

export POSTGRES_ADDR=$(aws rds describe-db-instances --db-instance-identifier "${DB_INSTANCE_ID}" | jq '.DBInstances[].Endpoint.Address' --raw-output)
export POSTGRES_PORT=$(aws rds describe-db-instances --db-instance-identifier "${DB_INSTANCE_ID}" | jq '.DBInstances[].Endpoint.Port' --raw-output)

You can verify that the DB instance is reachable from the EKS cluster by running the psql command:

kubectl run psql --image jbergknoff/postgresql-client --env="PGPASSWORD=${POSTGRES_PASSWORD}" -- -h "${POSTGRES_ADDR}" -U "${POSTGRES_USER}" -p "${POSTGRES_PORT}" -d template1 -c "select now();"
kubectl logs psql
kubectl delete pods psql

Step 3. Create an S3 bucket

We will create an S3 bucket where model files are stored. Here is an example:

# Please change the bucket name to something else.
export S3_BUCKET_NAME="llmariner-demo"
export S3_REGION="us-east-1"

aws s3api create-bucket --bucket "${S3_BUCKET_NAME}" --region "${S3_REGION}"
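You can confirm the bucket was created and is accessible with your credentials:

aws s3api head-bucket --bucket "${S3_BUCKET_NAME}"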

If you want to set up Milvus for RAG, please create another S3 bucket for Milvus:

# Please change the bucket name to something else.
export MILVUS_S3_BUCKET_NAME="llmariner-demo-milvus"

aws s3api create-bucket --bucket "${MILVUS_S3_BUCKET_NAME}" --region "${S3_REGION}"

Pods running in the EKS cluster need to be able to access the S3 bucket. We will create an IAM Role for Service Accounts (IRSA) for that.

export LLMARINER_NAMESPACE=llmariner
export LLMARINER_POLICY="LLMarinerPolicy"
export LLMARINER_SERVICE_ACCOUNT_NAME="llmariner"
export LLMARINER_ROLE="LLMarinerRole"

cat << EOF | envsubst > policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::${S3_BUCKET_NAME}/*",
        "arn:aws:s3:::${S3_BUCKET_NAME}",
        "arn:aws:s3:::${MILVUS_S3_BUCKET_NAME}/*",
        "arn:aws:s3:::${MILVUS_S3_BUCKET_NAME}"
      ]
    }
  ]
}
EOF

aws iam create-policy --policy-name "${LLMARINER_POLICY}" --policy-document file://policy.json

eksctl create iamserviceaccount \
  --name "${LLMARINER_SERVICE_ACCOUNT_NAME}" \
  --namespace "${LLMARINER_NAMESPACE}" \
  --cluster "${CLUSTER_NAME}" \
  --role-name "${LLMARINER_ROLE}" \
  --attach-policy-arn "arn:aws:iam::${AWS_ACCOUNT_ID}:policy/${LLMARINER_POLICY}" \
  --approve
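eksctl annotates the created service account with the IAM role ARN. A quick way to confirm the IRSA setup is to inspect the service account and look for the standard eks.amazonaws.com/role-arn annotation:

kubectl get serviceaccount -n "${LLMARINER_NAMESPACE}" "${LLMARINER_SERVICE_ACCOUNT_NAME}" -o yaml
# Look for an eks.amazonaws.com/role-arn annotation that references ${LLMARINER_ROLE}.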

Step 4. Install Milvus

Install Milvus, as it is used as the backend vector database for RAG.

Milvus creates Persistent Volumes. Follow https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html and install the EBS CSI driver.

export EBS_CSI_DRIVER_ROLE="AmazonEKS_EBS_CSI_DriverRole"

eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster "${CLUSTER_NAME}" \
  --role-name "${EBS_CSI_DRIVER_ROLE}" \
  --role-only \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --approve

eksctl create addon \
  --cluster "${CLUSTER_NAME}" \
  --name aws-ebs-csi-driver \
  --version latest \
  --service-account-role-arn "arn:aws:iam::${AWS_ACCOUNT_ID}:role/${EBS_CSI_DRIVER_ROLE}" \
  --force
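You can confirm that the add-on is active and the driver pods are running:

eksctl get addon --cluster "${CLUSTER_NAME}" --name aws-ebs-csi-driver
kubectl get pods -n kube-system | grep ebs-csi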

Then install the Helm chart. Milvus requires access to the S3 bucket. To use the same service account created above, we deploy Milvus in the same namespace as LLMariner.

cat << EOF | envsubst > milvus-values.yaml
cluster:
  enabled: false
etcd:
  replicaCount: 1
  persistence:
    storageClass: gp2 # Use gp3 if available
pulsarv3:
  enabled: false
minio:
  enabled: false
standalone:
  persistence:
    persistentVolumeClaim:
      storageClass: gp2 # Use gp3 if available
      size: 10Gi
serviceAccount:
  create: false
  name: "${LLMARINER_SERVICE_ACCOUNT_NAME}"
externalS3:
  enabled: true
  host: s3.us-east-1.amazonaws.com
  port: 443
  useSSL: true
  bucketName: "${MILVUS_S3_BUCKET_NAME}"
  region: us-east-1
  useIAM: true
  cloudProvider: aws
  iamEndpoint: ""
logLevel: info
EOF

helm repo add zilliztech https://zilliztech.github.io/milvus-helm/
helm repo update
helm upgrade --install --wait \
  --namespace "${LLMARINER_NAMESPACE}" \
  --create-namespace \
  milvus zilliztech/milvus \
  -f milvus-values.yaml
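You can watch the Milvus pods and their Persistent Volume Claims come up in the LLMariner namespace:

kubectl get pods -n "${LLMARINER_NAMESPACE}"
kubectl get pvc -n "${LLMARINER_NAMESPACE}"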

Please see the Milvus installation document and the Helm chart for other installation options.

Set the environment variable so that LLMariner can later access Milvus.

export MILVUS_ADDR=milvus.llmariner.svc.cluster.local

Step 5. Install LLMariner

Run the following commands to set up a values.yaml file and install LLMariner with Helm.

# Set the endpoint URL of LLMariner. Please change if you are using a different ingress controller.
export INGRESS_CONTROLLER_URL=http://$(kubectl get services -n kong kong-proxy-kong-proxy -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

export POSTGRES_SECRET_NAME="db-secret"

cat << EOF | envsubst > llmariner-values.yaml
global:
  # This is an ingress configuration with Kong. Please change if you are using a different ingress controller.
  ingress:
    ingressClassName: kong
    # The URL of the ingress controller. This can be a port-forwarding URL (e.g., http://localhost:8080) if there is
    # no URL that is reachable from the outside of the EKS cluster.
    controllerUrl: "${INGRESS_CONTROLLER_URL}"
    annotations:
      # To remove the buffering from the streaming output of chat completion.
      konghq.com/response-buffering: "false"

  database:
    host: "${POSTGRES_ADDR}"
    port: ${POSTGRES_PORT}
    username: "${POSTGRES_USER}"
    ssl:
      mode: require
    createDatabase: true

  databaseSecret:
    name: "${POSTGRES_SECRET_NAME}"
    key: password

  objectStore:
    s3:
      bucket: "${S3_BUCKET_NAME}"
      region: "${S3_REGION}"
      endpointUrl: ""

prepare:
  database:
    createSecret: true
    secret:
      password: "${POSTGRES_PASSWORD}"

dex-server:
  staticPasswords:
  - email: admin@example.com
    # bcrypt hash of the string: $(echo password | htpasswd -BinC 10 admin | cut -d: -f2)
    hash: "\$2a\$10\$2b2cU8CPhOTaGrs1HRQuAueS7JTT5ZHsHSzYiFPm1leZck7Mc8T4W"
    username: admin-user
    userID: admin-id

file-manager-server:
  serviceAccount:
    create: false
    name: "${LLMARINER_SERVICE_ACCOUNT_NAME}"

inference-manager-engine:
  serviceAccount:
    create: false
    name: "${LLMARINER_SERVICE_ACCOUNT_NAME}"
  model:
    default:
      runtimeName: vllm
      preloaded: true
      resources:
        limits:
          nvidia.com/gpu: 1
    overrides:
      meta-llama/Meta-Llama-3.1-8B-Instruct-q4_0:
        contextLength: 16384
      google/gemma-2b-it-q4_0:
        runtimeName: ollama
        resources:
          limits:
            nvidia.com/gpu: 0
      sentence-transformers/all-MiniLM-L6-v2-f16:
        runtimeName: ollama
        resources:
          limits:
            nvidia.com/gpu: 0

inference-manager-server:
  service:
    annotations:
      # These annotations are only meaningful for Kong ingress controller to extend the timeout.
      konghq.com/connect-timeout: "360000"
      konghq.com/read-timeout: "360000"
      konghq.com/write-timeout: "360000"

job-manager-dispatcher:
  serviceAccount:
    create: false
    name: "${LLMARINER_SERVICE_ACCOUNT_NAME}"
  notebook:
    # Used to set the base URL of the API endpoint. This can be the same as global.ingress.controllerUrl
    # if the URL is reachable from inside the cluster. Otherwise you can change this to the URL of the
    # ingress controller that is reachable inside the K8s cluster.
    llmarinerBaseUrl: "${INGRESS_CONTROLLER_URL}/v1"

model-manager-loader:
  serviceAccount:
    create: false
    name: "${LLMARINER_SERVICE_ACCOUNT_NAME}"
  baseModels:
  - meta-llama/Meta-Llama-3.1-8B-Instruct-q4_0
  - google/gemma-2b-it-q4_0
  - sentence-transformers/all-MiniLM-L6-v2-f16

# Required when RAG is used.
vector-store-manager-server:
  serviceAccount:
    create: false
    name: "${LLMARINER_SERVICE_ACCOUNT_NAME}"
  vectorDatabase:
    host: "${MILVUS_ADDR}"
  llmEngineAddr: ollama-sentence-transformers-all-minilm-l6-v2-f16:11434
EOF

helm upgrade --install \
  --namespace llmariner \
  --create-namespace \
  llmariner oci://public.ecr.aws/cloudnatix/llmariner-charts/llmariner \
  -f llmariner-values.yaml
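You can watch the LLMariner components and the created ingresses while the chart settles. Model downloads can take several minutes:

kubectl get pods -n llmariner
kubectl get ingress -n llmariner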

If you would like to install only the control-plane components or the worker-plane components, please see the multi-cluster deployment document.

Step 6. Verify the installation

You can verify the installation by sending sample chat completion requests.

Note: if you have used LLMariner before, you may need to delete the previous configuration by running rm -rf ~/.config/llmariner.

The default login user name is admin@example.com and the password is password. You can change these by updating the Dex configuration.

echo "This is your endpoint URL: ${INGRESS_CONTROLLER_URL}/v1" llma auth login # Type the above endpoint URL. llma models list llma chat completions create --model google-gemma-2b-it-q4_0 --role user --completion "what is k8s?" llma chat completions create --model meta-llama-Meta-Llama-3.1-8B-Instruct-q4_0 --role user --completion "hello"

Optional: Monitor GPU utilization

If you would like to install Prometheus and Grafana to see GPU utilization, run:

# Add Prometheus.
cat <<EOF > prom-scrape-configs.yaml
- job_name: nvidia-dcgm
  scrape_interval: 5s
  static_configs:
  - targets: ['nvidia-dcgm-exporter.nvidia.svc:9400']
- job_name: inference-manager-engine-metrics
  scrape_interval: 5s
  static_configs:
  - targets: ['inference-manager-server-http.llmariner.svc:8083']
EOF

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm upgrade --install --wait \
  --namespace monitoring \
  --create-namespace \
  --set-file extraScrapeConfigs=prom-scrape-configs.yaml \
  prometheus prometheus-community/prometheus

# Add Grafana with the DCGM dashboard.
cat <<EOF > grafana-values.yaml
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      url: http://prometheus-server
      isDefault: true
dashboardProviders:
  dashboardproviders.yaml:
    apiVersion: 1
    providers:
    - name: 'default'
      orgId: 1
      folder: 'default'
      type: file
      disableDeletion: true
      editable: true
      options:
        path: /var/lib/grafana/dashboards/standard
dashboards:
  default:
    nvidia-dcgm-exporter:
      gnetId: 12239
      datasource: Prometheus
EOF

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm upgrade --install --wait \
  --namespace monitoring \
  --create-namespace \
  -f grafana-values.yaml \
  grafana grafana/grafana
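To open the dashboards locally, retrieve the Grafana admin password and port-forward the service. The secret and service names below assume the Grafana chart defaults with the release name grafana:

kubectl get secret -n monitoring grafana -o jsonpath="{.data.admin-password}" | base64 --decode; echo
kubectl port-forward -n monitoring service/grafana 3000:80
# Then open http://localhost:3000 and log in as "admin" with the password printed above.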

Optional: Enable TLS

First follow the cert-manager installation document and install cert-manager in your K8s cluster if it is not already installed. Then create a ClusterIssuer for your domain. Here is an example manifest that uses Let's Encrypt.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: user@mydomain.com
    privateKeySecretRef:
      name: letsencrypt
    solvers:
    - http01:
        ingress:
          ingressClassName: kong
    - selector:
        dnsZones:
        - llm.mydomain.com
      dns01:
        ...

Then you can add the following to values.yaml of LLMariner to enable TLS.

global:
  ingress:
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt
    tls:
      hosts:
      - api.llm.mydomain.com
      secretName: api-tls

The ingresses created from the Helm chart will have the following annotation and spec:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
  ...
spec:
  tls:
  - hosts:
    - api.llm.mydomain.com
    secretName: api-tls
  ...
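Once the annotated ingress is created, cert-manager's ingress-shim should issue the certificate into the referenced secret. Assuming LLMariner is installed in the llmariner namespace, you can check the issuance status with:

kubectl get certificate -n llmariner
kubectl get secret api-tls -n llmariner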