Supported Open Models

This page lists the open models supported by LLMariner.

Officially Supported Models

The following models have been validated with LLMariner:

| Model | Quantizations | Supported runtimes |
|---|---|---|
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | None | vLLM |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | AWQ | vLLM |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Base | Q2_K, Q3_K_M, Q3_K_S, Q4_0 | Ollama |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q2_K, Q3_K_M, Q3_K_S, Q4_0 | Ollama |
| deepseek-ai/deepseek-coder-6.7b-base | None | vLLM, Ollama |
| deepseek-ai/deepseek-coder-6.7b-base | AWQ | vLLM |
| deepseek-ai/deepseek-coder-6.7b-base | Q4_0 | vLLM, Ollama |
| fixie-ai/ultravox-v0_3 | None | vLLM |
| google/gemma-2b-it | None | Ollama |
| google/gemma-2b-it | Q4_0 | Ollama |
| intfloat/e5-mistral-7b-instruct | None | vLLM |
| meta-llama/Meta-Llama-3.3-70B-Instruct | AWQ, FP8-Dynamic | vLLM |
| meta-llama/Meta-Llama-3.1-70B-Instruct | AWQ | vLLM |
| meta-llama/Meta-Llama-3.1-70B-Instruct | Q2_K, Q3_K_M, Q3_K_S, Q4_0 | vLLM, Ollama |
| meta-llama/Meta-Llama-3.1-8B-Instruct | None | vLLM |
| meta-llama/Meta-Llama-3.1-8B-Instruct | AWQ | vLLM, Triton |
| meta-llama/Meta-Llama-3.1-8B-Instruct | Q4_0 | vLLM, Ollama |
| nvidia/Llama-3.1-Nemotron-70B-Instruct | Q2_K, Q3_K_M, Q3_K_S, Q4_0 | vLLM |
| nvidia/Llama-3.1-Nemotron-70B-Instruct | FP8-Dynamic | vLLM |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4_0 | Ollama |
| sentence-transformers/all-MiniLM-L6-v2-f16 | None | Ollama |

Note that some models work only with specific inference runtimes, as shown in the table above.

Using Other Models from Hugging Face

First, create a Kubernetes secret that contains the Hugging Face API key:

kubectl create secret generic \
  huggingface-key \
  -n llmariner \
  --from-literal=apiKey=${HUGGING_FACE_HUB_TOKEN}
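To confirm the secret was created correctly, you can read the key back and decode it. This is a sketch that assumes a running cluster; the secret name, namespace, and key name match the command above:

```shell
# Read the apiKey field back from the secret and base64-decode it.
# Assumes kubectl is pointed at the cluster where LLMariner runs.
kubectl get secret huggingface-key \
  -n llmariner \
  -o jsonpath='{.data.apiKey}' | base64 --decode
```

The output should be the same token you stored in `HUGGING_FACE_HUB_TOKEN`.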

The above command assumes that LLMariner runs in the llmariner namespace.

Then deploy LLMariner with the following values.yaml:

model-manager-loader:
  downloader:
    kind: huggingFace
    huggingFace:
      cacheDir: /tmp/.cache/huggingface/hub
  huggingFaceSecret:
    name: huggingface-key
    apiKeyKey: apiKey

  baseModels:
  - Qwen/Qwen2-7B
  - TheBloke/TinyLlama-1.1B-Chat-v1.0-AWQ
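With the secret in place, the values above can be applied with a Helm upgrade. A minimal sketch follows; the release name and chart reference are assumptions, so substitute whatever you used for your initial install:

```shell
# Apply the Hugging Face downloader configuration to an existing install.
# The release name "llmariner" and the chart reference are assumptions;
# adjust them to match how LLMariner was originally deployed.
helm upgrade llmariner <chart-reference> \
  -n llmariner \
  -f values.yaml
```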

The model should then be loaded by model-manager-loader. Once loading completes, the model name shows up in the output of llma models list.
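Downloading a large model can take a while. A small wait loop like the following can watch for it to appear; this assumes the llma CLI is configured, and uses the Qwen entry from the example baseModels list above:

```shell
# Poll until the example model appears in the model list.
# "Qwen/Qwen2-7B" matches the baseModels entry in the values.yaml above;
# replace it with the model you actually configured.
until llma models list | grep -q "Qwen/Qwen2-7B"; do
  echo "waiting for model to finish loading..."
  sleep 30
done
echo "model is ready"
```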