Supported Open Models

The following table lists the supported models. Note that some models are supported only by specific inference runtimes.

| Model | Quantizations | Supporting runtimes |
| --- | --- | --- |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | None | vLLM |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | AWQ | vLLM |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Base | Q2_K, Q3_K_M, Q3_K_S, Q4_0 | Ollama |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q2_K, Q3_K_M, Q3_K_S, Q4_0 | Ollama |
| deepseek-ai/deepseek-coder-6.7b-base | None | vLLM, Ollama |
| deepseek-ai/deepseek-coder-6.7b-base | AWQ | vLLM |
| deepseek-ai/deepseek-coder-6.7b-base | Q4_0 | vLLM, Ollama |
| google/gemma-2b-it | None | Ollama |
| google/gemma-2b-it | Q4_0 | Ollama |
| intfloat/e5-mistral-7b-instruct | None | vLLM |
| meta-llama/Meta-Llama-3.1-70B-Instruct | AWQ | vLLM |
| meta-llama/Meta-Llama-3.1-70B-Instruct | Q2_K, Q3_K_M, Q3_K_S, Q4_0 | vLLM, Ollama |
| meta-llama/Meta-Llama-3.1-8B-Instruct | None | vLLM |
| meta-llama/Meta-Llama-3.1-8B-Instruct | AWQ | vLLM, Triton |
| meta-llama/Meta-Llama-3.1-8B-Instruct | Q4_0 | vLLM, Ollama |
| nvidia/Llama-3.1-Nemotron-70B-Instruct | Q2_K, Q3_K_M, Q3_K_S, Q4_0 | vLLM |
| nvidia/Llama-3.1-Nemotron-70B-Instruct | FP8-Dynamic | vLLM |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4_0 | Ollama |
| sentence-transformers/all-MiniLM-L6-v2-f16 | None | Ollama |
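As a rough illustration of how this compatibility matrix can be consumed programmatically, the sketch below encodes the table as plain data with a small lookup helper. The data structure and the `runtimes_for` helper are hypothetical, not part of any real API; only the model names, quantizations, and runtimes come from the table above.

```python
# Hypothetical encoding of the support matrix above.
# Each entry is (model, quantizations, supporting runtimes);
# "None" mirrors the table's "no quantization" rows.
SUPPORTED_MODELS = [
    ("TinyLlama/TinyLlama-1.1B-Chat-v1.0", ("None",), ("vLLM",)),
    ("TinyLlama/TinyLlama-1.1B-Chat-v1.0", ("AWQ",), ("vLLM",)),
    ("deepseek-ai/DeepSeek-Coder-V2-Lite-Base",
     ("Q2_K", "Q3_K_M", "Q3_K_S", "Q4_0"), ("Ollama",)),
    ("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
     ("Q2_K", "Q3_K_M", "Q3_K_S", "Q4_0"), ("Ollama",)),
    ("deepseek-ai/deepseek-coder-6.7b-base", ("None",), ("vLLM", "Ollama")),
    ("deepseek-ai/deepseek-coder-6.7b-base", ("AWQ",), ("vLLM",)),
    ("deepseek-ai/deepseek-coder-6.7b-base", ("Q4_0",), ("vLLM", "Ollama")),
    ("google/gemma-2b-it", ("None",), ("Ollama",)),
    ("google/gemma-2b-it", ("Q4_0",), ("Ollama",)),
    ("intfloat/e5-mistral-7b-instruct", ("None",), ("vLLM",)),
    ("meta-llama/Meta-Llama-3.1-70B-Instruct", ("AWQ",), ("vLLM",)),
    ("meta-llama/Meta-Llama-3.1-70B-Instruct",
     ("Q2_K", "Q3_K_M", "Q3_K_S", "Q4_0"), ("vLLM", "Ollama")),
    ("meta-llama/Meta-Llama-3.1-8B-Instruct", ("None",), ("vLLM",)),
    ("meta-llama/Meta-Llama-3.1-8B-Instruct", ("AWQ",), ("vLLM", "Triton")),
    ("meta-llama/Meta-Llama-3.1-8B-Instruct", ("Q4_0",), ("vLLM", "Ollama")),
    ("nvidia/Llama-3.1-Nemotron-70B-Instruct",
     ("Q2_K", "Q3_K_M", "Q3_K_S", "Q4_0"), ("vLLM",)),
    ("nvidia/Llama-3.1-Nemotron-70B-Instruct", ("FP8-Dynamic",), ("vLLM",)),
    ("mistralai/Mistral-7B-Instruct-v0.2", ("Q4_0",), ("Ollama",)),
    ("sentence-transformers/all-MiniLM-L6-v2-f16", ("None",), ("Ollama",)),
]


def runtimes_for(model: str, quantization: str = "None") -> set[str]:
    """Return every runtime that supports the given model/quantization pair."""
    runtimes: set[str] = set()
    for name, quants, rts in SUPPORTED_MODELS:
        if name == model and quantization in quants:
            runtimes.update(rts)
    return runtimes
```

For example, `runtimes_for("meta-llama/Meta-Llama-3.1-8B-Instruct", "AWQ")` returns `{"vLLM", "Triton"}`, and an empty set means the combination is unsupported.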