Roadmap

Future plans

Milestone 0 (Completed)

  • OpenAI compatible API
  • Models: google-gemma-2b-it
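
An OpenAI-compatible API means any standard OpenAI client can talk to the platform by switching the base URL. A minimal sketch of the request shape, assuming a hypothetical endpoint URL and API key (substitute your deployment's values):

```python
import json

# Hypothetical values; replace with your deployment's endpoint and key.
BASE_URL = "http://localhost:8080/v1"
API_KEY = "<your-api-key>"

# Standard OpenAI chat-completions request body; an OpenAI client pointed
# at BASE_URL would send an equivalent payload.
payload = {
    "model": "google-gemma-2b-it",
    "messages": [{"role": "user", "content": "Hello!"}],
}
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
body = json.dumps(payload)
# POST `body` with `headers` to f"{BASE_URL}/chat/completions" to get a completion.
```

Because the wire format matches OpenAI's, existing SDKs and tooling work unchanged against the platform.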

Milestone 1 (Completed)

  • API authorization with Dex
  • API key management
  • Quota management for fine-tuning jobs
  • Inference autoscaling based on GPU utilization
  • Models: Mistral-7B-Instruct, Meta-Llama-3-8B-Instruct, and google-gemma-7b-it

Milestone 2 (Completed)

  • Jupyter Notebook workspace creation
  • Dynamic model loading & offloading in inference (initial version)
  • Organization & project management
  • MLflow integration
  • Weights & Biases integration for fine-tuning jobs
  • VectorDB installation and RAG
  • Multi k8s cluster deployment (initial version)

Milestone 3 (Completed)

  • Support for object stores other than MinIO
  • Multi-GPU general-purpose training jobs
  • Inference optimization (e.g., vLLM)
  • Models: Meta-Llama-3-8B-Instruct, Meta-Llama-3-70B-Instruct, deepseek-coder-6.7b-base

Milestone 4 (Completed)

  • Embedding API
  • API usage visibility
  • Fine-tuning support with vLLM
  • API key encryption
  • Nvidia Triton Inference Server (experimental)
  • Release flow

Milestone 5 (In progress)

  • Frontend
  • GPU showback
  • Non-Nvidia GPU support
  • Multi k8s cluster deployment (file and vector store management)
  • High availability
  • Monitoring & alerting
  • More models

Milestone 6

  • Multi-GPU LLM fine-tuning jobs
  • Events and metrics for fine-tuning jobs