Roadmap
Future plans
Milestone 0 (Completed)
- OpenAI compatible API
- Models: google-gemma-2b-it
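"OpenAI compatible" means the server accepts the same request shape as the OpenAI chat-completions endpoint, so existing OpenAI SDKs and tools work unchanged. A minimal sketch, assuming a hypothetical local deployment URL:

```python
import json

# Assumption: a local deployment exposing the OpenAI-style /v1 path.
BASE_URL = "http://localhost:8080/v1"

# The request body follows the OpenAI chat-completions schema.
payload = {
    "model": "google-gemma-2b-it",
    "messages": [{"role": "user", "content": "Hello"}],
}

body = json.dumps(payload)
# To send: POST {BASE_URL}/chat/completions with an Authorization header,
# or point the official openai client at base_url=BASE_URL.
print(body)
```

Because the schema matches, no client-side changes are needed beyond swapping the base URL and API key.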
Milestone 1 (Completed)
- API authorization with Dex
- API key management
- Quota management for fine-tuning jobs
- Inference autoscaling based on GPU utilization
- Models: Mistral-7B-Instruct, Meta-Llama-3-8B-Instruct, and google-gemma-7b-it
Milestone 2 (Completed)
- Jupyter Notebook workspace creation
- Dynamic model loading & offloading in inference (initial version)
- Organization & project management
- MLflow integration
- Weights & Biases integration for fine-tuning jobs
- VectorDB installation and RAG
- Multi k8s cluster deployment (initial version)
Milestone 3 (Completed)
- Support for object stores other than MinIO
- Multi-GPU general-purpose training jobs
- Inference optimization (e.g., vLLM)
- Models: Meta-Llama-3-8B-Instruct, Meta-Llama-3-70B-Instruct, and deepseek-coder-6.7b-base
Milestone 4 (Completed)
- Embedding API
- API usage visibility
- Fine-tuning support with vLLM
- API key encryption
- NVIDIA Triton Inference Server (experimental)
- Release flow
Milestone 5 (In progress)
- Frontend
- GPU showback
- Non-NVIDIA GPU support
- Multi k8s cluster deployment (file and vector store management)
- High availability
- Monitoring & alerting
- More models
Milestone 6
- Multi-GPU LLM fine-tuning jobs
- Events and metrics for fine-tuning jobs