# 🗺️ [26.04] Automodel Roadmap
This issue tracks the planned work items for Automodel. We'd love to hear from you — if you have feature requests, suggestions, or want to upvote an item, please comment below!
## Core Infrastructure

- **CLI Application for Job Execution**: Introduce a CLI tool to simplify launching and managing training/inference jobs, reducing boilerplate and configuration overhead.
- **Checkpointing Robustness & Speed**: Improve checkpoint save/restore reliability and performance: faster loads and saves, better tracking of key metrics in the test suite, reduced memory overhead for state adapters, and reduced storage overhead.
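To make the CLI item concrete, here is one possible shape for such a tool, sketched with `argparse`. The command name, subcommands, and flags are purely illustrative assumptions, not the actual Automodel CLI:

```python
import argparse

def build_parser():
    """Hypothetical job-launch CLI (illustrative only; the `automodel`
    program name, subcommands, and flags are assumptions)."""
    parser = argparse.ArgumentParser(prog="automodel")
    sub = parser.add_subparsers(dest="command", required=True)

    # `train` subcommand: launch a training job from a recipe file.
    train = sub.add_parser("train", help="launch a training job")
    train.add_argument("--config", required=True, help="path to a recipe/config file")
    train.add_argument("--nodes", type=int, default=1, help="number of nodes")

    # `infer` subcommand: launch an inference job.
    infer = sub.add_parser("infer", help="launch an inference job")
    infer.add_argument("--config", required=True, help="path to a recipe/config file")
    return parser
```

A tool along these lines would replace hand-written launch scripts with, e.g., `automodel train --config recipe.yaml --nodes 2`.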
## Model Zoo & Registry

- **Day-0 Model Zoo**: Continue supporting day-0 model releases with optimizations.
- **Model Capability Registry (with Testing & Docs Generation)**: Build a registry that maps each model to its supported capabilities (e.g., parallelism strategies, precision modes, sequence lengths) and automatically generates test coverage and documentation from it.
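A minimal sketch of what such a registry could look like. The schema, model names, and helper functions below are assumptions for illustration, not the planned Automodel design:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelCapabilities:
    # Field names are illustrative, not the actual registry schema.
    parallelism: tuple  # supported strategies, e.g. ("TP", "PP", "EP")
    precisions: tuple   # supported precision modes, e.g. ("bf16", "fp8")
    max_seq_len: int    # maximum supported sequence length

# Hypothetical entries: one dense and one MoE model.
REGISTRY = {
    "example-dense-8b": ModelCapabilities(("TP", "PP"), ("bf16", "fp8"), 32768),
    "example-moe-30b": ModelCapabilities(("TP", "PP", "EP"), ("bf16",), 16384),
}

def capability_matrix():
    """Render the registry as a markdown table (docs generation)."""
    lines = ["| Model | Parallelism | Precisions | Max seq len |",
             "|---|---|---|---|"]
    for name, caps in sorted(REGISTRY.items()):
        lines.append(f"| {name} | {', '.join(caps.parallelism)} "
                     f"| {', '.join(caps.precisions)} | {caps.max_seq_len} |")
    return "\n".join(lines)

def test_cases():
    """Enumerate (model, parallelism, precision) combos to drive tests."""
    return [(name, p, prec)
            for name, caps in sorted(REGISTRY.items())
            for p in caps.parallelism
            for prec in caps.precisions]
```

The point of a single source of truth is that the docs table and the parametrized test matrix can never drift out of sync with each other.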
## MoE (Mixture of Experts)

- **Lower Precision MoE (FP8 + NVFP4)**: Enable MoE training and inference at FP8 and NVFP4 precision for improved throughput and reduced memory footprint.
- **Hybrid Expert Parallelism (Hybrid-EP)**: Support hybrid expert parallelism strategies that combine EP with TP/PP for more flexible and efficient MoE scaling.
- **CUDAGraph + MoE**: Enable CUDAGraph capture for MoE models to reduce kernel launch overhead and improve end-to-end throughput.
- **PeFT MoE with TransformerEngine Experts**: Integrate TransformerEngine as the expert implementation backend for MoE.
## Vision-Language Models (VLM)

- **VLM Refactor**: Refactor the VLM architecture for cleaner abstractions, better modularity, and easier extension to new vision-language model families.
- **Packed Sequence Support for VLM**: Add packed/variable-length sequence support for VLM training to improve GPU utilization when inputs have heterogeneous lengths.
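To illustrate the packing idea behind that last item, here is a framework-agnostic sketch (not the Automodel implementation): variable-length sequences are concatenated into a single flat buffer with cumulative boundary offsets (the `cu_seqlens` convention used by variable-length attention kernels), so no compute is wasted on padding tokens:

```python
def pack_sequences(seqs, max_tokens):
    """Greedily pack variable-length sequences into flat buffers.

    Returns a list of (tokens, cu_seqlens) pairs. `cu_seqlens` holds the
    cumulative sequence boundaries inside the flattened token buffer.
    Note: a single sequence longer than max_tokens still gets its own
    (oversized) pack; a real loader would truncate or split it.
    """
    packs = []
    tokens, cu = [], [0]
    for seq in seqs:
        # Start a new pack if this sequence would overflow the budget.
        if tokens and len(tokens) + len(seq) > max_tokens:
            packs.append((tokens, cu))
            tokens, cu = [], [0]
        tokens = tokens + list(seq)
        cu = cu + [len(tokens)]
    if tokens:
        packs.append((tokens, cu))
    return packs
```

For example, packing sequences of lengths 3, 2, and 4 under a 6-token budget yields two packs: one holding both short sequences (boundaries `[0, 3, 5]`) and one holding the long sequence alone.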
## 💬 We Want Your Input!
Have a feature request or use case that isn't covered above? Please comment on this issue and let us know:
- What you'd like to see
- Why it matters for your workflow
- Any context (model family, scale, hardware, etc.)
We'll prioritize based on community feedback and engineering feasibility.