[Roadmap] DeepSpeed Roadmap Q1 2025 #6946

Open · 1 of 2 tasks
loadams opened this issue Jan 13, 2025 · 5 comments

@loadams (Collaborator) commented Jan 13, 2025

This is a living document! For each item here, we intend to link the PR/issue for discussion.

This is DeepSpeed's first attempt at a public roadmap and will be updated with additional details.

loadams added the "roadmap" label Jan 13, 2025
loadams pinned this issue Jan 16, 2025
@zhaoyang-star commented

Will Multi-Token Prediction, as introduced in DeepSeek-V3, be added to the Q1 roadmap?

@shiyongde commented

We need FP8 training support for DeepSeek-MoE.

@hijeffwu commented

Please add plug-in support for different accelerators.

@loadams (Collaborator, Author) commented Mar 12, 2025

@hijeffwu - could you clarify more on what you're requesting? Different accelerators are already supported in DeepSpeed.

@hijeffwu commented

> @hijeffwu - could you clarify more on what you're requesting? Different accelerators are already supported in DeepSpeed.

My idea is as follows:

The current process for adding support for a new accelerator involves creating a new xxx_accelerator.py file in the accelerators directory and adding a product-specific directory under DeepSpeed/op_builder to adapt kernels for the different chips (see the sketch below). However, this architecture lacks a unified backend for the kernel code of the different chips.
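For concreteness, here is a minimal sketch of what that per-vendor plug-in looks like today. The base class DeepSpeedAccelerator lives in deepspeed/accelerator/abstract_accelerator.py; the device name "mychip", the backend name "mccl", and the op_builder path below are placeholder assumptions, and a real plug-in must implement the full abstract interface, not just these three methods.

```python
# my_accelerator.py - hypothetical plug-in under deepspeed/accelerator/.
# A minimal sketch only: "mychip" and "mccl" are placeholder names, and the
# real DeepSpeedAccelerator base class requires many more methods.
from deepspeed.accelerator.abstract_accelerator import DeepSpeedAccelerator


class MyChipAccelerator(DeepSpeedAccelerator):
    def __init__(self):
        self._name = "mychip"                      # torch device string, e.g. "mychip:0"
        self._communication_backend_name = "mccl"  # vendor collective backend (placeholder)

    def device_name(self, device_index=None):
        # Mirror the cuda/xpu convention: "mychip" or "mychip:<index>"
        if device_index is None:
            return self._name
        return "{}:{}".format(self._name, device_index)

    def communication_backend_name(self):
        return self._communication_backend_name

    def op_builder_dir(self):
        # Points DeepSpeed at the vendor-specific kernel builders, i.e. the
        # product-specific directory added under op_builder/ described above
        return "op_builder.mychip"
```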

Since the primary difference among AI chip vendors' support for DeepSpeed lies in their kernel implementations, would it be possible to use "deepspeed-kernels" as the unified kernel backend for DeepSpeed, while retaining only Python code in the main DeepSpeed repository? This approach would be analogous to the Megatron-LM + Apex + TransformerEngine split, making DeepSpeed more adaptable to diverse AI chip backends.

Key points in this proposal:

  1. Vendor Flexibility: Chip manufacturers could contribute optimized kernels to deepspeed-kernels without modifying core DeepSpeed code.
  2. Maintainability: Simplifies codebase management by isolating low-level optimizations.
  3. Cross-Platform Compatibility: Similar to how TransformerEngine abstracts NVIDIA-specific optimizations.

This architecture aligns with observed practice in adapting DeepSpeed to non-NVIDIA hardware; a rough dispatch sketch follows below.
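To make the proposal concrete, here is a rough sketch of the dispatch layer it implies, assuming a hypothetical vendor-namespaced deepspeed_kernels package (no such package layout exists today; those names are illustrative only). The fallback path uses the existing get_accelerator().create_op_builder() API, which resolves a builder class name such as "FusedAdamBuilder" to the in-tree JIT-compiled op.

```python
# Sketch of the proposed kernel dispatch, assuming a hypothetical
# vendor-namespaced "deepspeed_kernels" package (illustrative names only).
import importlib

from deepspeed.accelerator import get_accelerator


def load_kernel(builder_name: str, vendor: str):
    """Prefer a prebuilt kernel from the vendor's deepspeed-kernels backend;
    fall back to DeepSpeed's current in-tree JIT op_builder path."""
    try:
        # Hypothetical layout: deepspeed_kernels.nvidia, deepspeed_kernels.mychip, ...
        backend = importlib.import_module("deepspeed_kernels.{}".format(vendor))
        return getattr(backend, builder_name)
    except (ImportError, AttributeError):
        # Today's behavior: resolve and JIT-build the op from the in-tree sources
        return get_accelerator().create_op_builder(builder_name).load()


# e.g. fused_adam = load_kernel("FusedAdamBuilder", "nvidia")
```

Under this split, the main repository would keep only the Python-side dispatch above, while each vendor ships and maintains its own deepspeed_kernels backend independently.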
