feat: Add DeepSpeed SFT support and quality-of-life improvements #318

tanzelin430 · 2026-01-12T13:48:16Z

DeepSpeed SFT Support

The SFT pipeline was originally designed for the Megatron strategy, where the model receives labels and returns per-token losses directly. Under the DeepSpeed strategy with HuggingFace models, the model returns logits instead, so the loss has to be computed by the strategy.

Solution: Override op_compute_language_loss in DeepSpeedTrainStrategy:

- Compute cross-entropy directly from the logits.
- DataCollatorForSFT already shifts labels (shift_feature=True by default), so logits and labels are already aligned and must not be shifted a second time.

This follows ROLL's design pattern where Strategy handles backend differences, keeping Worker code generic.
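The alignment point above can be illustrated with a minimal, framework-free sketch. It is not the code from this PR: the function names and the ignore value of -100 (the usual HuggingFace convention for masked positions) are assumptions, and a real strategy would use tensor operations such as `torch.nn.functional.cross_entropy` rather than Python lists. What it shows is that `labels[t]` is scored against `logits[t]` with no additional shift.

```python
import math

IGNORE_INDEX = -100  # assumed mask value for padding/prompt tokens


def per_token_loss(logits, labels, ignore_index=IGNORE_INDEX):
    """Cross-entropy per position for labels that are already shifted.

    logits: list of per-position score lists, shape [seq_len][vocab_size]
    labels: list of target token ids, shape [seq_len]; labels[t] is the
            target for logits[t], so no second shift is applied here
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    losses = []
    for scores, target in zip(logits, labels):
        if target == ignore_index:
            losses.append(0.0)  # masked position contributes nothing
            continue
        # log-sum-exp with max subtraction for numerical stability
        peak = max(scores)
        log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
        losses.append(log_z - scores[target])
    return losses


def mean_loss(logits, labels, ignore_index=IGNORE_INDEX):
    """Average the per-token losses over unmasked positions only."""
    losses = per_token_loss(logits, labels, ignore_index)
    kept = sum(1 for t in labels if t != ignore_index)
    return sum(losses) / kept if kept else 0.0
```

Shifting inside this function as well (for example, pairing `logits[:-1]` with `labels[1:]`) would score each position against the token two steps ahead, which is the double shift the description warns about.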

Quality-of-Life Features

1. Checkpoint Cleanup (max_ckpt_to_keep)
   - Automatically deletes old checkpoints to prevent disk exhaustion
   - Usage: max_ckpt_to_keep: 3 keeps only the latest 3 checkpoints
2. Wandb Offline Mode
   - Added a mode parameter to WandbTracker
   - Usage: tracker_kwargs: {mode: offline}
3. SFT Training Improvements
   - Enable data shuffling in the DataLoader (shuffle was previously False)
   - Add a tqdm progress bar for training visualization
4. pip Installation Support
   - Added setup.py so the package can be installed with pip install -e .
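
As a rough illustration of the checkpoint cleanup behaviour, the sketch below keeps the newest N step directories and deletes the rest. It is not the implementation from this PR: the function name, the `checkpoint-<step>` directory naming, and the choice to treat a non-positive limit as "keep everything" are all assumptions made for the example.

```python
import re
import shutil
from pathlib import Path


def cleanup_old_checkpoints(output_dir, max_ckpt_to_keep, prefix="checkpoint-"):
    """Delete all but the newest `max_ckpt_to_keep` step directories.

    Hypothetical helper: assumes checkpoints live in directories named
    `<prefix><step>` directly under `output_dir`.
    Returns the names of the directories that were removed.
    """
    if max_ckpt_to_keep is None or max_ckpt_to_keep <= 0:
        return []  # assumed meaning: no limit, keep everything
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    found = []
    for child in Path(output_dir).iterdir():
        match = pattern.match(child.name)
        if child.is_dir() and match:
            found.append((int(match.group(1)), child))
    # Sort by numeric step so that step 100 counts as newer than step 30
    found.sort(key=lambda item: item[0])
    stale = found[:-max_ckpt_to_keep]
    for _, path in stale:
        shutil.rmtree(path)
    return [path.name for _, path in stale]
```

Sorting by the parsed step number rather than by name or modification time avoids deleting the wrong directory when steps have different digit counts or when checkpoints are copied between machines.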

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

CLAassistant · 2026-01-12T13:48:23Z

All committers have signed the CLA.

tanzelin430 · 2026-01-12T13:52:33Z

@PanAndy Please take a look

PanAndy merged commit 4ca292c into alibaba:main Jan 14, 2026
1 check passed
