@tanzelin430
Contributor

## DeepSpeed SFT Support

The SFT pipeline was originally designed for the Megatron strategy, where the
model receives labels and returns per-token losses directly. Under the DeepSpeed
strategy with HuggingFace models, the model returns logits instead, so the loss
has to be computed from them.

**Solution:** Override `op_compute_language_loss` in `DeepSpeedTrainStrategy`:
- Compute cross-entropy directly from logits
- `DataCollatorForSFT` already shifts labels (`shift_feature=True` by default),
  so logits and labels are already aligned and no second shift is needed
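
To illustrate the alignment point above, here is a minimal, dependency-free sketch of cross-entropy over labels that are already shifted. It is not the code in `DeepSpeedTrainStrategy`; the function name `token_cross_entropy` and the `-100` mask value are assumptions for illustration. In PyTorch the same computation is what `torch.nn.functional.cross_entropy` does with `ignore_index=-100`.

```python
import math

IGNORE_INDEX = -100  # assumption: the mask value commonly used for prompt/padding tokens


def token_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over unmasked positions (hypothetical helper).

    logits: per-position score lists, shape [seq_len][vocab_size]
    labels: target token ids, shape [seq_len], ALREADY shifted by the
            collator so that labels[t] is the token logits[t] should predict.
    """
    total, count = 0.0, 0
    # Pair position t with position t directly: shifting again here
    # would misalign every prediction by one token.
    for scores, target in zip(logits, labels):
        if target == ignore_index:
            continue  # masked position contributes no loss
        # Numerically stable log-sum-exp over the vocabulary.
        peak = max(scores)
        log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_z - scores[target]
        count += 1
    return total / count if count else 0.0


# Toy example: vocabulary of 3 tokens, last position masked out.
logits = [[0.0, 2.0, 0.0], [0.0, 0.0, 2.0], [1.0, 0.0, 0.0]]
labels = [1, 2, IGNORE_INDEX]
loss = token_cross_entropy(logits, labels)
```

If the labels were not pre-shifted, the usual pattern would instead compare `logits[:-1]` against `labels[1:]`; applying that on top of the collator's shift is the double shift the bullet warns about.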

This follows ROLL's design pattern where Strategy handles backend differences,
keeping Worker code generic.

## Quality-of-Life Features

1. **Checkpoint Cleanup (`max_ckpt_to_keep`)**
   - Automatically deletes old checkpoints to prevent disk exhaustion
   - Usage: `max_ckpt_to_keep: 3` keeps only the 3 most recent checkpoints

2. **Wandb Offline Mode**
   - Added a `mode` parameter to `WandbTracker`
   - Usage: `tracker_kwargs: {mode: offline}`

3. **SFT Training Improvements**
   - Enabled data shuffling in the DataLoader (previously disabled)
   - Added a tqdm progress bar for training visualization

4. **pip Installation Support**
   - Added `setup.py` for `pip install -e .`
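
As a rough illustration of what the `max_ckpt_to_keep` cleanup does, the sketch below prunes all but the newest N checkpoint directories. The helper name `prune_checkpoints` and the `checkpoint-<step>` directory naming are assumptions for the example, not ROLL's actual layout or implementation.

```python
import re
import shutil
from pathlib import Path


def prune_checkpoints(output_dir, max_ckpt_to_keep):
    """Delete all but the newest `max_ckpt_to_keep` checkpoint directories.

    Assumes (hypothetically) that checkpoints live in sub-directories named
    `checkpoint-<step>`. Returns the list of removed paths.
    """
    if max_ckpt_to_keep is None or max_ckpt_to_keep <= 0:
        return []  # treated here as "keep everything"

    pattern = re.compile(r"^checkpoint-(\d+)$")
    found = []
    for child in Path(output_dir).iterdir():
        match = pattern.match(child.name)
        if child.is_dir() and match:
            found.append((int(match.group(1)), child))

    found.sort(key=lambda item: item[0])  # oldest step first
    stale = [path for _, path in found[:-max_ckpt_to_keep]]
    for path in stale:
        shutil.rmtree(path)  # unrelated files and directories are never touched
    return stale
```

Sorting by the numeric step rather than by name keeps `checkpoint-1000` newer than `checkpoint-200`, which a plain string sort would get wrong.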

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@CLAassistant

CLAassistant commented Jan 12, 2026

CLA assistant check
All committers have signed the CLA.

@tanzelin430
Contributor Author

@PanAndy Please take a look

@PanAndy PanAndy merged commit 4ca292c into alibaba:main Jan 14, 2026
1 check passed