Conversation

@odedovadia odedovadia commented Jan 29, 2026

What does this PR do?

Adds a save_optimizer configuration flag to control whether optimizer state is saved with checkpoints, which greatly reduces checkpoint size when optimizer state is not needed for resumption.

  • Adds a save_optimizer field to CheckpointingConfig (default: True for backward compatibility).
  • Introduces a CheckpointManager.get_resume_paths() helper to centralize checkpoint path resolution logic (see the sketch after this list).
  • Updates all algorithms (SFT, DPO, GRPO, RM, Distillation) to use the new helper and respect the flag.
  • When resuming from a checkpoint without optimizer state, a warning is logged and the optimizer is freshly initialized.
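
For context while reviewing, here is a minimal, standalone sketch of the resume-path behavior described above. The signature, the policy/weights and policy/optimizer layout, and the warning text are assumptions for illustration; the PR's actual CheckpointManager.get_resume_paths() may differ.

# Hypothetical sketch of the described behavior; paths and names are assumptions, not the PR's code.
import os
import warnings
from typing import Optional


def get_resume_paths(checkpoint_dir: Optional[str]) -> tuple[Optional[str], Optional[str]]:
    """Return (weights_path, optimizer_path) for a checkpoint, or (None, None) when not resuming."""
    if checkpoint_dir is None:
        return None, None
    weights_path = os.path.join(checkpoint_dir, "policy", "weights")      # assumed layout
    optimizer_path = os.path.join(checkpoint_dir, "policy", "optimizer")  # assumed layout
    if not os.path.exists(optimizer_path):
        # Mirrors the PR description: warn and let the caller freshly initialize the optimizer.
        warnings.warn(
            f"Optimizer state not found at {optimizer_path}. "
            "Optimizer will be freshly initialized."
        )
        optimizer_path = None
    return weights_path, optimizer_path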

Issues

Closes #1828

Usage

# In your config file
checkpointing:
  enabled: true
  save_optimizer: false  # Skip saving optimizer state (default: true)

Before your PR is "Ready for review"

Pre checks:

  • [ ] Make sure you read and followed the Contributor guidelines.
  • [ ] Did you write any new necessary tests?
  • [ ] Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests.
  • [ ] Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build, and test the docs.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added configuration option to control optimizer state saving in checkpoints.
  • Refactor

    • Centralized checkpoint resume path resolution for improved consistency across training algorithms (distillation, DPO, GRPO, RM, and SFT).

@coderabbitai coderabbitai bot (Contributor) commented Jan 29, 2026

📝 Walkthrough

Introduces a new save_optimizer configuration flag to selectively save optimizer states during checkpointing. Refactors five algorithm modules to use a centralized get_resume_paths() method for path resolution and conditionally include optimizer paths based on the flag's value.

Changes

  • Core Checkpoint Infrastructure — nemo_rl/utils/checkpoint.py: Added an optional save_optimizer flag to CheckpointingConfig (defaults to True), stored it on CheckpointManager, and introduced a get_resume_paths() method that centralizes derivation of weights and optimizer paths from checkpoints, with fallback handling when optimizer state is missing.
  • Algorithm Refactoring — nemo_rl/algorithms/distillation.py, dpo.py, grpo.py, rm.py, sft.py: Replaced explicit checkpoint path construction with calls to checkpointer.get_resume_paths() during setup, and updated checkpoint-saving logic to pass optimizer_path only when checkpointer.save_optimizer is True (otherwise None), making optimizer state saving conditional across all five algorithm modules; a sketch of this pattern follows the list.
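
As a concrete illustration of that save-side pattern, here is a short sketch assuming a torch-style state_dict API; the function and argument names are made up for illustration and are not the PR's actual code.

# Hypothetical save-side sketch; names are illustrative only.
import os

import torch


def save_checkpoint_step(checkpointer, step_dir, model, optimizer):
    os.makedirs(step_dir, exist_ok=True)
    weights_path = os.path.join(step_dir, "weights.pt")
    torch.save(model.state_dict(), weights_path)

    # Only write optimizer state when the config enables it; otherwise pass None downstream.
    optimizer_path = None
    if getattr(checkpointer, "save_optimizer", True):
        optimizer_path = os.path.join(step_dir, "optimizer.pt")
        torch.save(optimizer.state_dict(), optimizer_path)
    return weights_path, optimizer_path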

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 passed | ❌ 2 failed

❌ Failed checks (2 warnings)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 78.57%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Test Results For Major Changes — ⚠️ Warning: The PR introduces a major feature (the save_optimizer flag), but the PR description says tests are pending and provides no test results. Resolution: add test results to the PR description demonstrating that the new save_optimizer flag works correctly and does not introduce regressions across the affected algorithms.

✅ Passed checks (4 passed)

  • Description Check — ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Linked Issues Check — ✅ Passed: The PR fully implements the requested feature from issue #1828: it adds a configurable save_optimizer flag to CheckpointingConfig, updates the algorithms to respect it, and keeps backward-compatible behavior with a warning on missing optimizer state.
  • Out of Scope Changes Check — ✅ Passed: All changes across checkpoint.py and the algorithm files are directly related to implementing the save_optimizer flag and centralizing checkpoint path resolution, as described in issue #1828.
  • Title Check — ✅ Passed: The title accurately and clearly describes the main change: adding a save_optimizer flag to control whether optimizer state is saved during checkpointing.


@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
nemo_rl/utils/checkpoint.py (1)

52-63: Add save_optimizer to exemplar YAML files and move default from code to YAML.

The save_optimizer key is not reflected in any exemplar YAML files under examples/configs/, violating the guideline that config defaults live in YAML. Remove the inline default comment # Default: True from line 62 and add save_optimizer: true to the checkpointing sections in all exemplar YAMLs (e.g., examples/configs/dpo.yaml, examples/configs/grpo_math_1B.yaml). Additionally, update the docstring to explicitly document the recommended default value and that the field accepts boolean values.

🤖 Fix all issues with AI agents
In `@nemo_rl/utils/checkpoint.py`:
- Line 110: Replace the in-code default for save_optimizer with a strict read from the config so the YAML remains the source of truth. In the constructor/initializer where self.save_optimizer is currently set via self.save_optimizer = config.get("save_optimizer", True), read the key without a non-None default (e.g., config["save_optimizer"], or config.get("save_optimizer") followed by a presence check), and raise a clear error or validation message if the key is missing so callers know to set it in YAML. A minimal sketch of this pattern follows.
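
A simplified, hypothetical sketch of the strict-read pattern being suggested; the real CheckpointManager constructor has more responsibilities than shown, and the error message wording is an assumption.

# Hypothetical illustration only; not the actual nemo_rl/utils/checkpoint.py code.
class CheckpointManager:
    def __init__(self, config: dict):
        # Require the key to come from YAML instead of applying an in-code default.
        if "save_optimizer" not in config:
            raise ValueError(
                "checkpointing.save_optimizer must be set in the YAML config "
                "(e.g., save_optimizer: true)."
            )
        self.save_optimizer: bool = bool(config["save_optimizer"])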
🧹 Nitpick comments (1)
nemo_rl/utils/checkpoint.py (1)

125-132: Add stacklevel to the warning for better caller attribution.

This warning points at the helper instead of the caller.

🔧 Suggested tweak
             warnings.warn(
                 f"Optimizer state not found at {optimizer_path}. "
-                "Optimizer will be freshly initialized."
-            )
+                "Optimizer will be freshly initialized.",
+                stacklevel=2,
+            )
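
For context on the nitpick (a standalone illustration, not part of the PR's diff): with the default stacklevel=1 the reported warning location is the warnings.warn call inside the helper, whereas stacklevel=2 attributes it to the helper's caller. The function names below are made up.

# Minimal demonstration of warnings.warn stacklevel; names are illustrative only.
import warnings


def load_optimizer_state(path):
    # stacklevel=2 makes Python report the caller's line, not this one.
    warnings.warn(f"Optimizer state not found at {path}.", stacklevel=2)
    return None


def setup_training():
    return load_optimizer_state("/tmp/does-not-exist")  # warning attributed here


setup_training()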

Signed-off-by: Oded Ovadia <odedov@dreamgroup.com>
@guyueh1 guyueh1 changed the title from "Added save_optimizer flag" to "feat: Added save_optimizer flag to control saving optimizer or not in checkpointing" on Jan 29, 2026
@guyueh1 guyueh1 self-requested a review January 29, 2026 16:40
@guyueh1 guyueh1 added the CI:L0 Run doctests and unit tests label Jan 29, 2026
@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Jan 31, 2026
Signed-off-by: Oded Ovadia <odedov@dreamgroup.com>
@guyueh1 guyueh1 (Contributor) commented Feb 4, 2026

@odedov-dream is there any unit or functional test that we can add to guard the functionality of the save_optimizer=false path?

@chtruong814 chtruong814 removed the needs-follow-up Issue needs follow-up label Feb 4, 2026
Signed-off-by: Oded Ovadia <odedov@dreamgroup.com>
@odedovadia odedovadia requested a review from a team as a code owner February 4, 2026 08:37
@odedovadia odedovadia (Author) commented, quoting the question above:
@odedov-dream is there any unit or functional test that we can add to guard the functionality of the save_optimizer=false path?

Sure, added 2 unit tests.

@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Feb 6, 2026

Labels

CI:L0 (Run doctests and unit tests), community-request, needs-follow-up (Issue needs follow-up)

Development

Successfully merging this pull request may close these issues.

Save checkpoint without optimizer states
