
Revert MFSDP mixed precision config arguments. #2848

Merged
ko3n1g merged 3 commits into NVIDIA-NeMo:main from cspades:cye/revert-mfsdp-mp-config
Mar 18, 2026

Conversation

@cspades cspades (Contributor) commented Mar 17, 2026

What does this PR do?

Changelog

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Refactor
    • Removed explicit mixed-precision configuration parameters from distributed model building and initialization functions. The system now relies on default mixed-precision handling, reducing configuration complexity for model setup and training workflows.
  • Tests
    • Removed test coverage for mixed-precision configuration propagation paths to align with API changes.

Signed-off-by: Cory Ye <cye@nvidia.com>
@coderabbitai coderabbitai bot (Contributor) commented Mar 17, 2026

📝 Walkthrough


This pull request removes mixed-precision configuration plumbing from the distributed model building and initialization pipeline. The mixed_precision_config parameter is eliminated from public APIs including unimodal_build_distributed_models, provide_distributed_model, and get_model. Additionally, Megatron-FSDP-specific dtype fields are removed from MixedPrecisionConfig along with associated initialization and finalization logic. Related test coverage is removed.
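The signature change described above can be sketched with simplified stand-ins. This is a hypothetical before/after illustration only: the real `get_model` in `megatron.bridge` takes many more arguments, and the bodies here are placeholders.

```python
from typing import Any, Callable, Optional

# Hypothetical sketch of the API change; not the real megatron.bridge code.

def get_model_before(
    model_provider: Callable[[], Any],
    mixed_precision_config: Optional[dict] = None,
) -> Any:
    # Before this PR: callers could forward an explicit mixed-precision
    # config, which was plumbed through to the DDP/FSDP wrapping logic.
    return model_provider()

def get_model_after(model_provider: Callable[[], Any]) -> Any:
    # After this PR: the parameter is gone; mixed precision falls back to
    # the default handling inside the model-building pipeline.
    return model_provider()
```

Call sites that previously forwarded `mixed_precision_config` simply drop the argument; no other part of the call changes.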

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Model Building and Distributed Wrapping: src/megatron/bridge/models/common/unimodal.py, src/megatron/bridge/models/model_provider.py, src/megatron/bridge/training/setup.py | Removed the mixed_precision_config parameter from the public functions unimodal_build_distributed_models, provide_distributed_model, and get_model, and updated call sites to stop forwarding this argument to the DDP wrapping logic. |
| Mixed Precision Configuration: src/megatron/bridge/training/mixed_precision.py | Removed three Megatron-FSDP-specific public fields (megatron_fsdp_main_params_dtype, megatron_fsdp_main_grads_dtype, megatron_fsdp_grad_comm_dtype) from the MixedPrecisionConfig class, along with the __post_init__ method and the associated __setattr__ and finalize logic that managed dtype mappings and synchronization. |
| Test Coverage: tests/unit_tests/models/common/test_unimodal.py, tests/unit_tests/training/test_mixed_precision.py | Removed test methods validating mixed-precision configuration propagation through the distributed model building and provider APIs, as well as tests for Megatron-FSDP dtype field interactions and grad_reduce_in_fp32 behavior. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Test Results For Major Changes | ⚠️ Warning | The PR removes 314 lines of tests and modifies public APIs without documenting test results or verification in the PR description. | Add test results and verification to the PR description confirming that the breaking changes have been validated and no regressions were introduced. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title accurately summarizes the primary change: reverting MFSDP mixed precision config arguments from multiple files and test suites. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |


@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/megatron/bridge/models/model_provider.py (1)

34-34: ⚠️ Potential issue | 🟡 Minor

Remove duplicate Callable import.

Callable is already imported at line 18. This duplicate import should be removed.

Suggested fix
```diff
-from typing import Callable
-
 import torch
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/models/model_provider.py` at line 34, Remove the
duplicate import of Callable by deleting the redundant "from typing import
Callable" line in model_provider.py so only the original Callable import
(already present earlier in the file) remains; locate the duplicate import
statement to remove it, leaving the earlier import at line 18 intact.
🧹 Nitpick comments (1)
src/megatron/bridge/models/common/unimodal.py (1)

15-20: Import ordering issue: standard library import after logger initialization.

The typing import on line 20 should be grouped with other standard library imports at the top of the file, before the logger initialization. Per coding guidelines, imports should be organized in order: future imports, standard library, third-party, first-party, local folder imports.

Suggested reordering
```diff
-import logging
-
-
-logger = logging.getLogger(__name__)
-
-from typing import Any, Callable
+import logging
+from typing import Any, Callable
+
+
+logger = logging.getLogger(__name__)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/models/common/unimodal.py` around lines 15 - 20, Move the
standard-library import "from typing import Any, Callable" above the logger
initialization and group it with other standard library imports at the top of
the file so imports follow the required ordering; specifically, ensure the
import occurs before the "logger = logging.getLogger(__name__)" line and keep
the "logger" creation after all standard-library imports are declared.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 49915189-c5e5-4c3b-a8dc-2cdcc6435653

📥 Commits

Reviewing files that changed from the base of the PR and between f689296 and ab1bb53.

📒 Files selected for processing (6)
  • src/megatron/bridge/models/common/unimodal.py
  • src/megatron/bridge/models/model_provider.py
  • src/megatron/bridge/training/mixed_precision.py
  • src/megatron/bridge/training/setup.py
  • tests/unit_tests/models/common/test_unimodal.py
  • tests/unit_tests/training/test_mixed_precision.py
💤 Files with no reviewable changes (4)
  • src/megatron/bridge/training/mixed_precision.py
  • tests/unit_tests/training/test_mixed_precision.py
  • tests/unit_tests/models/common/test_unimodal.py
  • src/megatron/bridge/training/setup.py

@cspades cspades enabled auto-merge (squash) March 18, 2026 18:56
@ko3n1g ko3n1g disabled auto-merge March 18, 2026 18:57
@ko3n1g ko3n1g merged commit 17fba84 into NVIDIA-NeMo:main Mar 18, 2026
14 checks passed
copy-pr-bot bot pushed a commit that referenced this pull request Mar 19, 2026
Signed-off-by: Cory Ye <cye@nvidia.com>

Labels: none yet · Projects: none yet · 3 participants