chore: bump mcore and mbridge #1902
❌ Submodule Fast-Forward Check Failed
Check based on commit: 5e64e0c (PR #1902)
✅ Submodules that are properly updated:
- Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
❌ Submodules that need attention:
- Megatron-LM: ❌ Commits have DIVERGED from a common ancestor
Please ensure all submodule commits are fast-forwards of the main branch before merging.
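The distinction the check draws between "fast-forward" and "DIVERGED" comes down to an ancestry test: the PR's submodule commit is a fast-forward only if the main branch's commit is one of its ancestors. The sketch below illustrates that on a toy commit graph; the commit names and the `classify` helper are made up for illustration and are not the CI script itself. On a real repository, `git merge-base --is-ancestor <main-commit> <pr-commit>` answers the same question.

```python
from typing import Dict, List

# Toy commit graph: each commit maps to its list of parents.
# All commit names here are hypothetical.
ParentMap = Dict[str, List[str]]


def is_ancestor(graph: ParentMap, ancestor: str, descendant: str) -> bool:
    """Return True if `ancestor` is reachable from `descendant` via parent links."""
    stack = [descendant]
    seen = set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(graph.get(commit, []))
    return False


def classify(graph: ParentMap, main_commit: str, pr_commit: str) -> str:
    """Classify the PR commit relative to the main-branch commit."""
    if main_commit == pr_commit:
        return "up-to-date"
    if is_ancestor(graph, main_commit, pr_commit):
        return "fast-forward"
    if is_ancestor(graph, pr_commit, main_commit):
        return "behind"
    return "diverged"


graph = {
    "base": [],
    "main1": ["base"],
    "pr_ff": ["main1"],    # builds on top of main1
    "pr_fork": ["base"],   # branches off before main1
}

ff_result = classify(graph, "main1", "pr_ff")
fork_result = classify(graph, "main1", "pr_fork")
print(ff_result, fork_result)
```

In this toy graph, `pr_fork` shares only `base` with `main1`, which is the "diverged from a common ancestor" situation reported for Megatron-LM.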
5e64e0c to 3ad8f21
📝 Walkthrough
Updates Megatron-LM submodule source and branch, refreshes submodule pointers for Megatron-Bridge and Megatron-LM workspaces, adjusts dependency versions across setup.py files, and integrates ProcessGroupCollection for managing model-parallel process groups in Megatron model initialization and policy worker training operations.
Sequence Diagram
sequenceDiagram
participant Setup as Model Setup Flow
participant PGC as ProcessGroupCollection
participant MegatronModel as Megatron Model
participant PolicyWorker as Policy Worker
Setup->>PGC: ProcessGroupCollection.use_mpu_process_groups()
PGC-->>Setup: pg_collection instance
Setup->>Setup: Store in megatron_cfg.model._pg_collection
Setup->>MegatronModel: get_model(..., pg_collection)
MegatronModel-->>Setup: model instance
PolicyWorker->>PolicyWorker: get_pg_collection(model)
PolicyWorker->>PolicyWorker: Use mp_group=pg_collection.mp for distributed ops
PolicyWorker->>MegatronModel: logical_and/reduce_max with mp_group
MegatronModel-->>PolicyWorker: aggregated results
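The flow in the diagram can be sketched in plain Python with stand-in objects. Everything below is hypothetical scaffolding: `FakeGroup`, `FakePGCollection`, `setup_model`, and the simulated `reduce_max` / `logical_and` only mimic the shape of the interaction and are not the Megatron or NeMo RL APIs. Cross-rank reductions are simulated with a dict keyed by rank.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Dict, List


@dataclass
class FakeGroup:
    """Stand-in for a process group: just remembers which ranks belong to it."""
    ranks: List[int]


@dataclass
class FakePGCollection:
    """Stand-in for ProcessGroupCollection; only the `mp` group is modeled."""
    mp: FakeGroup


def setup_model(cfg, world_ranks):
    # Build the collection once and store it on the model config.
    pg_collection = FakePGCollection(mp=FakeGroup(ranks=list(world_ranks)))
    cfg.model._pg_collection = pg_collection
    # The model is constructed with the same collection.
    return SimpleNamespace(config=cfg.model, pg_collection=pg_collection)


def reduce_max(per_rank_values: Dict[int, float], group: FakeGroup) -> float:
    # Simulated all-reduce(max) restricted to the ranks in `group`.
    return max(per_rank_values[r] for r in group.ranks)


def logical_and(per_rank_flags: Dict[int, bool], group: FakeGroup) -> bool:
    # Simulated all-reduce(and) restricted to the ranks in `group`.
    return all(per_rank_flags[r] for r in group.ranks)


cfg = SimpleNamespace(model=SimpleNamespace())
model = setup_model(cfg, range(4))

# Policy-worker side: read the collection back and use its mp group.
mp_group = model.config._pg_collection.mp
max_value = reduce_max({0: 1.0, 1: 3.5, 2: 2.0, 3: 0.5}, mp_group)
all_ok = logical_and({0: True, 1: True, 2: True, 3: False}, mp_group)
print(max_value, all_ok)
```

The point of the sketch is only that the group is passed explicitly from one stored collection rather than looked up from global state at each call site.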
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
❌ Submodule Fast-Forward Check Failed
Check based on commit: 3ad8f21 (PR #1902)
✅ Submodules that are properly updated:
- Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
❌ Submodules that need attention:
- Megatron-LM: ❌ Commits have DIVERGED from a common ancestor
Please ensure all submodule commits are fast-forwards of the main branch before merging.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@.gitmodules`:
- Around lines 3-4: The Megatron-LM submodule entry in .gitmodules points its
URL at a personal fork, "https://github.com/yaoyu-33/Megatron-LM.git" (branch =
main). Change the "url" value to the official org repository (e.g.,
"https://github.com/NVIDIA/Megatron-LM.git" or
"https://github.com/NVIDIA-NeMo/Megatron-LM.git"), keep or update the branch as
appropriate, then run submodule sync and update locally to confirm it tracks
the official repo, and commit the change.
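A check like the one requested for .gitmodules could be automated along these lines. This is a minimal sketch, not part of the PR: the `find_unofficial_submodules` helper, the allowed-prefix list, and the submodule paths in `SAMPLE` are assumptions made for illustration; only the two org URLs and the fork URL come from the comment above.

```python
import configparser

# Assumed allow-list, based on the two org URLs named in the review comment.
ALLOWED_PREFIXES = (
    "https://github.com/NVIDIA/",
    "https://github.com/NVIDIA-NeMo/",
)


def find_unofficial_submodules(gitmodules_text: str) -> dict:
    """Return {section: url} for submodules whose URL is outside the allow-list."""
    # .gitmodules indents its keys; strip each line so configparser
    # does not have to deal with leading whitespace.
    normalized = "\n".join(line.strip() for line in gitmodules_text.splitlines())
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(normalized)

    offenders = {}
    for section in parser.sections():
        url = parser.get(section, "url", fallback="")
        if not url.startswith(ALLOWED_PREFIXES):
            offenders[section] = url
    return offenders


# Hypothetical file content; the submodule paths are illustrative only.
SAMPLE = """
[submodule "example/Megatron-LM"]
    path = example/Megatron-LM
    url = https://github.com/yaoyu-33/Megatron-LM.git
    branch = main
[submodule "example/Megatron-Bridge"]
    path = example/Megatron-Bridge
    url = https://github.com/NVIDIA-NeMo/Megatron-Bridge.git
"""

offenders = find_unofficial_submodules(SAMPLE)
print(offenders)
```

After editing the URL by hand, `git submodule sync` followed by `git submodule update --init` is the usual way to make a local checkout pick up the change.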
In `@3rdparty/Megatron-Bridge-workspace/setup.py`:
- Line 29: The dependency constraint "transformers<5.0.0" removes the minimum
version guarantee; update the requirement in setup.py by replacing the existing
"transformers<5.0.0" entry with a bounded range such as
"transformers>=4.57.1,<5.0.0" so pip will enforce a safe minimum version while
still blocking major v5 releases.
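To see what the missing lower bound changes, the sketch below evaluates both constraints against a few version strings. It is a deliberately simplified stand-in for real version matching: `parse_version` and `satisfies` are hypothetical helpers that handle only plain `X.Y.Z` releases and the `>=` and `<` operators, whereas pip follows the full version-specifier rules.

```python
def parse_version(text: str) -> tuple:
    """Parse a plain 'X.Y.Z' release into a tuple of ints (no pre-release handling)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version: str, spec: str) -> bool:
    """Check a version against a comma-separated list of '>=' and '<' clauses."""
    current = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            if current < parse_version(clause[2:]):
                return False
        elif clause.startswith("<"):
            if current >= parse_version(clause[1:]):
                return False
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


loose = "<5.0.0"             # constraint as it stands in the PR
bounded = ">=4.57.1,<5.0.0"  # constraint suggested in the review

for candidate in ("4.30.0", "4.57.1", "5.0.0"):
    print(candidate, satisfies(candidate, loose), satisfies(candidate, bounded))
```

An old release such as 4.30.0 passes the loose constraint but is rejected by the bounded one, which is the gap the comment is pointing at.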
In `@nemo_rl/models/megatron/setup.py`:
- Around line 735-736: The code uses setattr(megatron_cfg.model,
"_pg_collection", pg_collection) which is unnecessary reflection; directly
assign the attribute instead by setting megatron_cfg.model._pg_collection =
pg_collection so replace the setattr call with a direct attribute assignment
referencing pg_collection and megatron_cfg.model and the "_pg_collection"
attribute.
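The two spellings are equivalent when the attribute name is a fixed string, which is why the direct form is preferred. A small illustration with stand-in objects (`SimpleNamespace` replaces the real config class here, and `pg_collection` is just a placeholder object):

```python
from types import SimpleNamespace

# Stand-ins for megatron_cfg.model; the real config class is not used here.
model_cfg_a = SimpleNamespace()
model_cfg_b = SimpleNamespace()
pg_collection = object()  # placeholder for the real collection

# Reflection-style assignment, as flagged in the review.
setattr(model_cfg_a, "_pg_collection", pg_collection)

# Direct assignment, as suggested.
model_cfg_b._pg_collection = pg_collection

same = model_cfg_a._pg_collection is model_cfg_b._pg_collection
print(same)
```

`setattr` earns its keep only when the attribute name is computed at runtime.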
In `@nemo_rl/models/policy/workers/megatron_policy_worker.py`:
- Line 419: Replace the undefined call to get_pg_collection in
megatron_policy_worker.py (which will crash train()) by reading the stored
collection off the model config: retrieve pg_collection from
self.model.config._pg_collection (or defensively via getattr(self.model.config,
"_pg_collection")) and use that variable instead of get_pg_collection; also
update the file copyright header year from 2025 to 2026.
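The defensive lookup described in the comment might look like the following. This is a sketch under assumptions: `pg_collection_from_model` is a hypothetical helper name, and the model is a stand-in object that exposes `config._pg_collection` the way the comment describes.

```python
from types import SimpleNamespace


def pg_collection_from_model(model):
    """Fetch the collection stored on model.config, failing loudly if absent."""
    config = getattr(model, "config", None)
    pg_collection = getattr(config, "_pg_collection", None)
    if pg_collection is None:
        raise RuntimeError(
            "model.config._pg_collection is not set; "
            "it is expected to be stored during model setup"
        )
    return pg_collection


# Stand-in model whose config carries the collection.
stored = object()
model = SimpleNamespace(config=SimpleNamespace(_pg_collection=stored))
found = pg_collection_from_model(model)

# A model that was never set up triggers a clear error instead of
# an AttributeError deep inside a training step.
try:
    pg_collection_from_model(SimpleNamespace(config=SimpleNamespace()))
    missing_raised = False
except RuntimeError:
    missing_raised = True

print(found is stored, missing_raised)
```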
🧹 Nitpick comments (2)
3rdparty/Megatron-Bridge-workspace/setup.py (1)
31-31: `accelerate` dependency has no version constraint.
The newly added `accelerate` dependency is unpinned. Consider adding at least a minimum version to ensure compatibility.
nemo_rl/models/megatron/setup.py (1)
879-879: Consider reusing the `pg_collection` already stored on `megatron_cfg.model`.
`setup_reference_model_state` creates a new `ProcessGroupCollection.use_mpu_process_groups()` instance, but `megatron_cfg.model._pg_collection` was already set in `setup_model_and_optimizer`. Since this function receives the same `megatron_cfg`, you could reuse `megatron_cfg.model._pg_collection` instead of creating a redundant instance.
Similarly, line 933 in `finalize_megatron_setup` creates yet another instance.
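The reuse suggested in the second nitpick amounts to a get-or-create helper around the stored attribute. The sketch below uses stand-ins throughout: `get_or_create_pg_collection` and `fake_factory` are hypothetical names, and the factory merely counts calls in place of `ProcessGroupCollection.use_mpu_process_groups()`.

```python
from types import SimpleNamespace


def get_or_create_pg_collection(model_cfg, factory):
    """Return the collection already stored on the config, creating it only once."""
    existing = getattr(model_cfg, "_pg_collection", None)
    if existing is not None:
        return existing
    model_cfg._pg_collection = factory()
    return model_cfg._pg_collection


calls = []


def fake_factory():
    # Stand-in for the real constructor call; records each invocation.
    calls.append(1)
    return object()


model_cfg = SimpleNamespace()
first = get_or_create_pg_collection(model_cfg, fake_factory)   # setup path
second = get_or_create_pg_collection(model_cfg, fake_factory)  # reference-model path
third = get_or_create_pg_collection(model_cfg, fake_factory)   # finalize path
print(first is second is third, len(calls))
```

All three call sites end up sharing one instance, so there is a single source of truth for the process groups attached to a given config.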
Signed-off-by: Yi-Fu Wu <[email protected]>
3ad8f21 to 7d37192
❌ Submodule Fast-Forward Check Failed
Check based on commit: 7d37192 (PR #1902)
✅ Submodules that are properly updated:
- Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
❌ Submodules that need attention:
- Megatron-LM: ❌ Commits have DIVERGED from a common ancestor
Please ensure all submodule commits are fast-forwards of the main branch before merging.
❌ Submodule Fast-Forward Check Failed
Check based on commit: 4891923 (PR #1902)
✅ Submodules that are properly updated:
- Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
❌ Submodules that need attention:
- Megatron-LM: ❌ Commits have DIVERGED from a common ancestor
Please ensure all submodule commits are fast-forwards of the main branch before merging.
❌ Submodule Fast-Forward Check Failed
Check based on commit: af1d4b6 (PR #1902)
✅ Submodules that are properly updated:
- Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
❌ Submodules that need attention:
- Megatron-LM: ❌ Commits have DIVERGED from a common ancestor
Please ensure all submodule commits are fast-forwards of the main branch before merging.
❌ Submodule Fast-Forward Check Failed
Check based on commit: 7ad9e52 (PR #1902)
✅ Submodules that are properly updated:
- Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
❌ Submodules that need attention:
- Megatron-LM: ❌ Commits have DIVERGED from a common ancestor
Please ensure all submodule commits are fast-forwards of the main branch before merging.
❌ Submodule Fast-Forward Check Failed
Check based on commit: e0f58f9 (PR #1902)
✅ Submodules that are properly updated:
- Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
❌ Submodules that need attention:
- Megatron-LM: ❌ Commits have DIVERGED from a common ancestor
Please ensure all submodule commits are fast-forwards of the main branch before merging.
e0f58f9 to 9dceaf4
❌ Submodule Fast-Forward Check Failed
Check based on commit: 9dceaf4 (PR #1902)
✅ Submodules that are properly updated:
- Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
❌ Submodules that need attention:
- Megatron-LM: ❌ Commits have DIVERGED from a common ancestor
Please ensure all submodule commits are fast-forwards of the main branch before merging.
❌ Submodule Fast-Forward Check Failed
Check based on commit: 03088ab (PR #1902)
✅ Submodules that are properly updated:
- Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
❌ Submodules that need attention:
- Megatron-LM: ❌ Commits have DIVERGED from a common ancestor
Please ensure all submodule commits are fast-forwards of the main branch before merging.
❌ Submodule Fast-Forward Check Failed
Check based on commit: 1c9a1f4 (PR #1902)
✅ Submodules that are properly updated:
- Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
❌ Submodules that need attention:
- Megatron-LM: ❌ Commits have DIVERGED from a common ancestor
Please ensure all submodule commits are fast-forwards of the main branch before merging.
What does this PR do?
Squashed #1787. Bumps mcore to a branch close to main.
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit