chore: Update Megatron submodule pins #1787
Closed
25 commits
8d9a5f3 Update Megatron submodule pins (yaoyu-33)
58299de Bump Megatron submodules (yaoyu-33)
2b6eb30 fix CACHED_DEPENDENCIES (ashors1)
07af834 API updates (ashors1)
5f929b3 ci: Add nightly and release tests for gb200 (#1788) (chtruong814)
cd1a421 feat: NeMo Gym refresh 20260113 (#1773) (bxyu-nvidia)
ede2357 perf: DeepEP interface in megatron backend (#1794) (guyueh1)
7ace077 feat: refactor init of dtensor policy v2 (#1709) (hemildesai)
6373a75 build: Update pyasn1 to >= 0.6.2 (#1791) (chtruong814)
64e1610 docs: Adding k8 guide (#1764) (vinhngx)
f721080 test: Add grpo-qwen3-30ba3b-4n8g-40k config to performance test suite… (sfawzy-nv)
a9075f9 docs: v0.5 performance results update (#1772) (guyueh1)
7ffaabb docs: model support page (#1799) (terrykong)
4be1e52 refactor: split train and val dataset in response dataset (#1649) (yuki-97)
38fab9c docs: fix pytorch anchor link: PYTORCH_CUDA_ALLOC_CONF->PYTORCH_ALLOC… (terrykong)
417d183 fix: log validation data (#1805) (parthchadha)
5ce3866 feat: Add SGLang rollout backend and tests (#1674) (RolaoDenthu)
2f1dc01 refactor: reuse setup data (#1808) (yuki-97)
e0e5798 feat: refactor megatron init (#1646) (ashors1)
e0de3a8 build: Bump setuptools >= 80.10.1 and wheel >= 0.46.2 (#1822) (chtruong814)
b98dce0 build: Bump setuptools to 80.10.2 (#1830) (chtruong814)
53c5684 chore(3rdparty): bump Megatron-LM (nemo-rl-use-main) and Megatron-Bri… (yaoyu-33)
1fd8c71 fix mbridge api (yaoyu-33)
3e55a86 Update cached dependencies (yfw)
e970643 Fix error when pp > 1 (yfw)
Two files were deleted.
Submodule Megatron-Bridge updated (384 files)
Submodule Megatron-LM updated (966 files)
🧩 Analysis chain
🌐 Web query: Megatron-LM numpy 2.0 compatibility
💡 Result:
Megatron-LM is not fully NumPy 2.x compatible out of the box (at least in some recent commits), because it still calls `np.product(...)`, which was removed in NumPy 2.x. This causes errors like `AttributeError: module 'numpy' has no attribute 'product'` during distributed checkpointing validation (`megatron/core/dist_checkpointing/...`). [1]
Fix / workaround options:
- Replace `np.product(...)` with `np.prod(...)` in the affected files (e.g., `exchange_utils.py`, `mapping.py`, `validation.py`). A PR exists proposing exactly this change. [2]
- Pin `numpy<2` until your Megatron-LM version includes the fix. [1]
References:
[1] NVIDIA/Megatron-LM Issue #1563 (“numpy has no attribute 'product'” with NumPy 2.2.5)
[2] NVIDIA/Megatron-LM PR #1440 (replace `numpy.product` with `numpy.prod` for NumPy ≥ 2.0)
🌐 Web query: numpy 2.0 breaking changes downstream packages compatibility
💡 Result:
Key NumPy 2.0 breaking changes that affect downstream package compatibility:
- … `ImportError`. Downstream projects generally need to rebuild/release wheels targeting NumPy 2.0. [1]
- … `2.0.0rc1`). [1][2]
- … `NPY201`) to automatically fix many common Python-level changes in downstream code. [5]
Practical takeaway: if you maintain a library with compiled NumPy usage, you almost certainly need a new wheel build/release for NumPy 2.x; if your code is pure Python, you mainly need to address removed/moved APIs and promotion-related behavior/test changes. [1][2][5]
Sources: NumPy 2.0 release notes [1], downstream author guidance [2], NEP 53 (C-API evolution) [3], NEP 52 (Python API cleanup goals/compat) [4], NumPy 2.0 migration guide [5].
🌐 Web query: transformer-engine nvidia-modelopt numpy 2.0 support
💡 Result:
- … `transformer-engine 2.11.0` (released 2026-01-02). Its published Python package metadata doesn't list NumPy as a runtime requirement, so Transformer Engine itself is generally not the component that “needs NumPy 2.0 support” (NumPy compatibility issues usually come from other compiled extensions in the same environment). [1]
- … `nvidia-modelopt 0.41.0` (released 2026-01-20) (Python >=3.10,<3.13). NVIDIA doesn't clearly document a NumPy 2 pin or guarantee on the PyPI page itself, so the safest assumption is to use the latest ModelOpt if you want the best chance of NumPy 2.x working. [2]
If you hit the classic error/warning (…), that indicates some compiled dependency in your environment was built against NumPy 1.x. Typical workarounds are (a) upgrade the affected module to a wheel built for NumPy 2, or (b) downgrade NumPy to `<2` in that environment. [3]
Sources: [1] [2] [3]
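To act on workaround (b), or to fail fast when an environment drifts past a `numpy<2` pin, a project could check the installed NumPy version at startup. The following is a minimal sketch, assuming plain `major.minor.patch[suffix]` version strings; the helper names are hypothetical and are not part of NeMo RL, Megatron-LM, or NumPy. It uses only the standard library's `importlib.metadata`, so it runs without importing NumPy itself.

```python
from importlib import metadata
from typing import Optional


def numpy_major(version: str) -> int:
    """Major component of a version string, e.g. "2.0.0rc1" -> 2 (hypothetical helper)."""
    return int(version.split(".", 1)[0])


def satisfies_numpy_lt2(version: str) -> bool:
    """True if `version` satisfies a `numpy<2` pin.

    Pre-releases such as 2.0.0rc1 do not satisfy it, matching how a PEP 440
    "<2" specifier excludes pre-releases of 2.0.
    """
    return numpy_major(version) < 2


def installed_numpy_version() -> Optional[str]:
    """Installed NumPy version from package metadata, or None if NumPy is absent."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_numpy_version()
    if found is None:
        print("NumPy is not installed")
    elif satisfies_numpy_lt2(found):
        print(f"NumPy {found} satisfies numpy<2")
    else:
        print(f"NumPy {found} is 2.x; Megatron-LM code that calls np.product() will fail")
```

The hand-rolled major-version check keeps the sketch dependency-free. A real project would more likely delegate specifier matching to the `packaging` library, which handles the full PEP 440 grammar.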
Revert numpy constraint or fix Megatron-LM numpy 2.x incompatibility.
Megatron-LM is not fully NumPy 2.x compatible. The distributed checkpointing code uses `np.product()`, which was removed in NumPy 2.0 and will cause an `AttributeError` at runtime (see NVIDIA/Megatron-LM Issue #1563). Either pin `numpy<2.0.0` to restore the original constraint, or apply the fix from PR #1440, which replaces `np.product()` with `np.prod()` throughout the codebase. Relaxing this constraint without addressing the incompatibility will introduce runtime failures.
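The second option amounts to a mechanical rename. The following is a minimal sketch of how the remaining call sites could be located and rewritten, assuming they are spelled `np.product(` or `numpy.product(`. Aliased imports such as `from numpy import product` are not handled, and matches inside strings or comments are rewritten too. The helper names are hypothetical; this is not the actual diff from PR #1440.

```python
import re

# Matches calls such as np.product(...) or numpy.product(...).
# The leading \b keeps identifiers like "mynp.product(" from matching.
_PRODUCT_CALL = re.compile(r"\b(np|numpy)\.product\s*\(")


def find_product_calls(source: str) -> list[int]:
    """Return 1-based line numbers that still call the removed np.product."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if _PRODUCT_CALL.search(line)
    ]


def rewrite_product_calls(source: str) -> tuple[str, int]:
    """Replace np.product( / numpy.product( with np.prod( / numpy.prod(.

    Returns the rewritten source and the number of substitutions made.
    """
    return _PRODUCT_CALL.subn(r"\1.prod(", source)


if __name__ == "__main__":
    before = "n = np.product(shape)\nk = np.prod(b)\n"
    after, count = rewrite_product_calls(before)
    print(count, find_product_calls(before), find_product_calls(after))
```

`np.prod` exists in both NumPy 1.x and 2.x, so the rewritten code does not depend on which side of the pin is installed. Running `find_product_calls` over the checkpointing files and asserting that it returns an empty list could also serve as a CI guard before the constraint is relaxed.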