Changes from all commits
25 commits
8d9a5f3
Update Megatron submodule pins
yaoyu-33 Jan 16, 2026
58299de
Bump Megatron submodules
yaoyu-33 Jan 16, 2026
2b6eb30
fix CACHED_DEPENDENCIES
ashors1 Jan 16, 2026
07af834
API updates
ashors1 Jan 17, 2026
5f929b3
ci: Add nightly and release tests for gb200 (#1788)
chtruong814 Jan 16, 2026
cd1a421
feat: NeMo Gym refresh 20260113 (#1773)
bxyu-nvidia Jan 18, 2026
ede2357
perf: DeepEP interface in megatron backend (#1794)
guyueh1 Jan 20, 2026
7ace077
feat: refactor init of dtensor policy v2 (#1709)
hemildesai Jan 20, 2026
6373a75
build: Update pyasn1 to >= 0.6.2 (#1791)
chtruong814 Jan 20, 2026
64e1610
docs: Adding k8 guide (#1764)
vinhngx Jan 20, 2026
f721080
test: Add grpo-qwen3-30ba3b-4n8g-40k config to performance test suite…
sfawzy-nv Jan 21, 2026
a9075f9
docs: v0.5 performance results update (#1772)
guyueh1 Jan 21, 2026
7ffaabb
docs: model support page (#1799)
terrykong Jan 21, 2026
4be1e52
refactor: split train and val dataset in response dataset (#1649)
yuki-97 Jan 22, 2026
38fab9c
docs: fix pytorch anchor link: PYTORCH_CUDA_ALLOC_CONF->PYTORCH_ALLOC…
terrykong Jan 22, 2026
417d183
fix: log validation data (#1805)
parthchadha Jan 22, 2026
5ce3866
feat: Add SGLang rollout backend and tests (#1674)
RolaoDenthu Jan 22, 2026
2f1dc01
refactor: reuse setup data (#1808)
yuki-97 Jan 23, 2026
e0e5798
feat: refactor megatron init (#1646)
ashors1 Jan 23, 2026
e0de3a8
build: Bump setuptools >= 80.10.1 and wheel >= 0.46.2 (#1822)
chtruong814 Jan 25, 2026
b98dce0
build: Bump setuptools to 80.10.2 (#1830)
chtruong814 Jan 27, 2026
53c5684
chore(3rdparty): bump Megatron-LM (nemo-rl-use-main) and Megatron-Bri…
yaoyu-33 Jan 27, 2026
1fd8c71
fix mbridge api
yaoyu-33 Jan 28, 2026
3e55a86
Update cached dependencies
yfw Jan 28, 2026
e970643
Fix error when pp > 1
yfw Jan 28, 2026
4 changes: 2 additions & 2 deletions .gitmodules
@@ -1,7 +1,7 @@
[submodule "3rdparty/Megatron-LM"]
path = 3rdparty/Megatron-LM-workspace/Megatron-LM
url = https://github.com/terrykong/Megatron-LM.git
branch = yuya/nemo-rl-use-dev
url = https://github.com/yaoyu-33/Megatron-LM.git
branch = main
shallow = true
[submodule "3rdparty/Megatron-Bridge"]
path = 3rdparty/Megatron-Bridge-workspace/Megatron-Bridge
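For existing checkouts, a URL-and-branch change like this requires re-syncing the submodule remote. A standard git sequence for that (an assumed workflow step, not part of this diff):

```sh
# Propagate the new URL from .gitmodules to the existing clone,
# then check out the newly pinned commit.
git submodule sync 3rdparty/Megatron-LM-workspace/Megatron-LM
git submodule update --init --recursive 3rdparty/Megatron-LM-workspace/Megatron-LM
```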
24 changes: 7 additions & 17 deletions .pre-commit-config.yaml
@@ -57,15 +57,18 @@ repos:
# intend to merge. Without it, you might run experiments with one config, but when merging upstream,
# the config could silently fall back to the base defaults—resulting in different hyperparameters.
#
# For example, weve seen cases where an SFT recipe runs without a custom chat_template. When merged,
# it unexpectedly picks up the default recommended chat_template from upstream, which doesnt match
# For example, we've seen cases where an SFT recipe runs without a custom chat_template. When merged,
# it unexpectedly picks up the default recommended chat_template from upstream, which doesn't match
# the original experiment setup.
#
# If this check is disruptive, you can disable the pre-commit hook locally. However, before a recipe
# is accepted upstream, we expect the config to be minimized.
#
# The minimize-check command infers the base config from each recipe's `defaults` key, so it
# correctly handles inheritance chains (e.g., child → parent → grandparent).
- repo: local
hooks:
- id: configs-minimize-check-llm
- id: configs-minimize-check
name: minimize-check llm recipes
language: system
pass_filenames: false
@@ -74,17 +74,4 @@ repos:
- -lc
- |
set -euo pipefail
base="examples/configs/dpo.yaml"; for f in examples/configs/recipes/llm/dpo-*.yaml; do [ -e "$f" ] && ./tools/config_cli.py minimize-check "$base" "$f"; done
base="examples/configs/grpo_math_1B.yaml"; for f in examples/configs/recipes/llm/grpo-*.yaml; do [ -e "$f" ] && ./tools/config_cli.py minimize-check "$base" "$f"; done
base="examples/configs/sft.yaml"; for f in examples/configs/recipes/llm/sft-*.yaml; do [ -e "$f" ] && ./tools/config_cli.py minimize-check "$base" "$f"; done
base="examples/configs/distillation_math.yaml"; for f in examples/configs/recipes/llm/distillation-*.yaml; do [ -e "$f" ] && ./tools/config_cli.py minimize-check "$base" "$f"; done
- id: configs-minimize-check-vlm
name: minimize-check vlm recipes
language: system
pass_filenames: false
entry: bash
args:
- -lc
- |
set -euo pipefail
base="examples/configs/vlm_grpo_3B.yaml"; for f in examples/configs/recipes/vlm/vlm_grpo-*.yaml; do [ -e "$f" ] && ./tools/config_cli.py minimize-check "$base" "$f"; done
for f in examples/configs/recipes/{llm,vlm}/*.yaml; do [ -e "$f" ] && ./tools/config_cli.py minimize-check "$f"; done
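A usage sketch of the consolidated hook above (the recipe filename is illustrative; per the comment in this config, `minimize-check` now infers each recipe's base config from its `defaults` key):

```sh
# Check a single recipe against its inferred base config.
./tools/config_cli.py minimize-check examples/configs/recipes/llm/grpo-math-1B.yaml

# Or sweep all llm and vlm recipes, exactly as the hook does.
for f in examples/configs/recipes/{llm,vlm}/*.yaml; do
  [ -e "$f" ] && ./tools/config_cli.py minimize-check "$f"
done
```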
2 changes: 1 addition & 1 deletion 3rdparty/Gym-workspace/Gym
Submodule Gym updated 153 files
10 changes: 0 additions & 10 deletions 3rdparty/Gym-workspace/pyproject.toml

This file was deleted.

113 changes: 0 additions & 113 deletions 3rdparty/Gym-workspace/setup.py

This file was deleted.

2 changes: 1 addition & 1 deletion 3rdparty/Megatron-Bridge-workspace/Megatron-Bridge
Submodule Megatron-Bridge updated 384 files
5 changes: 3 additions & 2 deletions 3rdparty/Megatron-Bridge-workspace/setup.py
@@ -26,8 +26,9 @@
bridge_package_name = "megatron.bridge"

CACHED_DEPENDENCIES = [
"transformers>=4.57.1",
"transformers<5.0.0",
"datasets",
"accelerate",
"omegaconf>=2.3.0",
"tensorboard>=2.19.0",
"typing-extensions",
@@ -40,7 +41,7 @@
"hydra-core>1.3,<=1.3.2",
"megatron-core[dev,mlm]>=0.15.0a0,<0.17.0",
"qwen-vl-utils",
"transformer-engine[pytorch]>=2.9.0a0,<2.10.0",
"transformer-engine[pytorch]>=2.10.0a0,<2.12.0",
"mamba-ssm",
"nvidia-resiliency-ext",
"causal-conv1d",
2 changes: 1 addition & 1 deletion 3rdparty/Megatron-LM-workspace/Megatron-LM
Submodule Megatron-LM updated 966 files
17 changes: 9 additions & 8 deletions 3rdparty/Megatron-LM-workspace/setup.py
@@ -44,30 +44,31 @@
CACHED_DEPENDENCIES = [
# Default dependencies from pyproject.toml
"torch",
"numpy<2.0.0",
"numpy",
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Megatron-LM numpy 2.0 compatibility

💡 Result:

Megatron-LM is not fully NumPy 2.x compatible out of the box (at least in some recent commits), because it still calls np.product(...), which was removed in NumPy 2.x—causing errors like AttributeError: module 'numpy' has no attribute 'product' during distributed checkpointing validation (megatron/core/dist_checkpointing/...). [1]

Fix / workaround options:

  • Patch Megatron-LM: replace np.product(...) with np.prod(...) in the affected files (e.g., exchange_utils.py, mapping.py, validation.py). A PR exists proposing exactly this change. [2]
  • Temporary shim (quick local workaround): add something like:
    import numpy as np
    if not hasattr(np, "product"):
        np.product = np.prod
    (This is also suggested in the bug report.) [1]
  • Pin NumPy: use numpy<2 until your Megatron-LM version includes the fix. [1]

References:
[1] NVIDIA/Megatron-LM Issue #1563 (“numpy has no attribute 'product'” with NumPy 2.2.5)
[2] NVIDIA/Megatron-LM PR #1440 (replace numpy.product with numpy.prod for NumPy ≥ 2.0)


🌐 Web query:

numpy 2.0 breaking changes downstream packages compatibility

💡 Result:

Key NumPy 2.0 breaking changes that affect downstream package compatibility:

  • ABI break (biggest downstream impact): Any package with compiled extensions using the NumPy C-API (direct C/C++, Cython, f2py, etc.) that was built against NumPy 1.x will fail to import on NumPy 2.0 with a binary-incompatibility ImportError. Downstream projects generally need to rebuild/release wheels targeting NumPy 2.0. [1]
  • “Build on 2.x, run on 1.x and 2.x” guidance: NumPy explicitly supports building binaries against NumPy 2.x such that they can run on both NumPy 2.0 and NumPy 1.x at runtime; the reverse is not true (build on 1.x → won’t run on 2.0). NumPy recommends downstreams rebuild once the 2.0 ABI is stable (from 2.0.0rc1). [1][2]
  • C-API evolution / removals: NumPy 2.0 removes/changes older C-API pieces and hides more internals to enable future evolution; this is one reason extensions must be updated/recompiled. [1][3]
  • Python API cleanup (expired/removed/moved symbols): NumPy 2.0 includes a clearer public/private API split and removes a number of non-recommended aliases/functions, so some pure-Python downstream code may break and needs updates. The goal is that once updated for 2.0, code should typically also work on 1.2x without version-branching. [1][4]
  • Behavior change: dtype promotion rules (NEP 50): Mixed-dtype operations may yield different result dtypes/precision than in NumPy 1.x, which can break downstream tests or numerical assumptions even if code still runs. [1][5]
  • Migration tooling: The NumPy project provides a NumPy-2.0 ruff rule (NPY201) to automatically fix many common Python-level changes in downstream code. [5]

Practical takeaway: if you maintain a library with compiled NumPy usage, you almost certainly need a new wheel build/release for NumPy 2.x; if you’re pure Python, you mainly need to address removed/moved APIs and promotion-related behavior/test changes. [1][2][5]

Sources: NumPy 2.0 release notes [1], downstream author guidance [2], NEP 53 (C-API evolution) [3], NEP 52 (Python API cleanup goals/compat) [4], NumPy 2.0 migration guide [5].
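
As a concrete invocation of the migration tooling mentioned above (NPY201 is ruff's numpy-2.0 rule; the target path is illustrative):

```sh
# Rewrite removed numpy 1.x aliases (e.g. np.product -> np.prod) in place.
ruff check --select NPY201 --fix megatron/
```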


🌐 Web query:

transformer-engine nvidia-modelopt numpy 2.0 support

💡 Result:

  • Transformer Engine: The current PyPI release is transformer-engine 2.11.0 (released 2026-01-02). Its published Python package metadata doesn’t list NumPy as a runtime requirement, so Transformer Engine itself is generally not the thing that “needs NumPy 2.0 support” (NumPy compatibility issues usually come from other compiled extensions in the same environment). [1]
  • NVIDIA ModelOpt: The current PyPI release is nvidia-modelopt 0.41.0 (released 2026-01-20) (Python >=3.10,<3.13). NVIDIA doesn’t clearly document a NumPy-2 pin or guarantee on the PyPI page itself, so the safest assumption is: use the latest ModelOpt if you want the best chance of NumPy 2.x working. [2]

If you hit the classic error/warning:

“A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0 …”

that indicates some compiled dependency in your env was built against NumPy 1.x; typical workarounds are (a) upgrade the affected module to a wheel built for NumPy 2, or (b) downgrade NumPy to <2 in that environment. [3]

Sources: [1] [2] [3]


Revert numpy constraint or fix Megatron-LM numpy 2.x incompatibility.

Megatron-LM is not fully numpy 2.x compatible. The distributed checkpointing code uses np.product(), which was removed in numpy 2.0 and will cause AttributeError at runtime (see NVIDIA/Megatron-LM Issue #1563). Either pin numpy<2.0.0 to restore the original constraint, or apply the fix from PR #1440 that replaces np.product() with np.prod() throughout the codebase. Relaxing this constraint without addressing the incompatibility will introduce runtime failures.

🤖 Prompt for AI Agents
In `@3rdparty/Megatron-LM-workspace/setup.py` at line 47, the setup currently
lists "numpy" with no version constraint, which will allow numpy 2.x and cause
runtime AttributeError due to use of np.product in Megatron-LM; either pin numpy
to a 1.x release by changing the dependency in setup.py to "numpy<2.0.0" or
instead apply the Megatron-LM fix that replaces all uses of np.product() with
np.prod() (search for np.product in the codebase and update to np.prod) so the
code is compatible with numpy 2.x; update setup.py or the code references
accordingly to resolve the incompatibility.
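
For reference, a runnable version of the shim suggested above (a stopgap for environments already on numpy 2.x, not a substitute for the upstream `np.prod` fix; the module name is hypothetical):

```python
# numpy_compat.py (hypothetical): import this before any Megatron-LM code runs.
import numpy as np

# numpy 2.0 removed the deprecated alias `np.product`; restore it so
# Megatron-LM's dist_checkpointing code paths that still call it keep working.
if not hasattr(np, "product"):
    np.product = np.prod
```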

"packaging>=24.2",
# Dev dependencies from pyproject.toml
"nvidia-modelopt[torch]>=0.33.0a0,<0.34.0; sys_platform != 'darwin'",
"transformer-engine[pytorch]>=2.9.0a0,<2.10.0",
"nvidia-resiliency-ext>=0.4.0a0,<0.5.0",
"nvidia-modelopt[torch]; sys_platform != 'darwin'",
"transformer-engine[pytorch,core_cu13]>=2.9.0a0,<2.12.0",
"nvidia-resiliency-ext",
"tqdm",
"einops~=0.8",
"tensorstore~=0.1,!=0.1.46,!=0.1.72",
"nvtx~=0.2",
"multi-storage-client~=0.27",
"opentelemetry-api~=1.33.1",
"setuptools<80.0.0",
"mamba-ssm~=2.2",
"causal-conv1d~=1.5",
"flash-linear-attention~=0.3.2",
"nv-grouped-gemm~=1.1",
"megatron-energon[av_decode]~=6.0",
"av<16.0.0",
"flashinfer-python",
"av",
"flashinfer-python~=0.5.0",
"wget",
"onnxscript",
"flash-linear-attention~=0.3.2",
# VCS dependency - must match pyproject.toml [tool.uv.sources]
"emerging_optimizers @ git+https://github.com/NVIDIA-NeMo/Emerging-Optimizers.git@v0.1.0",
"datasets",
"fastapi~=0.50",
]


21 changes: 20 additions & 1 deletion README.md
@@ -286,6 +286,10 @@ sbatch \
--gres=gpu:8 \
ray.sub
```

> [!NOTE]
> For GB200 systems with 4 GPUs per node, use `--gres=gpu:4` instead.

The required `CONTAINER` can be built by following the instructions in the [Docker documentation](docs/docker.md).

#### GRPO Qwen2.5-32B
@@ -313,6 +317,9 @@
ray.sub
```

> [!NOTE]
> For GB200 systems with 4 GPUs per node, use `--gres=gpu:4` instead.

#### GRPO Multi-Turn

We also support multi-turn generation and training (tool use, games, etc.).
@@ -361,6 +368,9 @@ sbatch \
ray.sub
```

> [!NOTE]
> For GB200 systems with 4 GPUs per node, use `--gres=gpu:4` instead.

## Supervised Fine-Tuning (SFT)

We provide example SFT experiments using various datasets including [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/), OpenAI format datasets (with tool calling support), and custom JSONL datasets. For detailed documentation on supported datasets and configurations, see the [SFT documentation](docs/guides/sft.md).
@@ -406,6 +416,9 @@ sbatch \
ray.sub
```

> [!NOTE]
> For GB200 systems with 4 GPUs per node, use `--gres=gpu:4` instead.

## DPO

We provide a sample DPO experiment that uses the [HelpSteer3 dataset](https://huggingface.co/datasets/nvidia/HelpSteer3) for preference-based training.
@@ -464,6 +477,9 @@ sbatch \
ray.sub
```

> [!NOTE]
> For GB200 systems with 4 GPUs per node, use `--gres=gpu:4` instead.

## RM

We provide a sample RM experiment that uses the [HelpSteer3 dataset](https://huggingface.co/datasets/nvidia/HelpSteer3) for preference-based training.
@@ -508,6 +524,9 @@ sbatch \
ray.sub
```

> [!NOTE]
> For GB200 systems with 4 GPUs per node, use `--gres=gpu:4` instead.

## Evaluation

We provide evaluation tools to assess model capabilities.
@@ -590,7 +609,7 @@ For detailed instructions on how to set up and launch NeMo RL on Slurm or Kubern

- Large amounts of memory fragmentation might occur when running models without support for FlashAttention2.
If OOM occurs after a few iterations of training, it may help to tweak the allocator settings to reduce memory fragmentation.
To do so, specify [`max_split_size_mb`](https://docs.pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf)
To do so, specify [`max_split_size_mb`](https://docs.pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-alloc-conf)
at **either** one of the following places:
1. Launch training with:
```sh
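# Hypothetical example: the 64 MiB threshold and entry-point script are
# illustrative, not prescriptive.
PYTORCH_ALLOC_CONF=max_split_size_mb:64 uv run examples/run_grpo_math.py
```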
13 changes: 13 additions & 0 deletions docker/Dockerfile
@@ -45,6 +45,18 @@ apt-get clean
rm -rf /var/lib/apt/lists/*
EOF

# CMake (for sglang build)
RUN GITHUB_ARTIFACTORY=github.com \
&& CMAKE_VERSION=3.31.1 \
&& ARCH=$(uname -m) \
&& CMAKE_INSTALLER="cmake-${CMAKE_VERSION}-linux-${ARCH}" \
&& curl --retry 3 --retry-delay 2 -fsSL -o "${CMAKE_INSTALLER}.tar.gz" \
"https://${GITHUB_ARTIFACTORY}/Kitware/CMake/releases/download/v${CMAKE_VERSION}/${CMAKE_INSTALLER}.tar.gz" \
&& tar -xzf "${CMAKE_INSTALLER}.tar.gz" \
&& cp -r "${CMAKE_INSTALLER}/bin/"* /usr/local/bin/ \
&& cp -r "${CMAKE_INSTALLER}/share/"* /usr/local/share/ \
&& rm -rf "${CMAKE_INSTALLER}" "${CMAKE_INSTALLER}.tar.gz"

# Install uv and python
ARG UV_VERSION=0.9.7
ARG PYTHON_VERSION=3.12
@@ -102,6 +114,7 @@ fi
# The venv is symlinked to avoid bloating the layer size
uv sync --link-mode symlink --locked --no-install-project
uv sync --link-mode symlink --locked --extra vllm --no-install-project
uv sync --link-mode symlink --locked --extra sglang --no-install-project
uv sync --link-mode symlink --locked --extra mcore --no-install-project
uv sync --link-mode symlink --locked --extra automodel --no-install-project
uv sync --link-mode symlink --locked --all-groups --no-install-project