feat: MoE and Mamba support for QARL #2442

Open
mxinO wants to merge 46 commits into main from mxin/moe-mamba-sft

@mxinO mxinO commented May 8, 2026

Contributor

What does this PR do ?

  • Add MoE and Mamba support for QARL
  • Update ModelOpt to the latest version
  • Fix potential issues in vLLM fake calibration

Note: the current MBridge is on a cherry-pick branch for MoE amax mapping. We need to switch to main once nemo-rl is updated to support the latest MBridge.

Issues

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • The new Megatron changed its conv1d implementation, so the conv1d quantizer currently does not work. We usually do not quantize conv1d, but a fix is still needed because this causes the export of the quantized Nano3 checkpoint to fail. This can be fixed later with a small PR. Draft PR: Support Mamba direct conv1d quant export NVIDIA/Model-Optimizer#1716

@copy-pr-bot

copy-pr-bot Bot commented May 8, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the Documentation Improvements or additions to documentation label May 8, 2026
@mxinO mxinO changed the title from "Mxin/moe mamba sft" to "feat: MoE and Mamba support for QARL" May 21, 2026
@mxinO mxinO added the CI:L1 Run doctests, unit tests, and functional tests label May 27, 2026
@github-actions

✅ Submodule Fast-Forward Check Results

Check based on commit: e7511a5 (PR #2442 from mxin/moe-mamba-sft)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@mxinO mxinO marked this pull request as ready for review May 28, 2026 13:35
@mxinO mxinO requested review from a team as code owners May 28, 2026 13:35
@copy-pr-bot

copy-pr-bot Bot commented May 28, 2026

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@mxinO

mxinO commented May 28, 2026

Contributor Author

/ok to test

@github-actions

✅ Submodule Fast-Forward Check Results

Check based on commit: bca0084 (PR #2442 from mxin/moe-mamba-sft)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@mxinO

mxinO commented May 29, 2026

Contributor Author

/ok to test

mxinO added 26 commits June 11, 2026 04:48
Nano3 QADOPD needs the default NVFP4 W4A4 recipe so student training and rollout generation can use the same quantization surface as the existing on-policy distillation experiments. Restore the tracked recipe from the earlier Nano3 ModelOpt config history and point the Nano3 QADOPD example at it.

Constraint: Keep QARL on the Nano3 weight-only recipe because GRPO examples document W4A16 as the safer GRPO recipe.

Confidence: medium

Scope-risk: narrow

Tested: Loaded Nano3 QARL and QADOPD configs with nemo_rl.utils.config.load_config and OmegaConf resolution

Tested: Loaded nano3_nvfp4_default.yaml with OmegaConf

Tested: git diff --cached --check

Not-tested: End-to-end Nano3 QADOPD training run
Signed-off-by: Meng Xin <mxin@nvidia.com>
The Nano3 QARL example should exercise the ModelOpt full-parameter path without inheriting the LoRA recipe. The config now carries the Nano3 Megatron run shape directly on the generic GRPO math base, and user-authored quant recipes are documented as absolute paths for Ray/container workers. The vLLM dummy-weight patch also sanitizes all nonfinite dummy activations before calibration so W4A4 startup survives partial-NaN dummy tensors without weakening runtime checks.

Constraint: LoRA/PEFT support is out of scope for this MR.

Constraint: Ray workers need absolute user recipe paths when running in containers.

Rejected: Patch Megatron-Bridge checkpoint extra-state load globally | the observed extra-state failure came from the accidental PEFT preload path.

Confidence: high

Scope-risk: narrow

Tested: python -m py_compile nemo_rl/modelopt/models/generation/vllm_quant_patch.py nemo_rl/modelopt/utils.py nemo_rl/modelopt/models/policy/workers/megatron_quant_policy_worker.py

Tested: Slurm smoke job 178359 completed one Nano3 QARL GRPO step with finite train/loss, train/grad_norm, and train/reward.
Signed-off-by: Meng Xin <mxin@nvidia.com>
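The dummy-activation sanitization described in the commit above can be illustrated with a minimal sketch. It is not the code from vllm_quant_patch.py: the function name `sanitize_nonfinite` and the fill value are assumptions, and plain Python floats stand in for tensors (on real tensors, `torch.nan_to_num` performs a comparable replacement).

```python
import math


def sanitize_nonfinite(values, fill=0.0):
    """Replace NaN and +/-inf entries with a finite fill value.

    Hypothetical helper: it only mirrors the idea of cleaning dummy
    activations before calibration and is not the patch itself.
    """
    return [v if math.isfinite(v) else fill for v in values]


# A partially nonfinite "dummy activation" vector.
dummy = [0.5, float("nan"), float("inf"), -1.25, float("-inf")]
cleaned = sanitize_nonfinite(dummy)
```

Only the dummy-weight startup path would call such a helper, so runtime checks on real activations stay as strict as before.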
The Nano3 QA distillation example needs to cover the NeMo Gym distillation entrypoint now that Gym rollout support exists for distillation. The config uses existing Gym verifiers-agent sample data and config so the example exercises the same data shape as Gym without adding modelopt-only fixture files or runtime patches.

Constraint: QADOPD smoke coverage must go through examples/nemo_gym/run_distillation_nemo_gym.py

Constraint: Keep the MR surface minimal and avoid runtime changes that existing NeMo Gym runs do not need

Rejected: Add a modelopt-local JSONL fixture | existing Gym verifiers-agent data already provides agent_ref and responses_create_params

Rejected: Patch skip_tokenizer_init in nemo_gym.py | expose_http_server lets existing generation config set tokenizer initialization correctly

Rejected: Broaden vLLM top_k handling | the smoke passes without changing async vLLM request validation

Confidence: high

Scope-risk: narrow

Tested: Slurm smoke job 179300 completed 0:0; NeMo Gym rollouts, teacher logprobs, and policy training ran; train/loss@1=0.03154120594263077

Not-tested: Full multi-step production QADOPD training
Signed-off-by: Meng Xin <mxin@nvidia.com>
QARL needed a cheap non-Nano3 MoE signal before opening the Nemo-RL PR. This adds a one-step Qwen3-30B-A3B NVFP4 weight-only health check that exercises ModelOpt calibration, quantized vLLM generation, logprobs, and a quantized Megatron policy train step.

Constraint: Keep the example portable by using a built-in ModelOpt quantization recipe instead of a repo-relative recipe path.

Rejected: Reintroduce Nemo-RL vLLM quant patch shim | the validated eager-backend smoke passes without making Nemo-RL own the ModelOpt issue.

Confidence: high

Scope-risk: narrow

Directive: This is a control-flow smoke, not a convergence or token-mult-prob benchmark.

Tested: bash -n tests/test_suites/llm/grpo-qwen3-30ba3b-4n4g-megatron-qa-nvfp4.sh

Tested: TEST_DRYRUN=1 bash tests/test_suites/llm/grpo-qwen3-30ba3b-4n4g-megatron-qa-nvfp4.sh

Tested: Slurm smoke job 185322 completed with train/loss=0.04845, num_valid_samples=16, probs_ratio=1.00006.

Not-tested: Full nightly scheduler invocation.
Signed-off-by: Meng Xin <mxin@nvidia.com>
The ModelOpt examples use qa_* YAML names, so keep the new Qwen3 MoE QARL config consistent with the existing examples/modelopt convention.

Constraint: Preserve the already validated test script and runtime settings while only changing the config path and default output names.

Confidence: high

Scope-risk: narrow

Tested: bash -n tests/test_suites/llm/grpo-qwen3-30ba3b-4n4g-megatron-qa-nvfp4.sh

Tested: TEST_DRYRUN=1 bash tests/test_suites/llm/grpo-qwen3-30ba3b-4n4g-megatron-qa-nvfp4.sh

Not-tested: Re-running the full Slurm smoke after path-only rename.
Signed-off-by: Meng Xin <mxin@nvidia.com>
The Qwen3 MoE nightly should prove the intended quantized control flow, not just that training emitted any positive loss. Use the validated smoke values to check expected quantizer counts, full valid-sample coverage, rollout length, bounded reward/KL, and near-identity probability ratios.

Constraint: Keep token_mult_prob_error out of this one-step smoke because the BF16 Qwen3 control showed it can be dominated by a single long-output outlier.

Confidence: high

Scope-risk: narrow

Tested: bash -n tests/test_suites/llm/grpo-qwen3-30ba3b-4n4g-megatron-qa-nvfp4.sh

Tested: TEST_DRYRUN=1 bash tests/test_suites/llm/grpo-qwen3-30ba3b-4n4g-megatron-qa-nvfp4.sh

Tested: Grep checks against Slurm smoke job 185322 run.log.

Tested: python tests/check_metrics.py against Slurm smoke job 185322 metrics.json with the tightened checks.

Not-tested: Re-running the full Slurm smoke after assertion-only changes.
Signed-off-by: Meng Xin <mxin@nvidia.com>
The validated smoke loss was about 0.048, but a one-step QARL health check should leave room for harmless startup variance while still catching non-finite or clearly broken loss values.

Constraint: Preserve the tightened quantizer, validity, reward, KL, and ratio checks.

Confidence: high

Scope-risk: narrow

Tested: bash -n tests/test_suites/llm/grpo-qwen3-30ba3b-4n4g-megatron-qa-nvfp4.sh

Tested: TEST_DRYRUN=1 bash tests/test_suites/llm/grpo-qwen3-30ba3b-4n4g-megatron-qa-nvfp4.sh

Tested: python tests/check_metrics.py against Slurm smoke job 185322 metrics.json for the relaxed loss bounds.
Signed-off-by: Meng Xin <mxin@nvidia.com>
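A relaxed loss check of the kind this commit describes might look like the sketch below. The bounds are placeholders chosen for illustration, not the values in the nightly script, and `loss_in_window` is a hypothetical name; the only number taken from this PR is the validated smoke loss of 0.04845.

```python
import math


def loss_in_window(loss, low, high):
    """True when the loss is finite and inside [low, high]."""
    return math.isfinite(loss) and low <= loss <= high


# Placeholder bounds (assumed): wide enough for startup variance,
# tight enough to reject NaN, inf, or clearly broken values.
LOW, HIGH = 0.0, 0.5

smoke_ok = loss_in_window(0.04845, LOW, HIGH)
nan_ok = loss_in_window(float("nan"), LOW, HIGH)
```

The explicit finiteness test makes the intent readable, since a bare range comparison also rejects NaN but only as a side effect of how NaN compares.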
The recipe/test-suite accounting test only scans examples/configs/recipes, while the Qwen3 QARL nightly initially pointed directly at the ModelOpt example YAML. Add the standard recipe wrapper and let the shell test use common.env so the suite path convention stays intact.

Constraint: test_recipes_and_test_suites maps tests/test_suites/llm/<name>.sh to examples/configs/recipes/llm/<name>.yaml

Rejected: Teach the unit test about examples/modelopt | Broader than needed and inconsistent with existing ModelOpt distillation wrapper

Confidence: high

Scope-risk: narrow

Tested: bash -n Qwen3 QARL test script

Tested: TEST_DRYRUN=1 Qwen3 QARL test script

Tested: standalone recipe/test-suite set comparison reports 148 recipes and 148 tests with no differences

Tested: loaded wrapper config and verified Qwen3 QARL quant/eager settings

Not-tested: pytest in this shell, blocked by missing torch and uv Python 3.13.13 interpreter
Signed-off-by: Meng Xin <mxin@nvidia.com>
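The name mapping in the constraint above amounts to a set comparison between script stems and recipe stems. The sketch below is an independent illustration, not the body of test_recipes_and_test_suites; the function name `unmatched_names` is hypothetical.

```python
from pathlib import Path


def unmatched_names(suite_dir, recipe_dir):
    """Compare <name>.sh scripts with <name>.yaml recipes by stem.

    Returns (tests_without_recipe, recipes_without_test), both sorted.
    """
    tests = {p.stem for p in Path(suite_dir).glob("*.sh")}
    recipes = {p.stem for p in Path(recipe_dir).glob("*.yaml")}
    return sorted(tests - recipes), sorted(recipes - tests)
```

Run from the repository root, `unmatched_names("tests/test_suites/llm", "examples/configs/recipes/llm")` would return two empty lists when every script has a recipe wrapper and every recipe has a script.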
The vLLM fakequant path needs Python-visible MoE execution so ModelOpt quantizers are exercised during Nano3 and Qwen3 MoE generation. Centralizing the quant worker kwargs keeps sync and async workers on the same backend defaults while preserving explicit user overrides.

Constraint: vLLM FlashInfer MoE bypasses the Python fakequant path used by ModelOpt.

Rejected: Patch ModelOpt's vLLM plugin backend checks | too coupled to vLLM internals and not needed once Nemo-RL selects a compatible backend.

Confidence: medium

Scope-risk: narrow

Directive: Do not remove the Triton default without re-running a quantized MoE vLLM smoke that proves quantizers execute.

Tested: python -m py_compile nemo_rl/modelopt/models/generation/vllm_quant_worker.py tests/unit/models/generation/test_vllm_quant_worker.py

Tested: git diff --check for the touched files

Tested: Slurm Nano3 QAD smoke job 188890 completed with SMOKE_METRICS_OK qa-distillation-nano3

Not-tested: Focused pytest on host because torch is not installed outside the container
Signed-off-by: Meng Xin <mxin@nvidia.com>
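Centralized defaults with user overrides, as described in this commit, can be sketched as a single merge helper shared by the sync and async workers. The key names `moe_backend` and `enforce_eager` and the helper name are assumptions for illustration; the actual option names live in vllm_quant_worker.py.

```python
# Hypothetical defaults; key names are illustrative, not the PR's.
DEFAULT_QUANT_KWARGS = {"moe_backend": "triton", "enforce_eager": True}


def build_quant_worker_kwargs(user_kwargs=None):
    """Merge shared defaults with user kwargs; explicit user values win."""
    return {**DEFAULT_QUANT_KWARGS, **(user_kwargs or {})}


# Both worker flavors call the same helper, so they cannot drift apart.
sync_kwargs = build_quant_worker_kwargs()
async_kwargs = build_quant_worker_kwargs({"enforce_eager": False})
```

Because the defaults come first in the merge, a key set by the user always replaces the default, and the shared defaults dictionary is never mutated.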
The helper only centralizes vLLM quant kwargs and is covered by the Nano3 smoke path, so keeping a unit test for this private wiring adds review overhead without meaningful protection.

Constraint: The PR should stay minimal and avoid tests for trivial private helper wiring.

Confidence: high

Scope-risk: narrow

Tested: python -m py_compile nemo_rl/modelopt/models/generation/vllm_quant_worker.py
Signed-off-by: Meng Xin <mxin@nvidia.com>
The MoE/Mamba quant smoke now relies on the newer ModelOpt package state used by the current container and lockfile. Updating both pyproject.toml and uv.lock keeps the modelopt extra and mcore extra resolved to the same source revision.

Constraint: Nemo-RL dependency resolution must stay lockfile-consistent; pyproject-only updates are not enough.

Rejected: Depend on the local ModelOpt PR branch | Nemo-RL only needs the upstream dependency rev for this path after moving MoE backend selection into Nemo-RL.

Confidence: medium

Scope-risk: moderate

Tested: git diff --check for pyproject.toml and uv.lock

Tested: Slurm Nano3 QAD smoke job 188890 completed with SMOKE_METRICS_OK qa-distillation-nano3

Not-tested: Full uv sync/lock regeneration in this turn
Signed-off-by: Meng Xin <mxin@nvidia.com>
The Nano3 NeMo Gym distillation smoke passed on the unpacked CP=1 path with sequence packing disabled for both policy and teacher. Recording that setup in the example keeps the checked-in config aligned with the path we can currently validate.

Constraint: Nano3 combines MoE and Mamba, so CP/packing changes need targeted smoke coverage before becoming the example default.

Rejected: Leave the example at CP=4 | that was not the path validated by the successful Nano3 QAD smoke in this stack.

Confidence: medium

Scope-risk: narrow

Tested: git diff --check for examples/modelopt/qa_distillation_nano3_megatron.yaml

Tested: local config sanity check for CP=1, sequence_packing false, and max_val_samples null

Tested: Slurm Nano3 QAD smoke job 188890 completed with SMOKE_METRICS_OK qa-distillation-nano3
Signed-off-by: Meng Xin <mxin@nvidia.com>
The QARL guide still described MoE and Mamba models as unsupported, but this branch now includes smoke-tested Megatron QARL configs for Qwen3 MoE and Nano3 hybrid MoE/Mamba. Update the guide to distinguish smoke-tested architecture support from convergence guarantees.

Constraint: Do not overclaim broad convergence for MoE/Mamba QARL

Confidence: high

Scope-risk: narrow

Tested: rg confirmed stale unsupported sentence was removed

Tested: git diff --check

Not-tested: Documentation build
Signed-off-by: Meng Xin <mxin@nvidia.com>
QARL checkpoints should not serialize Nemo-RL helper targets for the transformer layer spec. Megatron-Bridge validates checkpoint _target_ strings against an allowlist on reload, so the worker returns the Megatron ModelOpt GPT spec as a functools.partial with the required sequence-packing kwargs bound.

The export wrapper keeps registering nemo_rl. only for older checkpoints already written with the previous helper target. New checkpoints serialize the Megatron target directly.

Constraint: Megatron-Bridge checkpoint instantiation allowlists megatron.*, nemo.*, torch.*, transformers.*, numpy.*, and nvidia.* targets by default.

Rejected: Register nemo_rl. in the training worker | this keeps new checkpoints dependent on a process-local allowlist side effect.

Confidence: high

Scope-risk: narrow

Directive: Keep transformer_layer_spec serialized as a Megatron target unless Megatron-Bridge's allowlist policy changes.

Tested: python -m py_compile on edited files

Tested: ruff check on edited files

Tested: git diff --check

Tested: Slurm smoke job 202703 completed 0:0 with SMOKE_METRICS_OK and run_config transformer_layer_spec target megatron.core.post_training.modelopt.gpt.model_specs.get_gpt_modelopt_spec
Signed-off-by: Meng Xin <mxin@nvidia.com>
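The allowlist constraint can be illustrated with a prefix check and a `functools.partial` binding. This is a sketch of the idea only: the prefix tuple copies the defaults listed in the constraint above, while `is_allowed_target`, `fake_layer_spec`, and the `nemo_rl.some_module.helper` target are hypothetical stand-ins, not Megatron-Bridge or NeMo RL APIs.

```python
from functools import partial

# Default prefixes as listed in the commit message above.
ALLOWED_PREFIXES = (
    "megatron.", "nemo.", "torch.", "transformers.", "numpy.", "nvidia.",
)


def is_allowed_target(target):
    """True when a serialized _target_ string starts with an allowed prefix."""
    return target.startswith(ALLOWED_PREFIXES)


def fake_layer_spec(use_packing=False):
    """Hypothetical stand-in for a Megatron layer-spec factory."""
    return {"use_packing": use_packing}


# Binding kwargs with partial keeps the underlying callable unchanged,
# so the target that gets recorded is still the original function.
spec = partial(fake_layer_spec, use_packing=True)

megatron_target = (
    "megatron.core.post_training.modelopt.gpt.model_specs.get_gpt_modelopt_spec"
)
helper_target = "nemo_rl.some_module.helper"  # hypothetical helper path
```

Note that `nemo_rl.` does not match the `nemo.` prefix, which is why a helper target in the NeMo RL package would need its own allowlist registration.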
ModelOpt quantized Mamba models need the same import-time layer-spec override path as GPT models, otherwise HF-to-Megatron conversion can fall back to the non-quantized Mamba stack spec while quantization hooks are active.

This forwards an optional Mamba stack spec through the existing model import path and lets the ModelOpt quant worker choose the ModelOpt or TE Mamba stack spec with the existing DISABLE_MODELOPT_LAYER_SPEC switch.

Constraint: ModelOpt QuantSequentialMLP does not support TP and EP together, so the enabled-layer-spec Nano3 smoke used TP=1/EP=8 while the disabled-layer-spec smoke covered TP=4/EP=8.

Rejected: Registering another serialized spec wrapper | Megatron-Bridge already accepts callable stack specs and the GPT path already uses a callback.

Confidence: high

Scope-risk: narrow

Tested: python -m py_compile on touched files

Tested: uv run ruff check on touched files

Tested: Slurm smoke qa-distillation-nano3-no-modelopt-layerspec job 202752 completed with loss 0.03093

Tested: Slurm smoke qa-distillation-nano3-modelopt-tp1ep8 job 202797 completed with loss 0.02562 and grad norm 1.00705
Signed-off-by: Meng Xin <mxin@nvidia.com>
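The TP/EP constraint above suggests a simple pre-launch guard. The sketch below is an assumption about how such a check could be written; nothing in this PR states that a guard like this exists, and `check_parallelism` is a hypothetical name.

```python
def check_parallelism(use_modelopt_layer_spec, tp, ep):
    """Reject TP>1 combined with EP>1 when the ModelOpt layer spec is on.

    Hypothetical guard reflecting the stated QuantSequentialMLP limit.
    """
    if use_modelopt_layer_spec and tp > 1 and ep > 1:
        raise ValueError(
            f"ModelOpt layer spec does not support TP={tp} with EP={ep}; "
            "use TP=1 or disable the ModelOpt layer spec."
        )


# The two smoke shapes from the commit message both pass.
check_parallelism(use_modelopt_layer_spec=True, tp=1, ep=8)
check_parallelism(use_modelopt_layer_spec=False, tp=4, ep=8)
```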
The export wrapper is only a stable NeMo RL example entry point for current QARL checkpoints. It should not register Nemo-RL target prefixes or imply support for older checkpoints that stored nemo_rl.* layer-spec targets.

Constraint: Backward compatibility for older QARL checkpoints is intentionally not offered.

Rejected: Keep register_allowed_target_prefix('nemo_rl.') | that preserves unsupported legacy checkpoint behavior and makes the guide misleading.

Confidence: high

Scope-risk: narrow

Tested: python -m py_compile examples/modelopt/export_quantized_to_hf.py

Tested: uv run ruff check examples/modelopt/export_quantized_to_hf.py
Signed-off-by: Meng Xin <mxin@nvidia.com>
Most QARL model paths do not need ModelOpt's custom Megatron layer specs, and disabling them keeps quantization enabled while using the standard Megatron specs that are typically faster.

This adds the recommended DISABLE_MODELOPT_LAYER_SPEC=1 launch form to the Nano3 and Qwen3 30B QARL examples and sets it in the Qwen3 30B QA-GRPO nightly wrapper so the nightly exercises the preferred path.

Constraint: The env var is read by the ModelOpt worker before model import, so it belongs in launch examples and shell wrappers rather than YAML config fields.

Rejected: Set DISABLE_MODELOPT_LAYER_SPEC inside config files | it is a process environment switch, not a Hydra config option.

Confidence: high

Scope-risk: narrow

Tested: git diff --check

Tested: bash -n tests/test_suites/llm/grpo-qwen3-30ba3b-4n4g-megatron-qa-nvfp4.sh
Signed-off-by: Meng Xin <mxin@nvidia.com>
The setup helper now forwards mamba_stack_spec through the same import path as transformer_layer_spec. Update the mocked import assertions so the unit tests describe that explicit default forwarding instead of the old call shape.

Constraint: Mamba import support adds a new optional keyword to the import boundary
Confidence: high
Scope-risk: narrow
Tested: uv run --group dev pre-commit run --files tests/unit/models/megatron/test_megatron_setup.py
Not-tested: Focused pytest in this host; Ray dashboard/GCS startup and missing CUDA/Transformer Engine dependencies prevent reaching the test body here
Signed-off-by: Meng Xin <mxin@nvidia.com>
Handle the concrete review issues without broadening the PR: validation now applies the existing NeMo-Gym response logging opt-in directly, dummy vLLM calibration preserves ModelOpt's meta-tensor contract, and the ModelOpt layer-spec toggle is recorded in policy config while keeping the old env fallback.

A one-step Nano3 nightly recipe covers the default ModelOpt Mamba stack-spec path separately from the faster standard-spec examples.

Constraint: Existing smoke scripts relied on DISABLE_MODELOPT_LAYER_SPEC, so the env var remains a compatibility fallback while configs move to policy.disable_modelopt_layer_spec.
Rejected: Move the dummy NaN handling into ModelOpt | the workaround is specific to NeMo-RL's dummy-weight vLLM prolog and ModelOpt should still fail on real NaNs.
Confidence: medium
Scope-risk: moderate
Tested: uv run --group dev pre-commit run --files <changed files>
Tested: bash -n Nano3/Qwen3 QARL test-suite scripts
Tested: TEST_DRYRUN=1 Nano3 modelopt-spec nightly script
Tested: direct recipe/test-suite accounting check
Not-tested: Full pytest suite; this host's unit-test Ray fixture has known local dashboard/GCS startup issues
Signed-off-by: Meng Xin <mxin@nvidia.com>
The QARL layer-spec switch should come from the validated policy config, not a hidden process environment fallback. This keeps submitted YAML and job scripts as the root truth and avoids surprising behavior from stale shell state. The worker constructor reads the incoming config directly because self.cfg is initialized by the base class after this setup path.

Constraint: Smoke and example jobs must be reproducible from explicit config overrides, not DISABLE_MODELOPT_LAYER_SPEC in the caller environment.

Rejected: Keep environment fallback | hidden env state can override the submitted config and makes smoke results harder to reason about.

Confidence: high

Scope-risk: narrow

Tested: uv run --group dev pre-commit run --files nemo_rl/modelopt/models/policy/workers/utils.py nemo_rl/modelopt/models/policy/workers/megatron_quant_policy_worker.py tests/unit/models/policy/test_megatron_quant_worker.py

Tested: Qwen3 MoE QARL smoke job 208208 completed with SMOKE_METRICS_OK and disable_modelopt_layer_spec=True.

Tested: Nano3 QARL smoke jobs 208199 and 208200 completed for disabled and enabled ModelOpt layer-spec paths.

Not-tested: Targeted pytest remains skipped in this environment by the existing requires_weight_folding guard.
Signed-off-by: Meng Xin <mxin@nvidia.com>
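Reading the switch only from the submitted config can be sketched as follows. The config is modeled as a plain dict, the default of False is an assumption, and `modelopt_layer_spec_disabled` is a hypothetical name rather than the helper in nemo_rl/modelopt/models/policy/workers/utils.py.

```python
import os


def modelopt_layer_spec_disabled(policy_cfg):
    """Return the switch from the policy config; the environment is ignored."""
    return bool(policy_cfg.get("disable_modelopt_layer_spec", False))


# Stale shell state must not change the outcome.
os.environ["DISABLE_MODELOPT_LAYER_SPEC"] = "1"

from_empty_cfg = modelopt_layer_spec_disabled({})
from_explicit_cfg = modelopt_layer_spec_disabled(
    {"disable_modelopt_layer_spec": True}
)
```

With this shape, two jobs submitted with the same YAML and overrides take the same layer-spec path regardless of what the launching shell exported.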
The documentation should describe disable_modelopt_layer_spec as a recommended first attempt rather than an absolute rule. The fallback path is useful when standard Megatron layer specs hit architecture- or recipe-specific errors.

Constraint: QARL layer-spec compatibility varies by model architecture and quantization recipe.

Rejected: Keep the run-config recording rationale | it does not explain when users should choose either path.

Confidence: high

Scope-risk: narrow

Tested: git diff --check -- docs/guides/quantization-aware-rl.md

Not-tested: Documentation-only change; no runtime tests run.
Signed-off-by: Meng Xin <mxin@nvidia.com>
The Nano3 modelopt-layer-spec nightly was inheriting a Qwen3-shaped smoke setup that could silently skip training when the tiny NeMo Gym fixture was smaller than the prompt batch. This makes the nightly exercise one real train step on 4n4g, uses the tokenizer that carries Nano3's chat template for vLLM, and checks a Nano3-specific loss window from the successful smoke run.

Constraint: NeMo Gym smoke fixture has five prompts and distillation uses drop_last=True.

Constraint: Teacher logprob workers require the generated batch size to shard evenly across 16 workers.

Rejected: Keep the 4n8g nightly shape | it costs more GPUs without improving this one-step layer-spec signal.

Confidence: high

Scope-risk: narrow

Tested: bash -n tests/test_suites/llm/distillation-nano3-30ba3b-4n4g-megatron-qa-nvfp4-modelopt-spec.sh

Tested: git diff --check

Tested: Slurm smoke job 208322 completed; train/loss=0.0432905294 and mean_gen_tokens_per_sample=1162.25.

Not-tested: tests/unit/test_recipes_and_test_suites.py is blocked by a stale Ray GCS fixture in this environment.
Signed-off-by: Meng Xin <mxin@nvidia.com>
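The two constraints above reduce to integer arithmetic, sketched here. The five-prompt fixture and the 16 workers come from the commit message; the batch sizes and generations per prompt used in the example calls are assumed values, and both function names are hypothetical.

```python
def num_train_batches(num_prompts, prompts_per_step, drop_last=True):
    """Number of batches a loader yields for the given fixture size."""
    full, rest = divmod(num_prompts, prompts_per_step)
    return full if drop_last or rest == 0 else full + 1


def shards_evenly(generated_batch_size, num_workers):
    """True when the generated batch splits evenly across workers."""
    return generated_batch_size % num_workers == 0


# Five-prompt fixture with drop_last=True:
skipped = num_train_batches(5, 8)   # prompt batch larger than the fixture
trained = num_train_batches(5, 4)   # one full batch, one prompt dropped

# Assumed shape: 4 prompts x 4 generations each = 16 samples, 16 workers.
even = shards_evenly(4 * 4, 16)
```

When the prompt batch exceeds the fixture size, the loader yields zero batches, so a run can finish without error while never executing a train step.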
The pre-commit recipe minimization hook requires the Nano3 modelopt-spec nightly YAML to contain only values that differ from its base config. This removes the inherited cluster node count and comment-only text so the committed recipe matches the minimized form.

Constraint: configs-minimize-check runs over llm recipes in pre-commit.

Confidence: high

Scope-risk: narrow

Tested: uv run --group dev pre-commit run --from-ref origin/main --to-ref HEAD
Signed-off-by: Meng Xin <mxin@nvidia.com>
The rebase conflict resolution preserved the Nano3 ModelOpt layer-spec nightly entry but left it after the PPO block. Keep the suite list grouped by algorithm so recipe accounting and human review stay straightforward.

Constraint: This is a local rebase cleanup after removing the now-upstream NeMo Gym distillation changes from the MoE/Mamba MR.

Confidence: high

Scope-risk: narrow

Tested: git diff --check origin/main..HEAD

Tested: uv lock
Signed-off-by: Meng Xin <mxin@nvidia.com>
The rebase onto origin/main dropped the Qwen3 MoE QARL suite listing and left behind a stale unit test for the removed env-var layer-spec API. Keep the nightly accounting complete while relying on the current worker-level layer-spec coverage instead of reintroducing old compatibility behavior.

Constraint: Nemo Gym distillation is now upstream and must stay out of this branch.

Rejected: Restore DISABLE_MODELOPT_LAYER_SPEC test coverage | the branch now uses policy.disable_modelopt_layer_spec and get_quantization_* helpers.

Confidence: high

Scope-risk: narrow

Tested: Static recipe/script/suite accounting check; TEST_DRYRUN=1 for Nano3 distillation and Qwen3 MoE GRPO scripts; targeted Ruff on changed Python files; git diff --check.

Not-tested: Full pytest unit harness in this local env; Ray worker registration failed during autouse unit fixture startup.
Signed-off-by: Meng Xin <mxin@nvidia.com>
NeMo Gym startup moved from actor construction plus health_check to an explicit _spinup method when GRPO started overlapping Gym startup with vLLM load. The distillation entry point still used the older copied health_check call, so current main fails once it reaches Gym actor initialization.

Constraint: Upstream NemoGym no longer exposes health_check.

Rejected: Reintroduce health_check on the actor | would preserve a stale API instead of matching GRPO and existing Nemo Gym tests.

Confidence: high

Scope-risk: narrow

Tested: uv run --no-sync python -m py_compile examples/nemo_gym/run_distillation_nemo_gym.py

Not-tested: Full Nano3 nightly completion blocked by repeated Slurm node XID/NVLink failures.
Signed-off-by: Meng Xin <mxin@nvidia.com>
@mxinO

mxinO commented Jun 11, 2026

Contributor Author

/ok to test

@mxinO

mxinO commented Jun 13, 2026

Contributor Author

/ok to test

Labels

CI:L1: Run doctests, unit tests, and functional tests
Documentation: Improvements or additions to documentation

Projects

None yet

4 participants