fix: fix and re-enable rm env functional test #1905
PR: Enhance GRPO setup to support reward model environment configuration
Summary
This PR fixes how GRPO determines whether the reward model environment is used, so that cluster resource setup (GPU/node allocation for policy, inference, and reward model) is correct when train or validation data uses the `reward_model` env. Previously, reward-model usage was inferred inside `grpo.setup()` from `data_config["env_name"] == "reward_model"`, which only reflects the default env and can miss cases where the reward model env is used only in validation or in a non-default task. This change moves the detection to the entrypoint, which uses the actual env names required by the data config and passes a single flag into `setup()`.
Motivation
When `data.default.env_name: "reward_model"` is set, or any train/validation task uses the reward model env, the cluster must reserve GPUs/nodes for the reward model. Relying only on `data_config["env_name"]` can be wrong when the default env is not `reward_model` but a validation task or a non-default train task still uses it. The data setup already derives the full set of required env names via `extract_necessary_env_names(config["data"])`; using that to decide whether the reward model env is needed avoids duplicating logic and keeps behavior aligned with what the data pipeline actually uses.
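To make the failure mode concrete, here is a minimal sketch; the config shape and the helper body are simplified stand-ins for the actual NeMo RL data schema and for `nemo_rl.data.datasets.extract_necessary_env_names`, not the real implementation.

```python
# Illustrative only: simplified config shape and a stand-in helper, not NeMo RL code.
data_config = {
    "env_name": "math",                           # default env: not the reward model env
    "validation": {"env_name": "reward_model"},   # but validation still needs it
}

# Old check (inside grpo.setup()): only the default env is consulted, so the
# reward model env used by validation is missed and no RM resources are reserved.
old_reward_model_enabled = data_config["env_name"] == "reward_model"   # False

# New check (in the entrypoint): collect every env name the data config actually
# requires, then test whether the reward model env is among them.
def extract_required_env_names(cfg):              # hypothetical stand-in helper
    names = {cfg["env_name"]}
    names.update(v["env_name"] for v in cfg.values() if isinstance(v, dict))
    return sorted(names)

rm_env_enabled = "reward_model" in extract_required_env_names(data_config)     # True
```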
Changes
1. `examples/run_grpo.py`
   - Import `extract_necessary_env_names` from `nemo_rl.data.datasets`.
   - After `setup_data_with_envs()`, compute:
     - `env_name_list = extract_necessary_env_names(config["data"])`
     - `rm_env_enabled = "reward_model" in env_name_list`
   - Pass `rm_env_enabled=rm_env_enabled` into `setup()` (see the sketch after this list).
2. `nemo_rl/algorithms/grpo.py`
   - `setup()`: add a new parameter `rm_env_enabled: bool = False`.
   - Remove the derivation of `reward_model_enabled` from `data_config["env_name"]`.
   - Use `rm_env_enabled` everywhere the previous `reward_model_enabled` was used, e.g. in the resource handling when `total_nodes == 1` and when asserting `train_gpus_per_node > 0`.
   - Cluster setup behavior (colocated vs. non-colocated, single vs. multi-node, reward model reservation) is unchanged, except that the "reward model env in use" flag is now derived from the data config's required env names instead of the default env name only.
3. `tests/functional/L1_Functional_Tests_GPU.sh`
   - Re-enable `time uv run --no-sync bash ./tests/functional/grpo_rm.sh` so that the reward model env path is exercised in L1 GPU tests.
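Below is a minimal, self-contained sketch of the entrypoint-to-`setup()` handoff described in items 1 and 2. It is not the actual diff; the GPU-splitting arithmetic inside `setup()` is purely illustrative, since the PR only states that the flag drives the `total_nodes == 1` handling and the `train_gpus_per_node > 0` assertion.

```python
# Simplified sketch of the new wiring, not the actual NeMo RL code.
def setup(data_config, total_nodes, train_gpus_per_node, rm_env_enabled=False):
    # Before this PR: reward_model_enabled = data_config["env_name"] == "reward_model"
    # Now the caller decides, based on every env the data config actually needs.
    if rm_env_enabled and total_nodes == 1:
        # Colocated single-node case: part of the node goes to the reward model,
        # so policy training must still be left with at least one GPU.
        train_gpus_per_node -= 1                     # hypothetical split
        assert train_gpus_per_node > 0, "no GPUs left for policy training"
    return {"rm_env_enabled": rm_env_enabled, "train_gpus_per_node": train_gpus_per_node}

# Entrypoint side (examples/run_grpo.py), after the data/env setup:
env_name_list = ["math", "reward_model"]   # e.g. extract_necessary_env_names(config["data"])
rm_env_enabled = "reward_model" in env_name_list
resources = setup({"env_name": "math"}, total_nodes=1, train_gpus_per_node=8,
                  rm_env_enabled=rm_env_enabled)
print(resources)   # {'rm_env_enabled': True, 'train_gpus_per_node': 7}
```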
Testing
Ran `bash tests/functional/grpo_rm_env.sh` (or `grpo_rm.sh` as wired in L1).