
Conversation


@cmunley1 cmunley1 commented Jan 29, 2026

What does this PR do?

Allows disabling the monotonic / on-policy checks. For instance, the Qwen3 reasoning models' chat template drops past thinking traces, and agents with context management may rewrite earlier turns; both break the check.

This may only apply to the NeMo Gym path; not entirely sure.

Should probably add a config example and a test?

Issues

#1812

Usage

+generation.vllm_cfg.enforce_monotonicity=False
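For illustration, here is a hypothetical sketch of where that CLI override lands in the resolved config. The exact schema of the surrounding config is an assumption; only the `generation.vllm_cfg.enforce_monotonicity` path mirrors the PR.

```python
# Hypothetical sketch: the resolved config after applying the override
# +generation.vllm_cfg.enforce_monotonicity=False. The surrounding keys
# are assumptions; only the vllm_cfg key name comes from the PR.
cfg = {
    "generation": {
        "vllm_cfg": {
            "enforce_monotonicity": False,  # set via the CLI override
        }
    }
}

# Code-side read with a True fallback, as in the PR as submitted:
enforce_monotonicity = cfg["generation"]["vllm_cfg"].get("enforce_monotonicity", True)
print(enforce_monotonicity)  # False
```

Without the override, the `.get(..., True)` fallback keeps the existing monotonicity-enforcing behavior.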

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

Release Notes

  • New Features
    • Added configuration option to control message history consistency in multi-turn conversations, enabling flexibility for training scenarios where historical context may be discarded.


Signed-off-by: Christian Munley <cmunley@nvidia.com>
@cmunley1 cmunley1 requested a review from a team as a code owner January 29, 2026 04:04
Contributor

coderabbitai bot commented Jan 29, 2026

📝 Walkthrough


Added a new optional configuration field enforce_monotonicity to the VLLM configuration to control whether message history monotonicity is enforced during preprocessing. When disabled, the original preprocessing result is returned without monotonicity-specific processing; when enabled (default), existing behavior is preserved.

Changes

  • Configuration Field — nemo_rl/models/generation/vllm/config.py: Added an optional enforce_monotonicity: NotRequired[bool] field to the VllmSpecificArgs TypedDict to document and control message history monotonicity enforcement.
  • Preprocessing Logic — nemo_rl/models/generation/vllm/vllm_worker_async.py: Introduced enforce_monotonicity flag retrieval from the vLLM config and added a conditional early return in _preprocess_chat to bypass monotonicity processing when the flag is False.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Test Results For Major Changes — ⚠️ Warning: The PR introduces a major behavioral feature controlling message history monotonicity enforcement but lacks test files, validation data, and documented testing; the PR objectives explicitly acknowledge incomplete testing. Resolution: create tests for the enforce_monotonicity feature, run regression tests, document results in the PR description, provide validation evidence, and mark the testing checklist complete before review.

✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed: The title accurately and specifically describes the main change: adding a configuration option to control monotonicity enforcement.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@nemo_rl/models/generation/vllm/config.py`:
- Around line 42-45: Update the TypedDict field comment for enforce_monotonicity
in nemo_rl/models/generation/vllm/config.py to state that it's a boolean
(True/False), describe its purpose succinctly, and specify the recommended
default (e.g., False); then add the enforce_monotonicity: false entry to the
exemplar YAMLs in examples/configs (notably the grpo_math_*.yaml variants and
dpo.yaml) so the exemplar configs reflect the documented default. Ensure the
comment mentions valid values (True/False) and the recommended default, and that
the YAML keys use the same name enforce_monotonicity with the recommended value.

In `@nemo_rl/models/generation/vllm/vllm_worker_async.py`:
- Line 333: The line that reads enforce_monotonicity =
self.cfg["vllm_cfg"].get("enforce_monotonicity", True) should not provide a
code-side default; instead read the value directly from the config and fail fast
if missing. Replace the .get call with direct indexing (e.g.,
self.cfg["vllm_cfg"]["enforce_monotonicity"]) or add an explicit assertion that
"enforce_monotonicity" exists in self.cfg["vllm_cfg"], and ensure the YAML
config defines the default there.

Comment on lines 42 to 45
# Ensures message history over multiple turns remains constant. Useful for on-policy training, but sometimes
# we may want to drop previous reasoning (like Qwen3), agent context management,
# sliding window, or drop past environment state observations, etc
enforce_monotonicity: NotRequired[bool]
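The failure mode this comment describes can be made concrete with a small sketch. This is an illustrative example, not code from the PR: the message contents are invented, and it assumes a Qwen3-style template that strips past thinking blocks when rendering later turns.

```python
# Hypothetical illustration: a chat template that drops earlier assistant
# "thinking" content makes turn N's history not a prefix of turn N+1's input,
# which a strict monotonicity check would reject.
turn1 = [
    {"role": "user", "content": "2+2?"},
    {"role": "assistant", "content": "<think>compute</think>4"},
]
# On the next turn the template strips the past <think> block:
turn2 = [
    {"role": "user", "content": "2+2?"},
    {"role": "assistant", "content": "4"},  # thinking trace dropped
    {"role": "user", "content": "and 3+3?"},
]

# A monotonicity check requires turn1 to be a prefix of turn2:
print(turn2[: len(turn1)] == turn1)  # False -> the check would fail
```

The same mismatch arises with agent context management, sliding windows, or dropped environment observations, which is why the flag is useful beyond the Qwen3 case.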

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for enforce_monotonicity in exemplar YAMLs
rg -n 'enforce_monotonicity' examples/ -g '*.yaml' -g '*.yml'

Repository: NVIDIA-NeMo/RL

Length of output: 40


🏁 Script executed:

#!/bin/bash
# Find exemplar YAML files in the repo to understand structure
find examples/configs -type f \( -name "*.yaml" -o -name "*.yml" \) 2>/dev/null | head -20

Repository: NVIDIA-NeMo/RL

Length of output: 856


🏁 Script executed:

#!/bin/bash
# Check the config file to see how other NotRequired config keys are documented
cat -n nemo_rl/models/generation/vllm/config.py | head -80

Repository: NVIDIA-NeMo/RL

Length of output: 3068


Add enforce_monotonicity to exemplar YAMLs and document valid values and recommended default.

The new config key is missing from exemplar YAML files under examples/configs/ and its documentation is incomplete. Per coding guidelines, new TypedDict config keys must document purpose, valid values/types, and recommended default, and this default must be reflected in exemplar YAMLs.

Update the field comment to specify valid values (True/False) and recommended default, then add the key to relevant exemplar YAML files (e.g., grpo_math_*.yaml, dpo.yaml) with the recommended value.


openai_serving_models_kwargs["model_config"] = model_config
openai_serving_models = OpenAIServingModels(**openai_serving_models_kwargs)

enforce_monotonicity = self.cfg["vllm_cfg"].get("enforce_monotonicity", True)

⚠️ Potential issue | 🟠 Major

Avoid code-side default; rely on YAML for enforce_monotonicity.

Using .get(..., True) sets a non-None default in code. Please set the default in YAML and read the value directly (or assert presence) here.

✅ Suggested change
-        enforce_monotonicity = self.cfg["vllm_cfg"].get("enforce_monotonicity", True)
+        enforce_monotonicity = self.cfg["vllm_cfg"]["enforce_monotonicity"]

As per coding guidelines, "YAML is the single source of truth for configuration defaults. Do not set non-None defaults in code for configuration values".
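The assert-presence alternative the review mentions could look like the following. This is a hedged sketch under the guideline's assumptions; the helper name and config shape are invented for illustration and do not appear in the PR.

```python
# Sketch of the fail-fast alternative: assert the key exists instead of
# supplying a code-side default, so the YAML config remains the single
# source of truth. Helper name and config shape are hypothetical.
def read_enforce_monotonicity(cfg: dict) -> bool:
    vllm_cfg = cfg["vllm_cfg"]
    assert "enforce_monotonicity" in vllm_cfg, (
        "enforce_monotonicity must be set in the YAML config under vllm_cfg"
    )
    return vllm_cfg["enforce_monotonicity"]


print(read_enforce_monotonicity({"vllm_cfg": {"enforce_monotonicity": False}}))  # False
```

A missing key then fails loudly at startup rather than silently falling back to True, at the cost of requiring every exemplar YAML to define the key.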


@cmunley1
Author

I forgot that this will still fail here; need to consider what to do in this case.
