
Conversation


@cmunley1 cmunley1 commented Jan 29, 2026

What does this PR do?

Allows disabling the monotonic / on-policy checks. For instance, the Qwen3 reasoning models' chat template drops past thinking traces, and agents with context management may rewrite earlier turns; both break the check.

This may only apply to the NeMo Gym path; not entirely sure.

Should probably add a config example and a test?

Issues

#1812

Usage

+generation.vllm_cfg.enforce_monotonicity=False
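For illustration, here is a hypothetical sketch of where that CLI override lands in the resolved config. The exact schema of the surrounding config is an assumption; only the `generation.vllm_cfg.enforce_monotonicity` path mirrors the PR.

```python
# Hypothetical sketch: the resolved config after applying the override
# +generation.vllm_cfg.enforce_monotonicity=False. The surrounding keys
# are assumptions; only the vllm_cfg key name comes from the PR.
cfg = {
    "generation": {
        "vllm_cfg": {
            "enforce_monotonicity": False,  # set via the CLI override
        }
    }
}

# Code-side read with a True fallback, as in the PR as submitted:
enforce_monotonicity = cfg["generation"]["vllm_cfg"].get("enforce_monotonicity", True)
print(enforce_monotonicity)  # False
```

Without the override, the `.get(..., True)` fallback keeps the existing monotonicity-enforcing behavior.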

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

Release Notes

  • New Features
    • Added configuration option to control message history consistency in multi-turn conversations, enabling flexibility for training scenarios where historical context may be discarded.


Signed-off-by: Christian Munley <cmunley@nvidia.com>
@cmunley1 cmunley1 requested a review from a team as a code owner January 29, 2026 04:04
Contributor

coderabbitai bot commented Jan 29, 2026

📝 Walkthrough


Added a new optional configuration field enforce_monotonicity to the VLLM configuration to control whether message history monotonicity is enforced during preprocessing. When disabled, the original preprocessing result is returned without monotonicity-specific processing; when enabled (default), existing behavior is preserved.

Changes

  • Configuration Field — nemo_rl/models/generation/vllm/config.py: Added an optional enforce_monotonicity: NotRequired[bool] field to the VllmSpecificArgs TypedDict to document and control message history monotonicity enforcement.
  • Preprocessing Logic — nemo_rl/models/generation/vllm/vllm_worker_async.py: Introduced enforce_monotonicity flag retrieval from the vLLM config and added a conditional early return in _preprocess_chat to bypass monotonicity processing when the flag is False.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Test Results For Major Changes — ⚠️ Warning: The PR introduces a major behavioral feature controlling message history monotonicity enforcement but lacks test files, validation data, and documented testing; the PR objectives explicitly acknowledge incomplete testing. Resolution: create tests for the enforce_monotonicity feature, run regression tests, document results in the PR description, provide validation evidence, and mark the testing checklist complete before review.

✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed: The title accurately and specifically describes the main change: adding a configuration option to control monotonicity enforcement.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@nemo_rl/models/generation/vllm/config.py`:
- Around line 42-45: Update the TypedDict field comment for enforce_monotonicity
in nemo_rl/models/generation/vllm/config.py to state that it's a boolean
(True/False), describe its purpose succinctly, and specify the recommended
default (e.g., False); then add the enforce_monotonicity: false entry to the
exemplar YAMLs in examples/configs (notably the grpo_math_*.yaml variants and
dpo.yaml) so the exemplar configs reflect the documented default. Ensure the
comment mentions valid values (True/False) and the recommended default, and that
the YAML keys use the same name enforce_monotonicity with the recommended value.

In `@nemo_rl/models/generation/vllm/vllm_worker_async.py`:
- Line 333: The line that reads enforce_monotonicity =
self.cfg["vllm_cfg"].get("enforce_monotonicity", True) should not provide a
code-side default; instead read the value directly from the config and fail fast
if missing. Replace the .get call with direct indexing (e.g.,
self.cfg["vllm_cfg"]["enforce_monotonicity"]) or add an explicit assertion that
"enforce_monotonicity" exists in self.cfg["vllm_cfg"], and ensure the YAML
config defines the default there.

Comment on lines 42 to 45
# Ensures message history over multiple turns remains constant. Useful for on-policy training, but sometimes
# we may want to drop previous reasoning (like Qwen3), agent context management,
# sliding window, or drop past environment state observations, etc
enforce_monotonicity: NotRequired[bool]
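The failure mode this comment describes can be made concrete with a small sketch. This is an illustrative example, not code from the PR: the message contents are invented, and it assumes a Qwen3-style template that strips past thinking blocks when rendering later turns.

```python
# Hypothetical illustration: a chat template that drops earlier assistant
# "thinking" content makes turn N's history not a prefix of turn N+1's input,
# which a strict monotonicity check would reject.
turn1 = [
    {"role": "user", "content": "2+2?"},
    {"role": "assistant", "content": "<think>compute</think>4"},
]
# On the next turn the template strips the past <think> block:
turn2 = [
    {"role": "user", "content": "2+2?"},
    {"role": "assistant", "content": "4"},  # thinking trace dropped
    {"role": "user", "content": "and 3+3?"},
]

# A monotonicity check requires turn1 to be a prefix of turn2:
print(turn2[: len(turn1)] == turn1)  # False -> the check would fail
```

The same mismatch arises with agent context management, sliding windows, or dropped environment observations, which is why the flag is useful beyond the Qwen3 case.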

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for enforce_monotonicity in exemplar YAMLs
rg -n 'enforce_monotonicity' examples/ -g '*.yaml' -g '*.yml'

Repository: NVIDIA-NeMo/RL

Length of output: 40


🏁 Script executed:

#!/bin/bash
# Find exemplar YAML files in the repo to understand structure
find examples/configs -type f \( -name "*.yaml" -o -name "*.yml" \) 2>/dev/null | head -20

Repository: NVIDIA-NeMo/RL

Length of output: 856


🏁 Script executed:

#!/bin/bash
# Check the config file to see how other NotRequired config keys are documented
cat -n nemo_rl/models/generation/vllm/config.py | head -80

Repository: NVIDIA-NeMo/RL

Length of output: 3068


Add enforce_monotonicity to exemplar YAMLs and document valid values and recommended default.

The new config key is missing from exemplar YAML files under examples/configs/ and its documentation is incomplete. Per coding guidelines, new TypedDict config keys must document purpose, valid values/types, and recommended default, and this default must be reflected in exemplar YAMLs.

Update the field comment to specify valid values (True/False) and recommended default, then add the key to relevant exemplar YAML files (e.g., grpo_math_*.yaml, dpo.yaml) with the recommended value.


openai_serving_models_kwargs["model_config"] = model_config
openai_serving_models = OpenAIServingModels(**openai_serving_models_kwargs)

enforce_monotonicity = self.cfg["vllm_cfg"].get("enforce_monotonicity", True)

⚠️ Potential issue | 🟠 Major

Avoid code-side default; rely on YAML for enforce_monotonicity.

Using .get(..., True) sets a non-None default in code. Please set the default in YAML and read the value directly (or assert presence) here.

✅ Suggested change
-        enforce_monotonicity = self.cfg["vllm_cfg"].get("enforce_monotonicity", True)
+        enforce_monotonicity = self.cfg["vllm_cfg"]["enforce_monotonicity"]

As per coding guidelines, "YAML is the single source of truth for configuration defaults. Do not set non-None defaults in code for configuration values".
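The assert-presence alternative the review mentions could look like the following. This is a hedged sketch under the guideline's assumptions; the helper name and config shape are invented for illustration and do not appear in the PR.

```python
# Sketch of the fail-fast alternative: assert the key exists instead of
# supplying a code-side default, so the YAML config remains the single
# source of truth. Helper name and config shape are hypothetical.
def read_enforce_monotonicity(cfg: dict) -> bool:
    vllm_cfg = cfg["vllm_cfg"]
    assert "enforce_monotonicity" in vllm_cfg, (
        "enforce_monotonicity must be set in the YAML config under vllm_cfg"
    )
    return vllm_cfg["enforce_monotonicity"]


print(read_enforce_monotonicity({"vllm_cfg": {"enforce_monotonicity": False}}))  # False
```

A missing key then fails loudly at startup rather than silently falling back to True, at the cost of requiring every exemplar YAML to define the key.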


@cmunley1
Author

I forgot that this will still fail here; need to consider what to do in this case.
