feat: enforce monotonicity config option #1840
base: main
Conversation
Signed-off-by: Christian Munley <cmunley@nvidia.com>
📝 Walkthrough: Added a new optional configuration field, enforce_monotonicity.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed
❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@nemo_rl/models/generation/vllm/config.py`:
- Around lines 42-45: Update the TypedDict field comment for enforce_monotonicity
in nemo_rl/models/generation/vllm/config.py to state that it's a boolean
(True/False), describe its purpose succinctly, and specify the recommended
default (e.g., False); then add the enforce_monotonicity: false entry to the
exemplar YAMLs in examples/configs (notably the grpo_math_*.yaml variants and
dpo.yaml) so the exemplar configs reflect the documented default. Ensure the
comment mentions valid values (True/False) and the recommended default, and that
the YAML keys use the same name enforce_monotonicity with the recommended value.
In `@nemo_rl/models/generation/vllm/vllm_worker_async.py`:
- Line 333: The line that reads enforce_monotonicity =
self.cfg["vllm_cfg"].get("enforce_monotonicity", True) should not provide a
code-side default; instead read the value directly from the config and fail fast
if missing. Replace the .get call with direct indexing (e.g.,
self.cfg["vllm_cfg"]["enforce_monotonicity"]) or add an explicit assertion that
"enforce_monotonicity" exists in self.cfg["vllm_cfg"], and ensure the YAML
config defines the default there.
# Ensures message history over multiple turns remains constant. Useful for on-policy training, but sometimes
# we may want to drop previous reasoning (like Qwen3), agent context management,
# sliding window, or drop past environment state observations, etc
enforce_monotonicity: NotRequired[bool]
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check for enforce_monotonicity in exemplar YAMLs
rg -n 'enforce_monotonicity' examples/ -g '*.yaml' -g '*.yml'
Repository: NVIDIA-NeMo/RL
Length of output: 40
🏁 Script executed:
#!/bin/bash
# Find exemplar YAML files in the repo to understand structure
find examples/configs -type f \( -name "*.yaml" -o -name "*.yml" \) 2>/dev/null | head -20
Repository: NVIDIA-NeMo/RL
Length of output: 856
🏁 Script executed:
#!/bin/bash
# Check the config file to see how other NotRequired config keys are documented
cat -n nemo_rl/models/generation/vllm/config.py | head -80
Repository: NVIDIA-NeMo/RL
Length of output: 3068
Add enforce_monotonicity to exemplar YAMLs and document valid values and recommended default.
The new config key is missing from exemplar YAML files under examples/configs/ and its documentation is incomplete. Per coding guidelines, new TypedDict config keys must document purpose, valid values/types, and recommended default, and this default must be reflected in exemplar YAMLs.
Update the field comment to specify valid values (True/False) and recommended default, then add the key to relevant exemplar YAML files (e.g., grpo_math_*.yaml, dpo.yaml) with the recommended value.
🤖 Prompt for AI Agents
In `@nemo_rl/models/generation/vllm/config.py` around lines 42-45: update the
TypedDict field comment for enforce_monotonicity in
nemo_rl/models/generation/vllm/config.py to state that it's a boolean
(True/False), describe its purpose succinctly, and specify the recommended
default (e.g., False); then add the enforce_monotonicity: false entry to the
exemplar YAMLs in examples/configs (notably the grpo_math_*.yaml variants and
dpo.yaml) so the exemplar configs reflect the documented default. Ensure the
comment mentions valid values (True/False) and the recommended default, and that
the YAML keys use the same name enforce_monotonicity with the recommended value.
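As a sketch of what the requested documentation could look like (the class name is hypothetical, the comment wording is mine, and the recommended default of False follows the reviewer's example rather than the current code, which defaults to True):

```python
from typing_extensions import NotRequired, TypedDict


class VllmConfig(TypedDict):  # hypothetical name; the real TypedDict lives in config.py
    # Whether to enforce that the multi-turn message history stays constant (each new
    # prompt is a strict extension of the previous one). Valid values: True / False.
    # True is useful for on-policy training; set False when the chat template drops
    # previous reasoning (e.g. Qwen3), when an agent manages or slides its context
    # window, or when past environment observations are dropped.
    # Recommended default: False (mirrored in the exemplar YAMLs under examples/configs).
    enforce_monotonicity: NotRequired[bool]
```

A matching `enforce_monotonicity: false` line under `vllm_cfg` in the grpo_math_*.yaml variants and dpo.yaml would then keep the exemplar configs in sync with this documented default.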
openai_serving_models_kwargs["model_config"] = model_config
openai_serving_models = OpenAIServingModels(**openai_serving_models_kwargs)

enforce_monotonicity = self.cfg["vllm_cfg"].get("enforce_monotonicity", True)
Avoid code-side default; rely on YAML for enforce_monotonicity.
Using .get(..., True) sets a non-None default in code. Please set the default in YAML and read the value directly (or assert presence) here.
✅ Suggested change
- enforce_monotonicity = self.cfg["vllm_cfg"].get("enforce_monotonicity", True)
+ enforce_monotonicity = self.cfg["vllm_cfg"]["enforce_monotonicity"]
As per coding guidelines, "YAML is the single source of truth for configuration defaults. Do not set non-None defaults in code for configuration values".
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- enforce_monotonicity = self.cfg["vllm_cfg"].get("enforce_monotonicity", True)
+ enforce_monotonicity = self.cfg["vllm_cfg"]["enforce_monotonicity"]
🤖 Prompt for AI Agents
In `@nemo_rl/models/generation/vllm/vllm_worker_async.py` at line 333: the line
that reads enforce_monotonicity =
self.cfg["vllm_cfg"].get("enforce_monotonicity", True) should not provide a
code-side default; instead read the value directly from the config and fail fast
if missing. Replace the .get call with direct indexing (e.g.,
self.cfg["vllm_cfg"]["enforce_monotonicity"]) or add an explicit assertion that
"enforce_monotonicity" exists in self.cfg["vllm_cfg"], and ensure the YAML
config defines the default there.
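A minimal sketch of the assertion variant mentioned above, assuming the same self.cfg structure as the snippet under review (the error message text is illustrative):

```python
# Fail fast if the key is absent instead of silently falling back to a code-side default.
vllm_cfg = self.cfg["vllm_cfg"]
assert "enforce_monotonicity" in vllm_cfg, (
    "enforce_monotonicity must be set under vllm_cfg in the YAML config; "
    "YAML is the single source of truth for configuration defaults."
)
enforce_monotonicity = vllm_cfg["enforce_monotonicity"]
```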
I forgot this will still fail here, and I need to consider what to do in this case: RL/nemo_rl/environments/nemo_gym.py, line 166 in e4d94da
What does this PR do?
Allows disabling monotonic / on-policy checks. For instance, Qwen3 reasoning models' chat template drops past thinking traces, and agents with context management rewrite earlier turns.
This may only apply to the NeMo Gym path; not entirely sure.
Should probably add a config example and test?
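To make the failure mode concrete, here is a hypothetical illustration (not the repository's actual check) of why a strict prefix/monotonicity assertion breaks once a chat template drops earlier thinking traces:

```python
# Message history the worker saw when generating turn 1 (reasoning kept).
turn_1_history = ["<user> solve x", "<think> scratch work </think>", "<answer> 42 </answer>"]

# History rebuilt for turn 2 after a Qwen3-style template strips the old <think> block.
turn_2_history = ["<user> solve x", "<answer> 42 </answer>", "<user> now solve y"]

# An on-policy monotonicity check requires the old history to be a prefix of the new one.
is_monotonic = turn_2_history[: len(turn_1_history)] == turn_1_history
print(is_monotonic)  # False -> the check fails unless enforce_monotonicity is disabled
```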
Issues
#1812
Usage
+generation.vllm_cfg.enforce_monotonicity=False
Before your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit
Release Notes