Conversation


@yyu22 yyu22 commented Feb 9, 2026

  • Add NemoGym Wordle GRPO training config (grpo_wordle_nemotron_nano_v2_9b.yaml)
  • Add Nemotron JSON tool call parser (nemotron_json_tool_parser.py; sketched after this list)
  • Fix _replace_prefix_tokens and _postprocess_nemo_gym_to_nemo_rl_result crashes when chat templates strip reasoning tokens from prior assistant messages
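
A hypothetical sketch of what such a JSON tool call parser can look like. The <TOOLCALL>...</TOOLCALL> tag format and the guess-tool example are assumptions for illustration; the actual format lives in nemotron_json_tool_parser.py.

  # Hypothetical sketch of a Nemotron-style JSON tool-call parser.
  # The <TOOLCALL>...</TOOLCALL> tag format is an assumption for
  # illustration; see nemotron_json_tool_parser.py for the real parser.
  import json
  import re

  TOOLCALL_RE = re.compile(r"<TOOLCALL>(.*?)</TOOLCALL>", re.DOTALL)

  def parse_tool_calls(text: str) -> list[dict]:
      """Return the tool calls embedded in a model response, or []
      if no <TOOLCALL> block parses cleanly."""
      match = TOOLCALL_RE.search(text)
      if match is None:
          return []
      try:
          calls = json.loads(match.group(1))
      except json.JSONDecodeError:
          return []
      return calls if isinstance(calls, list) else [calls]

  # parse_tool_calls('<TOOLCALL>[{"name": "guess_word", "arguments":
  # {"word": "CRANE"}}]</TOOLCALL>')
  # -> [{'name': 'guess_word', 'arguments': {'word': 'CRANE'}}]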

Dependencies

Bug Fix: Token alignment with reasoning-stripping chat templates

Models like Nemotron Nano 9B v2 have chat templates that strip <think>...</think> reasoning blocks from prior assistant messages when re-rendering for subsequent turns. This causes two assertion failures during NemoGym multi-turn tool-calling training:

  1. _replace_prefix_tokens (vllm_worker_async.py): the template drops the last assistant message when its content is empty after stripping, causing len(template_token_ids) <= len(template_prefix_token_ids) or a missing EOS (sketched below)
  2. _postprocess_nemo_gym_to_nemo_rl_result (nemo_gym.py): the token contiguity check fails because generation_token_ids includes thinking tokens but the re-tokenized prompt does not
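
To make the first failure concrete, here is a hypothetical sketch of the length check (illustrative token IDs, not the actual NeMo-RL code):

  # The prefix render covers the conversation up to and including the
  # last assistant message, tokenized while its <think>...</think>
  # content was still present:
  template_prefix_token_ids = [1, 2, 3, 90, 91, 92]  # 90-92 = <think> span

  # On the next turn the template strips <think>...</think> from prior
  # assistant messages; a message left empty is dropped entirely, so the
  # full render (prefix conversation + new tool-result message) can end
  # up shorter than the prefix render:
  template_token_ids = [1, 2, 3, 5, 6]  # 5-6 = tool-result message

  # The invariant _replace_prefix_tokens relies on (the full render
  # strictly extends the prefix render) then fails:
  assert len(template_token_ids) > len(template_prefix_token_ids)  # AssertionError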

Fix: Fall back to the template token IDs when alignment fails instead of crashing. Note: this causes token duplication in affected samples, which may slightly impact training quality. The proper fix would be to strip thinking tokens from generation_token_ids before recording, matching what the template does on re-render.
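
A minimal sketch of that fallback, with assumed names and signature rather than the actual NeMo-RL code:

  # Illustrative fallback: prefer the recorded generation tokens when
  # the re-rendered template still aligns; otherwise trust the
  # template's own tokens instead of asserting.
  def replace_prefix_or_fall_back(
      generation_token_ids: list[int],
      template_token_ids: list[int],
      template_prefix_token_ids: list[int],
  ) -> list[int]:
      prefix_len = len(template_prefix_token_ids)
      aligned = (
          len(template_token_ids) > prefix_len
          and template_token_ids[:prefix_len] == template_prefix_token_ids
      )
      if aligned:
          # Normal path: splice the sampled tokens onto the shared
          # prefix so training sees exactly what was generated.
          return template_prefix_token_ids + generation_token_ids
      # Fallback path: affected samples may duplicate thinking tokens,
      # which is why stripping them from generation_token_ids before
      # recording would be the proper fix.
      return template_token_ids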

Training Setup

Generate Wordle data (in Gym repo)

  cd 3rdparty/Gym-workspace/Gym/resources_servers/wordle
  python generate_data.py --output_dir data/

Run training

  uv run python examples/nemo_gym/run_grpo_nemo_gym.py \
      --config examples/nemo_gym/grpo_wordle_nemotron_nano_v2_9b.yaml

yyu22 added 2 commits February 9, 2026 09:26
… for reasoning models

- Add grpo_wordle_nemotron_nano_v2_9b.yaml config for NemoGym Wordle training
- Fix _replace_prefix_tokens crash when chat templates strip reasoning tokens
  from prior assistant messages (e.g., Nemotron's <think>...</think> stripping)
- Fix _postprocess_nemo_gym_to_nemo_rl_result contiguity assertion for the same
  reasoning token stripping issue

Signed-off-by: root <[email protected]>
@yyu22 yyu22 changed the title from "Add Wordle NemoGym GRPO training" to "Add Word-guess NemoGym GRPO training" on Feb 9, 2026

cmunley1 commented Feb 10, 2026

related: #1812

I'd like to disable forcing token-level on-policy, e.g. for agents with context management, but it feels like we shouldn't just quietly fall back to off-policy; it should be a config option at least.

I think rather than disabling the asserts for _replace_prefix_tokens, we should just do this for now: https://docs.nvidia.com/nemo/gym/latest/tutorials/nemo-rl-grpo/single-node-training.html#configure-the-chat-template

until we test disabling it thoroughly and add a config option.

Revert workaround changes to nemo_gym.py and vllm_worker_async.py.
Instead, add a custom Nemotron chat template that preserves <think>
tokens in prior assistant messages (no stripping), which keeps token
alignment consistent across turns for _replace_prefix_tokens.

Signed-off-by: root <[email protected]>
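
The idea behind that template change, sketched with the Hugging Face tokenizer API. The Jinja below and the model ID are illustrative assumptions, not the template or loading code added in this PR:

  from transformers import AutoTokenizer

  # Assumed model ID for illustration.
  tokenizer = AutoTokenizer.from_pretrained("nvidia/NVIDIA-Nemotron-Nano-9B-v2")

  # Illustrative Jinja template: every message, including prior
  # assistant turns, is rendered verbatim, so <think>...</think> spans
  # survive re-rendering and re-tokenized prompts stay aligned with the
  # tokens recorded at generation time.
  tokenizer.chat_template = (
      "{% for message in messages %}"
      "{{ '<|' + message['role'] + '|>\\n' + message['content'] + '\\n' }}"
      "{% endfor %}"
      "{% if add_generation_prompt %}{{ '<|assistant|>\\n' }}{% endif %}"
  )

  messages = [
      {"role": "user", "content": "Guess the word."},
      {"role": "assistant", "content": "<think>try CRANE first</think>CRANE"},
      {"role": "user", "content": "Feedback: C correct, others absent."},
  ]
  print(tokenizer.apply_chat_template(
      messages, tokenize=False, add_generation_prompt=True))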