Add Word-guess NemoGym GRPO training #1903
Draft
+684
−0
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Dependencies
Bug Fix: Token alignment with reasoning-stripping chat templates
Models like Nemotron Nano 9B v2 have chat templates that strip ... from prior assistant messages when re-rendering for subsequent turns. This causes two assertion failures during NemoGym multi-turn
tool-calling training:
Fix: Fall back to template token IDs when alignment fails instead of crashing. Note: this causes token duplication in affected samples, which may slightly impact training quality. The proper fix would be to strip
thinking tokens from generation_token_ids before recording, matching what the template does on re-render.
Training Setup
Generate Wordle data (in Gym repo)
Run training