OH metrics and block commands#598

Closed
sdevare-nv wants to merge 230 commits into main from sdd/oh-metric-block-commands

Conversation

@sdevare-nv
Contributor

No description provided.

chtruong814 and others added 30 commits August 25, 2025 16:39
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
Co-authored-by: bxyu-nvidia <bxyu@nvidia.com>
Migrated over from GitLab:

- Display aggregate metrics
- Aggregate generic keys using multineedle
- Display other dynamic aggregations
- Count string totals and unique values
- Remove TrainDataProcessor dependency, add test
- Remove duplicate file read, fix arg type hints
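
The string-count aggregation listed above could be sketched roughly as follows; `aggregate_string_metrics` and its return shape are illustrative assumptions, not the actual Gym implementation:

```python
from collections import Counter

def aggregate_string_metrics(values):
    """Count string totals and unique values, as in the bullet above.

    Hypothetical sketch: the function name and the returned dict shape
    are assumptions, not the real aggregation code.
    """
    counts = Counter(values)
    return {
        "total": sum(counts.values()),   # total number of string occurrences
        "unique": len(counts),           # number of distinct values
        "counts": dict(counts),          # per-value occurrence counts
    }
```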

---------

Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
…nfo (#27)

Signed-off-by: Brian Yu <bxyu@nvidia.com>
Updated the following log message printed when running `ng_prepare_data`, for example:

"Found 0 agent server instance configs withOUT datasets:"

to

"Found 0 agent server instance configs WITHOUT datasets:"

to match the format of the subsequent logs, for example:
"Found 1 agent server instance configs WITH datasets:"

Signed-off-by: chrismun <cmunley@nvidia.com>
Update the resources servers README for the updated CLI.

Signed-off-by: chrismun <cmunley@nvidia.com>
sidnarayanan and others added 19 commits January 8, 2026 10:35
This PR enables running Gym on Aviary environments. The two main
concepts:

- `AviaryResourcesServer`: maps to an Aviary `TaskDataset`: spawns and
manages multiple environments
- Unlike other `ResourcesServer`s, it doesn't take arbitrary task specs,
but an integer index into the `TaskDataset`. Otherwise we'd have data
defined in two places
- Instead of tool-specific endpoints, we have one `/step` endpoint. This
is because:
- Aviary environments define their transition function in `step()`.
Simply calling the bare tools can have undefined behavior (e.g. state
isn't updated properly)
- Aviary tools are not guaranteed to be available until `reset()` is
called.
  - A `/close` endpoint is added to tear down resources
- `AviaryAgent`: analogous to `SimpleAgent`, but:
- Request is an integer index (which is forwarded to
`AviaryResourcesServer`). In general, we expect `env.reset()` to provide
the first messages, not the calling code
  - All tool calls are sent to `/step`
  - We rely on the environment to tell us when we're done 


Two concrete Aviary datasets/environments are integrated: GSM8k with a
calculator environment and BixBench with a notebook environment. Adding
new ones is pretty lightweight (most of the code in `notebook_app.py` is
from defining a BixBench-compatible environment, not the integration).
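
The loop described above (reset by dataset index, step until the environment signals completion, then close) could be sketched like this; `run_episode`, `FakeEnvClient`, and the tuple returned by `step` are illustrative assumptions, not the actual `AviaryAgent` API:

```python
def run_episode(env_client, task_index, pick_action):
    """Hypothetical sketch of the AviaryAgent loop: env.reset() provides the
    first observation, tool calls go through /step, and the environment
    tells us when the episode is done."""
    obs = env_client.reset(task_index)     # integer index into the TaskDataset
    done, total_reward = False, 0.0
    while not done:
        action = pick_action(obs)          # e.g. an LLM choosing a tool call
        obs, reward, done = env_client.step(action)
        total_reward += reward
    env_client.close()                     # tear down via the /close endpoint
    return total_reward


class FakeEnvClient:
    """Toy in-memory stand-in for the HTTP client; a real one would POST
    to the AviaryResourcesServer's /step and /close endpoints."""
    def __init__(self, rewards):
        self.rewards = list(rewards)
        self.closed = False

    def reset(self, task_index):
        return f"initial observation for task {task_index}"

    def step(self, action):
        reward = self.rewards.pop(0)
        return "next observation", reward, not self.rewards

    def close(self):
        self.closed = True
```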

---------

Signed-off-by: Siddharth Narayanan <sid@futurehouse.org>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Siddharth Narayanan <sidnarayanan@users.noreply.github.com>
Signed-off-by: Siddharth Narayanan <sid@edisonscientific.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Co-authored-by: bxyu-nvidia <bxyu@nvidia.com>
Co-authored-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Adds a more descriptive README, reward profiling, and an option for
fractional or binary reward.

Signed-off-by: abukharin-nv <abukharin@nvidia.com>
Co-authored-by: cmunley1 <cmunley@nvidia.com>
This PR adds new environments for SWE tasks. The environments can be
used for single-step patch generation, test generation, and
LLM-as-a-judge. They have been tested for instances from SWE-bench,
SWE-Gym, and SWE-rebench. The patch and test generation environments run
them against unit tests in a containerized environment (Singularity).

---------

Signed-off-by: Atefeh Sohrabizadeh <asohrabizade@nvidia.com>
Co-authored-by: Test User <test@example.com>
Integrating a new dataset using the existing equivalency LLM judge
resource server.

Data source: https://huggingface.co/datasets/jiacheng-ye/nl2bash
License:
https://github.com/TellinaTool/nl2bash/blob/3d1997669ac21c8e19fc1d12f60054d3142ef6c7/LICENSE
Train: 8040 unique samples 
Validation: 50 unique, randomly sampled from train
Augmentation on the source (minimal): Added system prompt, output
formatting requirement

Example of env validation:
- base model: `nemotron-nano-3-30b-a3b-bf16` (GA checkpoint) 
- Step 30 -> 12.50% on Terminal Bench Core 
- https://wandb.ai/nvidia/nl2bash/runs/mxp1c3mm

Train:  nl2bash-super-train-0901.jsonl
Validation:  nl2bash-super-validation-0901.jsonl

https://gitlab-master.nvidia.com/bxyu/nemo-gym/-/ml/models/152/versions/176#/
```
ng_download_dataset_from_gitlab \
    +dataset_name=nl2bash-equivalency-judge \
    +version=0.0.1 \
    +artifact_fpath=nl2bash-super-train-0901.jsonl \
    +output_fpath=Gym/data/nl2bash/nl2bash-super-train-0901.jsonl
```

---------

Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
# Make `agent_name` optional in CLI rollout collection

## Summary

Makes `agent_name` optional in `ng_collect_rollouts` CLI, allowing it to
use `agent_ref` from each data row instead.

## Motivation

The NeMo-RL training code already respects per-row `agent_ref`, but the
Gym CLI (`ng_collect_rollouts`) required a single hardcoded
`agent_name`. This prevented multi-agent rollout collection via CLI.

## Changes

- `rollout_collection.py`: Made `agent_name` field optional with
`default=None`
- Use `config.agent_name` if specified; otherwise fall back to
`row["agent_ref"]["name"]`
- Added validation error if neither source provides an agent name

## Behavior

| Before | After |
|--------|-------|
| `+agent_name=...` required | `+agent_name=...` optional |
| All rows use same agent | Rows can use different agents via `agent_ref` |
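
The fallback described above might look roughly like this; `resolve_agent_name` is a hypothetical helper for illustration, not the actual `rollout_collection.py` code:

```python
def resolve_agent_name(config_agent_name, row):
    """Prefer the CLI-level agent_name; otherwise fall back to the
    per-row agent_ref. Names here are illustrative assumptions."""
    if config_agent_name is not None:
        return config_agent_name
    name = (row.get("agent_ref") or {}).get("name")
    if name is None:
        raise ValueError(
            "No agent name: pass +agent_name=... or include "
            "agent_ref.name in each data row"
        )
    return name
```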

---------

Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
The default artifact paths for the math_with_judge resource server
don't match the filenames for the provided dataset
(nvidia/Nemotron-RL-math-OpenMathReasoning) [as saved on Hugging
Face](https://huggingface.co/datasets/nvidia/Nemotron-RL-math-OpenMathReasoning/tree/main).
This results in an error when attempting to download the files
automatically from Hugging Face. The artifact paths for both training
and validation need to be updated with the names as shown on Hugging
Face for proper downloading.

Signed-off-by: Robert Clark <roclark@nvidia.com>
The competitive coding resource config is missing a Hugging Face
identifier, which prevents it from being downloaded via Hugging Face
using the data preparation tools.

Without the HF identifier, run the following:

```
config_paths="responses_api_models/vllm_model/configs/vllm_model_for_training.yaml,resources_servers/math_with_judge/configs/math_with_judge.yaml,resources_servers/code_gen/configs/code_gen.yaml,resources_servers/workplace_assistant/configs/workplace_assistant.yaml,resources_servers/mcqa/configs/mcqa.yaml,resources_servers/instruction_following/configs/instruction_following.yaml,resources_servers/structured_outputs/configs/structured_outputs_json.yaml"
ng_prepare_data "+config_paths=[${config_paths}]" +output_dirpath=data/ +mode=train_preparation +should_download=true +data_source=huggingface
```

This will throw a warning:

```
Dataset `livecodebench_v5_validation` missing huggingface_identifier for HuggingFace backend
```

And eventually this error:

```
Traceback (most recent call last):
  File "/opt/nemo_rl_venv/bin/ng_prepare_data", line 10, in <module>
    sys.exit(prepare_data())
             ^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 819, in prepare_data
    data_processor.run(global_config_dict)
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 350, in run
    dataset_type_to_aggregate_metrics = self.validate_samples_and_aggregate_metrics(server_instance_configs)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 657, in validate_samples_and_aggregate_metrics
    state = self._validate_samples_and_aggregate_metrics_single_dataset(d)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 553, in _validate_samples_and_aggregate_metrics_single_dataset
    for sample_idx, sample_dict_str in enumerate(self._iter_dataset_lines(dataset_config)):
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 542, in _iter_dataset_lines
    with open(dataset_config.jsonl_fpath) as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'resources_servers/code_gen/data/livecodebench_v5_2024-07-01_2025-02-01_validation.jsonl'
```

This fix will download the validation file as intended and resolve the
errors.

Signed-off-by: Robert Clark <roclark@nvidia.com>
The train and val data paths are swapped in the config. This PR
corrects them.

---------

Signed-off-by: Atefeh Sohrabizadeh <asohrabizade@nvidia.com>
Co-authored-by: Test User <test@example.com>
# PR: Add ns_tools Resources Server

## Description

Adds a new resources server that integrates NeMo Skills tools (e.g.,
stateful Python code execution) with NeMo Gym's verification system.

**Key features:**
- Executes NeMo Skills tools via the ToolManager (e.g.,
`stateful_python_code_exec`)
- Delegates verification to other resources servers (e.g.,
`math_with_judge`)

## Verifier Delegation

The `ns_tools` server acts as a pass-through for verification. When
`verify()` is called, it delegates to the configured verifier (default:
`math_with_judge`):

```
ns_tools.verify(request) 
    → POST to math_with_judge/verify
    → returns reward from math_with_judge
```

This allows using NeMo Skills tools while leveraging existing
verification infrastructure.
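
In rough Python, the delegation might look as follows; the transport is injected here for testability, and all names are assumptions rather than the actual ns_tools implementation:

```python
def delegate_verify(request_payload, post):
    """Hypothetical sketch of the pass-through verify(): forward the
    request to the configured verifier (e.g. math_with_judge) and
    return the reward from its response. `post` stands in for an HTTP
    POST to the verifier server."""
    response = post("/verify", request_payload)
    return response["reward"]


def make_fake_verifier(reward):
    """Toy verifier transport that always returns a fixed reward."""
    def post(path, payload):
        assert path == "/verify"
        return {"reward": reward}
    return post
```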

## Example Data Format

```json
{
  "id": "aime25-0",
  "question": "Find the sum of all integer bases $b>9$ for which $17_b$ is a divisor of $97_b$.",
  "expected_answer": "70",
  "verifier_type": "math_with_judge",
  "agent_ref": {"type": "responses_api_agents", "name": "ns_tools_simple_agent"},
  "responses_create_params": {
    "input": [
      {"role": "user", "content": "Solve the following math problem..."}
    ],
    "tools": [{
      "type": "function",
      "name": "stateful_python_code_exec",
      "description": "Execute Python code in a stateful environment.",
      "parameters": {
        "type": "object",
        "properties": {"code": {"type": "string"}},
        "required": ["code"]
      }
    }]
  }
}
```

---------

Signed-off-by: George Armstrong <georgea@nvidia.com>
## Summary

- Adds new `math_formal_lean` resource server for Lean4 formal theorem
proving
- Implements `/verify` endpoint that compiles proofs via sandbox
container and returns reward 1.0/0.0
- Includes MiniF2F dataset (244 test problems) with NeMo-Skills aligned
prompt format
- Comprehensive test suite (31 tests)
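
The binary reward described above could be computed along these lines; the compile-result field names are assumptions, not the sandbox's actual response schema:

```python
def lean_reward(compile_result):
    """Map a Lean4 sandbox compile result to the 1.0/0.0 reward.

    Hypothetical sketch: `returncode` and `errors` are assumed field
    names, not the real sandbox_client.py schema.
    """
    compiled_ok = (
        compile_result.get("returncode") == 0
        and not compile_result.get("errors")
    )
    return 1.0 if compiled_ok else 0.0
```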

## Components

| File | Description |
|------|-------------|
| `app.py` | Resource server with verify endpoint |
| `sandbox_client.py` | HTTP client for Lean4 sandbox |
| `proof_utils.py` | Proof extraction/building utilities |
| `prepare_minif2f.py` | Dataset preparation script |
| `README.md` | Documentation with licensing info |

## Test plan

- [x] Unit tests pass (31/31)
- [x] End-to-end test with `ng_collect_rollouts` (0.2 reward on 5
samples)
- [x] Tested with gpt-5.1-codex-max model
- [x] Pre-commit lint checks pass

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Signed-off-by: Stephen Ge <stepheng@nvidia.com>
Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Per title. This PR retains the current default of returning transitions,
but it is reasonable to change that default to match the other Gym
agents.

Signed-off-by: Siddharth Narayanan <sid@edisonscientific.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Co-authored-by: Brian Yu <bxyu@nvidia.com>
Refactoring the equivalency LLM judge resource server into another
judge-based resource server. The main changes are removing the regex
logic and cleaning up the related configs.

Train data for this environment is still TBD, but a working version:
Data source: Sliced terminus prompts from different sources
train_jsonl_fpath:
`/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/my-envs/terminus-sliced/char/nano3-ga-traindata-char-tokenlen-32768.jsonl`
validation_jsonl_fpath:
`/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/my-envs/terminus-sliced/char/nano3-ga-valdata-char-tokenlen-16384.jsonl`
example train config:
`/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/nemo-rl-internal-yifu/training_configs/grpo_nanov3-nickel-capybara-4-nodes-judge-roff-512-49k-seq-reasoning-off-char-data-64x16-temp1-iter-1600.yaml`

Example of env validation:

base model: early sft checkpoint of nano v3
(`nano-v3-sft-64gbs-nickel-capybara-5e-5-constant-wd-0-load-bal-1e-4-lcx3-pretool-base-temp1-iter-0013600-hf`)
Step 50 -> 21.25% on Terminal Bench Core
https://wandb.ai/nvidia/terminus-sliced/runs/rs7c40hi


Next steps:
Will expand this PR with configurable verification options including
string matching, string similarity and openapi-based output schema
validation.

---------

Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
Added new doc directories/article stubs for the topics identified in
the 0.2.0 IA, and generated an initial pass of structure and some starter
content. This will enable contributors to focus more on the topic itself
rather than the site build/toctree elements. **Feel free to blow away any
initial content in these pages**.

All stubbed pages have been marked with 🟡 in the toctree for easy
discovery. Remove the 🟡 once the page is finished.

<img width="1800" height="1009" alt="image"
src="https://github.com/user-attachments/assets/a0bbc63d-05ce-44a2-b31f-fe4b8e0d43db"
/>

---------

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Added a complete example of preparing a custom dataset for use with
NeMo Gym. The tutorial walks through downloading a dataset from Hugging
Face (or adapting one from a different source), adding the
"responses_create_params" field, writing a new resource server config,
and preparing the data with "ng_prepare_data". This tutorial can serve
as a guide for converting most arbitrary text-based datasets into a
format compatible with NeMo Gym for post-training.
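
The core transformation the tutorial describes (wrapping each raw sample in a "responses_create_params" field) can be sketched as below; the raw field name `question` and the helper's name are illustrative assumptions:

```python
def add_responses_create_params(row, system_prompt):
    """Wrap a raw text sample in the responses_create_params format.

    Hypothetical sketch: the input field name "question" and this
    helper's name are assumptions; the tutorial covers the real layout.
    """
    row["responses_create_params"] = {
        "input": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": row.pop("question")},
        ]
    }
    return row
```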

Signed-off-by: Robert Clark <roclark@nvidia.com>
Signed-off-by: Sugam Devare <sdevare@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented Jan 22, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Signed-off-by: Sugam Devare <sdevare@nvidia.com>
Signed-off-by: Sugam Devare <sdevare@nvidia.com>
Signed-off-by: Sugam Devare <sdevare@nvidia.com>
Signed-off-by: Sugam Devare <sdevare@nvidia.com>
@bxyu-nvidia bxyu-nvidia marked this pull request as ready for review February 5, 2026 00:25
Signed-off-by: Sugam Devare <sdevare@nvidia.com>
Signed-off-by: Sugam Devare <sdevare@nvidia.com>
@vadam5 vadam5 closed this Mar 10, 2026
@vadam5 vadam5 force-pushed the sdd/oh-metric-block-commands branch from bafcbb3 to 35241e7 Compare March 10, 2026 23:08
@copy-pr-bot

copy-pr-bot Bot commented Mar 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

