OH metrics and block commands#598

Closed
sdevare-nv wants to merge 230 commits into main from sdd/oh-metric-block-commands

Conversation

@sdevare-nv
Contributor

No description provided.

chtruong814 and others added 30 commits August 25, 2025 16:39
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
Co-authored-by: bxyu-nvidia <bxyu@nvidia.com>
Migrated over from GitLab:

- Display aggregate metrics
- Aggregate generic keys using multineedle
- Display other dynamic aggregations
- Count string totals and unique values
- Remove TrainDataProcessor dependency, add test
- Remove duplicate file read, fix arg type hints
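
The string-count aggregation listed above could be sketched roughly as follows; `aggregate_string_metrics` and its return shape are illustrative assumptions, not the actual Gym implementation:

```python
from collections import Counter

def aggregate_string_metrics(values):
    """Count string totals and unique values, as in the bullet above.

    Hypothetical sketch: the function name and the returned dict shape
    are assumptions, not the real aggregation code.
    """
    counts = Counter(values)
    return {
        "total": sum(counts.values()),   # total number of string occurrences
        "unique": len(counts),           # number of distinct values
        "counts": dict(counts),          # per-value occurrence counts
    }
```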

---------

Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
…nfo (#27)

Signed-off-by: Brian Yu <bxyu@nvidia.com>
Updated the following log message printed when running `ng_prepare_data`, for example:

"Found 0 agent server instance configs withOUT datasets:"

to

"Found 0 agent server instance configs WITHOUT datasets:"

to match the format of the subsequent logs, for example:
"Found 1 agent server instance configs WITH datasets:"

Signed-off-by: chrismun <cmunley@nvidia.com>
Update the resources servers README for the updated CLI.

Signed-off-by: chrismun <cmunley@nvidia.com>
sidnarayanan and others added 19 commits January 8, 2026 10:35
This PR enables running Gym on Aviary environments. The two main
concepts:

- `AviaryResourcesServer`: maps to an Aviary `TaskDataset`: spawns and
manages multiple environments
- Unlike other `ResourcesServer`s, it doesn't take arbitrary task specs,
but an integer index into the `TaskDataset`. Otherwise we'd have data
defined in two places
- Instead of tool-specific endpoints, we have one `/step` endpoint. This
is because:
- Aviary environments define their transition function in `step()`.
Simply calling the bare tools can have undefined behavior (e.g. state
isn't updated properly)
- Aviary tools are not guaranteed to be available until `reset()` is
called.
  - A `/close` endpoint is added to tear down resources
- `AviaryAgent`: analogous to `SimpleAgent`, but:
- Request is an integer index (which is forwarded to
`AviaryResourcesServer`). In general, we expect `env.reset()` to provide
the first messages, not the calling code
  - All tool calls are sent to `/step`
  - We rely on the environment to tell us when we're done 


Two concrete Aviary datasets/environments are integrated: GSM8k with a
calculator environment and BixBench with a notebook environment. Adding
new ones is pretty lightweight (most of the code in `notebook_app.py` is
from defining a BixBench-compatible environment, not the integration).
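
The loop described above (reset by dataset index, step until the environment signals completion, then close) could be sketched like this; `run_episode`, `FakeEnvClient`, and the tuple returned by `step` are illustrative assumptions, not the actual `AviaryAgent` API:

```python
def run_episode(env_client, task_index, pick_action):
    """Hypothetical sketch of the AviaryAgent loop: env.reset() provides the
    first observation, tool calls go through /step, and the environment
    tells us when the episode is done."""
    obs = env_client.reset(task_index)     # integer index into the TaskDataset
    done, total_reward = False, 0.0
    while not done:
        action = pick_action(obs)          # e.g. an LLM choosing a tool call
        obs, reward, done = env_client.step(action)
        total_reward += reward
    env_client.close()                     # tear down via the /close endpoint
    return total_reward


class FakeEnvClient:
    """Toy in-memory stand-in for the HTTP client; a real one would POST
    to the AviaryResourcesServer's /step and /close endpoints."""
    def __init__(self, rewards):
        self.rewards = list(rewards)
        self.closed = False

    def reset(self, task_index):
        return f"initial observation for task {task_index}"

    def step(self, action):
        reward = self.rewards.pop(0)
        return "next observation", reward, not self.rewards

    def close(self):
        self.closed = True
```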

---------

Signed-off-by: Siddharth Narayanan <sid@futurehouse.org>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Siddharth Narayanan <sidnarayanan@users.noreply.github.com>
Signed-off-by: Siddharth Narayanan <sid@edisonscientific.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Co-authored-by: bxyu-nvidia <bxyu@nvidia.com>
Co-authored-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Adds a more descriptive README, reward profiling, and an option for
fractional or binary reward.

Signed-off-by: abukharin-nv <abukharin@nvidia.com>
Co-authored-by: cmunley1 <cmunley@nvidia.com>
This PR adds new environments for SWE tasks. The environments can be
used for single-step patch generation, test generation, and
LLM-as-a-judge. They have been tested for instances from SWE-bench,
SWE-Gym, and SWE-rebench. The patch and test generation environments run
them against unit tests in a containerized environment (Singularity).

---------

Signed-off-by: Atefeh Sohrabizadeh <asohrabizade@nvidia.com>
Co-authored-by: Test User <test@example.com>
Integrating a new dataset using the existing equivalency LLM judge
resource server.

Data source: https://huggingface.co/datasets/jiacheng-ye/nl2bash
License:
https://github.com/TellinaTool/nl2bash/blob/3d1997669ac21c8e19fc1d12f60054d3142ef6c7/LICENSE
Train: 8040 unique samples 
Validation: 50 unique, randomly sampled from train
Augmentation on the source (minimal): Added system prompt, output
formatting requirement

Example of env validation:
- base model: `nemotron-nano-3-30b-a3b-bf16` (GA checkpoint) 
- Step 30 -> 12.50% on Terminal Bench Core 
- https://wandb.ai/nvidia/nl2bash/runs/mxp1c3mm

Train:  nl2bash-super-train-0901.jsonl
Validation:  nl2bash-super-validation-0901.jsonl

https://gitlab-master.nvidia.com/bxyu/nemo-gym/-/ml/models/152/versions/176#/
```
ng_download_dataset_from_gitlab \
    +dataset_name=nl2bash-equivalency-judge \
    +version=0.0.1 \
    +artifact_fpath=nl2bash-super-train-0901.jsonl \
    +output_fpath=Gym/data/nl2bash/nl2bash-super-train-0901.jsonl
```

---------

Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
# Make `agent_name` optional in CLI rollout collection

## Summary

Makes `agent_name` optional in `ng_collect_rollouts` CLI, allowing it to
use `agent_ref` from each data row instead.

## Motivation

The NeMo-RL training code already respects per-row `agent_ref`, but the
Gym CLI (`ng_collect_rollouts`) required a single hardcoded
`agent_name`. This prevented multi-agent rollout collection via CLI.

## Changes

- `rollout_collection.py`: Made `agent_name` field optional with
`default=None`
- Use `config.agent_name` if specified; otherwise fall back to
`row["agent_ref"]["name"]`
- Added validation error if neither source provides an agent name

## Behavior

| Before | After |
|--------|-------|
| `+agent_name=...` required | `+agent_name=...` optional |
| All rows use same agent | Rows can use different agents via `agent_ref` |
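
The fallback described above might look roughly like this; `resolve_agent_name` is a hypothetical helper for illustration, not the actual `rollout_collection.py` code:

```python
def resolve_agent_name(config_agent_name, row):
    """Prefer the CLI-level agent_name; otherwise fall back to the
    per-row agent_ref. Names here are illustrative assumptions."""
    if config_agent_name is not None:
        return config_agent_name
    name = (row.get("agent_ref") or {}).get("name")
    if name is None:
        raise ValueError(
            "No agent name: pass +agent_name=... or include "
            "agent_ref.name in each data row"
        )
    return name
```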

---------

Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
The default artifact paths for the math_with_judge resource server
don't match the filenames for the provided dataset
(nvidia/Nemotron-RL-math-OpenMathReasoning) [as saved on Hugging
Face](https://huggingface.co/datasets/nvidia/Nemotron-RL-math-OpenMathReasoning/tree/main).
This results in an error when attempting to download the files
automatically from Hugging Face. The artifact paths for both training
and validation need to be updated with the names as shown on Hugging
Face for proper downloading.

Signed-off-by: Robert Clark <roclark@nvidia.com>
The competitive coding resource config is missing a Hugging Face
identifier, which prevents it from being downloaded via Hugging Face
using the data preparation tools.

Without the HF identifier, run the following:

```
config_paths="responses_api_models/vllm_model/configs/vllm_model_for_training.yaml,resources_servers/math_with_judge/configs/math_with_judge.yaml,resources_servers/code_gen/configs/code_gen.yaml,resources_servers/workplace_assistant/configs/workplace_assistant.yaml,resources_servers/mcqa/configs/mcqa.yaml,resources_servers/instruction_following/configs/instruction_following.yaml,resources_servers/structured_outputs/configs/structured_outputs_json.yaml"
ng_prepare_data "+config_paths=[${config_paths}]" +output_dirpath=data/ +mode=train_preparation +should_download=true +data_source=huggingface
```

This will throw a warning:

```
Dataset `livecodebench_v5_validation` missing huggingface_identifier for HuggingFace backend
```

And eventually this error:

```
Traceback (most recent call last):
  File "/opt/nemo_rl_venv/bin/ng_prepare_data", line 10, in <module>
    sys.exit(prepare_data())
             ^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 819, in prepare_data
    data_processor.run(global_config_dict)
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 350, in run
    dataset_type_to_aggregate_metrics = self.validate_samples_and_aggregate_metrics(server_instance_configs)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 657, in validate_samples_and_aggregate_metrics
    state = self._validate_samples_and_aggregate_metrics_single_dataset(d)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 553, in _validate_samples_and_aggregate_metrics_single_dataset
    for sample_idx, sample_dict_str in enumerate(self._iter_dataset_lines(dataset_config)):
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 542, in _iter_dataset_lines
    with open(dataset_config.jsonl_fpath) as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'resources_servers/code_gen/data/livecodebench_v5_2024-07-01_2025-02-01_validation.jsonl'
```

This fix will download the validation file as intended and resolve the
errors.

Signed-off-by: Robert Clark <roclark@nvidia.com>
The train and val data paths are swapped in the config. This PR
corrects them.

---------

Signed-off-by: Atefeh Sohrabizadeh <asohrabizade@nvidia.com>
Co-authored-by: Test User <test@example.com>
# PR: Add ns_tools Resources Server

## Description

Adds a new resources server that integrates NeMo Skills tools (e.g.,
stateful Python code execution) with NeMo Gym's verification system.

**Key features:**
- Executes NeMo Skills tools via the ToolManager (e.g.,
`stateful_python_code_exec`)
- Delegates verification to other resources servers (e.g.,
`math_with_judge`)

## Verifier Delegation

The `ns_tools` server acts as a pass-through for verification. When
`verify()` is called, it delegates to the configured verifier (default:
`math_with_judge`):

```
ns_tools.verify(request) 
    → POST to math_with_judge/verify
    → returns reward from math_with_judge
```

This allows using NeMo Skills tools while leveraging existing
verification infrastructure.
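
In rough Python, the delegation might look as follows; the transport is injected here for testability, and all names are assumptions rather than the actual ns_tools implementation:

```python
def delegate_verify(request_payload, post):
    """Hypothetical sketch of the pass-through verify(): forward the
    request to the configured verifier (e.g. math_with_judge) and
    return the reward from its response. `post` stands in for an HTTP
    POST to the verifier server."""
    response = post("/verify", request_payload)
    return response["reward"]


def make_fake_verifier(reward):
    """Toy verifier transport that always returns a fixed reward."""
    def post(path, payload):
        assert path == "/verify"
        return {"reward": reward}
    return post
```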

## Example Data Format

```json
{
  "id": "aime25-0",
  "question": "Find the sum of all integer bases $b>9$ for which $17_b$ is a divisor of $97_b$.",
  "expected_answer": "70",
  "verifier_type": "math_with_judge",
  "agent_ref": {"type": "responses_api_agents", "name": "ns_tools_simple_agent"},
  "responses_create_params": {
    "input": [
      {"role": "user", "content": "Solve the following math problem..."}
    ],
    "tools": [{
      "type": "function",
      "name": "stateful_python_code_exec",
      "description": "Execute Python code in a stateful environment.",
      "parameters": {
        "type": "object",
        "properties": {"code": {"type": "string"}},
        "required": ["code"]
      }
    }]
  }
}
```

---------

Signed-off-by: George Armstrong <georgea@nvidia.com>
## Summary

- Adds new `math_formal_lean` resource server for Lean4 formal theorem
proving
- Implements `/verify` endpoint that compiles proofs via sandbox
container and returns reward 1.0/0.0
- Includes MiniF2F dataset (244 test problems) with NeMo-Skills aligned
prompt format
- Comprehensive test suite (31 tests)
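
The binary reward described above could be computed along these lines; the compile-result field names are assumptions, not the sandbox's actual response schema:

```python
def lean_reward(compile_result):
    """Map a Lean4 sandbox compile result to the 1.0/0.0 reward.

    Hypothetical sketch: `returncode` and `errors` are assumed field
    names, not the real sandbox_client.py schema.
    """
    compiled_ok = (
        compile_result.get("returncode") == 0
        and not compile_result.get("errors")
    )
    return 1.0 if compiled_ok else 0.0
```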

## Components

| File | Description |
|------|-------------|
| `app.py` | Resource server with verify endpoint |
| `sandbox_client.py` | HTTP client for Lean4 sandbox |
| `proof_utils.py` | Proof extraction/building utilities |
| `prepare_minif2f.py` | Dataset preparation script |
| `README.md` | Documentation with licensing info |

## Test plan

- [x] Unit tests pass (31/31)
- [x] End-to-end test with `ng_collect_rollouts` (0.2 reward on 5
samples)
- [x] Tested with gpt-5.1-codex-max model
- [x] Pre-commit lint checks pass

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Signed-off-by: Stephen Ge <stepheng@nvidia.com>
Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Per title. This PR retains the current default of returning transitions,
but it is reasonable to change that default to match the other Gym
agents.

Signed-off-by: Siddharth Narayanan <sid@edisonscientific.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Co-authored-by: Brian Yu <bxyu@nvidia.com>
Refactoring the equivalency LLM judge resource server into another
judge-based resource server. The main changes are removing the regex
logic and cleaning up the related configs.

Train data for this environment is still TBD, but a working version:
Data source: Sliced terminus prompts from different sources
train_jsonl_fpath:
`/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/my-envs/terminus-sliced/char/nano3-ga-traindata-char-tokenlen-32768.jsonl`
validation_jsonl_fpath:
`/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/my-envs/terminus-sliced/char/nano3-ga-valdata-char-tokenlen-16384.jsonl`
example train config:
`/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/nemo-rl-internal-yifu/training_configs/grpo_nanov3-nickel-capybara-4-nodes-judge-roff-512-49k-seq-reasoning-off-char-data-64x16-temp1-iter-1600.yaml`

Example of env validation:

base model: early sft checkpoint of nano v3
(`nano-v3-sft-64gbs-nickel-capybara-5e-5-constant-wd-0-load-bal-1e-4-lcx3-pretool-base-temp1-iter-0013600-hf`)
Step 50 -> 21.25% on Terminal Bench Core
https://wandb.ai/nvidia/terminus-sliced/runs/rs7c40hi


Next steps:
Will expand this PR with configurable verification options including
string matching, string similarity and openapi-based output schema
validation.

---------

Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
Added new doc directories/article stubs for the topics identified in
the 0.2.0 IA, and generated an initial pass of structure and some starter
content. This will enable contributors to focus more on the topic itself
rather than the site build/toctree elements. **Feel free to blow away any
initial content in these pages**.

All stubbed pages have been marked with 🟡 in the toctree for easy
discovery. Remove the 🟡 once the page is finished.

<img width="1800" height="1009" alt="image"
src="https://github.com/user-attachments/assets/a0bbc63d-05ce-44a2-b31f-fe4b8e0d43db"
/>

---------

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Added a complete example of preparing a custom dataset for use with
NeMo Gym. The tutorial walks through downloading a dataset from Hugging
Face (or adapting one from a different source), adding the
"responses_create_params" field, writing a new resource server config,
and preparing the data with "ng_prepare_data". This tutorial can serve
as a guide for converting most arbitrary text-based datasets into a
format compatible with NeMo Gym for post-training.
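
The core transformation the tutorial describes (wrapping each raw sample in a "responses_create_params" field) can be sketched as below; the raw field name `question` and the helper's name are illustrative assumptions:

```python
def add_responses_create_params(row, system_prompt):
    """Wrap a raw text sample in the responses_create_params format.

    Hypothetical sketch: the input field name "question" and this
    helper's name are assumptions; the tutorial covers the real layout.
    """
    row["responses_create_params"] = {
        "input": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": row.pop("question")},
        ]
    }
    return row
```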

Signed-off-by: Robert Clark <roclark@nvidia.com>
Signed-off-by: Sugam Devare <sdevare@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented Jan 22, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Signed-off-by: Sugam Devare <sdevare@nvidia.com>
Signed-off-by: Sugam Devare <sdevare@nvidia.com>
Signed-off-by: Sugam Devare <sdevare@nvidia.com>
Signed-off-by: Sugam Devare <sdevare@nvidia.com>
@bxyu-nvidia bxyu-nvidia marked this pull request as ready for review February 5, 2026 00:25
Signed-off-by: Sugam Devare <sdevare@nvidia.com>
Signed-off-by: Sugam Devare <sdevare@nvidia.com>
@vadam5 vadam5 closed this Mar 10, 2026
@vadam5 vadam5 force-pushed the sdd/oh-metric-block-commands branch from bafcbb3 to 35241e7 Compare March 10, 2026 23:08
@copy-pr-bot

copy-pr-bot Bot commented Mar 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

