OH metrics and block commands #598
Closed
sdevare-nv wants to merge 230 commits into main
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Add copy-pr-bot
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Add initial repo template
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com> Co-authored-by: bxyu-nvidia <bxyu@nvidia.com>
Migrated over from gitlab:
- Display aggregate metrics
- Aggregate generic keys using multineedle
- Display other dynamic aggregations
- Count string totals and unique values
- Remove TrainDataProcessor dependency, add test
- Remove dupe file read, fix arg types hints

---------
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
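The "count string totals and unique values" aggregation listed above can be sketched roughly as follows. This is an illustration only: `aggregate_string_field` and the sample rows are hypothetical and are not taken from the migrated code.

```python
import json
from collections import Counter


def aggregate_string_field(jsonl_lines, key):
    """Count total and unique string values of `key` across JSONL rows."""
    counts = Counter()
    for line in jsonl_lines:
        row = json.loads(line)
        value = row.get(key)
        if isinstance(value, str):  # skip rows where the key is absent or not a string
            counts[value] += 1
    return {"total": sum(counts.values()), "unique": len(counts)}


# Hypothetical sample rows, for illustration only.
sample_lines = [
    '{"verifier_type": "math_with_judge"}',
    '{"verifier_type": "math_with_judge"}',
    '{"verifier_type": "code_gen"}',
]
summary = aggregate_string_field(sample_lines, "verifier_type")
print(summary)
```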
Signed-off-by: Brian Yu <bxyu@nvidia.com>
…nfo (#27) Signed-off-by: Brian Yu <bxyu@nvidia.com>
Updated the log message printed when running `ng_prepare_data`. For example, "Found 0 agent server instance configs withOUT datasets:" becomes "Found 0 agent server instance configs WITHOUT datasets:", which matches the format of the subsequent log line, for example "Found 1 agent server instance configs WITH datasets:".
Signed-off-by: chrismun <cmunley@nvidia.com>
Update the resources servers README for the updated CLI.
Signed-off-by: chrismun <cmunley@nvidia.com>
This PR enables running Gym on Aviary environments. The two main concepts:
- `AviaryResourcesServer`: maps to an Aviary `TaskDataset`: spawns and manages multiple environments
  - Unlike other `ResourcesServer`s, it doesn't take arbitrary task specs, but an integer index into the `TaskDataset`. Otherwise we'd have data defined in two places
  - Instead of tool-specific endpoints, we have one `/step` endpoint. This is because:
    - Aviary environments define their transition function in `step()`. Simply calling the bare tools can have undefined behavior (e.g. state isn't updated properly)
    - Aviary tools are not guaranteed to be available until `reset()` is called.
  - A `/close` endpoint is added to tear down resources
- `AviaryAgent`: analogous to `SimpleAgent`, but:
  - Request is an integer index (which is forwarded to `AviaryResourcesServer`). In general, we expect `env.reset()` to provide the first messages, not the calling code
  - All tool calls are sent to `/step`
  - We rely on the environment to tell us when we're done

Two concrete Aviary datasets/environments are integrated: GSM8k with a calculator environment and BixBench with a notebook environment. Adding new ones is pretty lightweight (most of the code in `notebook_app.py` is from defining a BixBench-compatible environment, not the integration).

---------
Signed-off-by: Siddharth Narayanan <sid@futurehouse.org>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Siddharth Narayanan <sidnarayanan@users.noreply.github.com>
Signed-off-by: Siddharth Narayanan <sid@edisonscientific.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Co-authored-by: bxyu-nvidia <bxyu@nvidia.com>
Co-authored-by: cmunley1 <cmunley@nvidia.com>
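The control flow described above (integer task index, first messages from `reset()`, every tool call routed through a single step function, environment-signalled termination) can be sketched with a toy environment. Everything here is hypothetical: `ToyCalculatorEnv`, `TOY_TASKS`, and `run_episode` are stand-ins for illustration and are not the Aviary API or the code in this PR.

```python
from dataclasses import dataclass


@dataclass
class ToyCalculatorEnv:
    """Hypothetical stand-in for an environment; this is not the Aviary API."""

    question: str
    answer: int
    done: bool = False

    def reset(self):
        # The environment, not the calling code, provides the first messages.
        return [{"role": "user", "content": self.question}]

    def step(self, tool_call):
        # Every tool call goes through the transition function so state stays consistent.
        if tool_call["name"] == "submit":
            self.done = True
            correct = tool_call["arguments"]["value"] == self.answer
            return [{"role": "tool", "content": "submitted"}], (1.0 if correct else 0.0)
        return [{"role": "tool", "content": "unknown tool"}], 0.0


# Hypothetical task dataset: a task is addressed by an integer index.
TOY_TASKS = [("What is 2 + 3?", 5), ("What is 4 * 6?", 24)]


def run_episode(task_idx, policy, max_steps=10):
    env = ToyCalculatorEnv(*TOY_TASKS[task_idx])
    messages = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        new_messages, reward = env.step(policy(messages))
        messages.extend(new_messages)
        total_reward += reward
        if env.done:  # the environment tells us when we're done
            break
    return total_reward


episode_reward = run_episode(0, lambda messages: {"name": "submit", "arguments": {"value": 5}})
print(episode_reward)
```

The `max_steps` cap is an addition of this sketch so that a policy that never submits cannot loop forever.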
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Adds a more descriptive README, reward profiling, and an option for fractional or binary reward.
Signed-off-by: abukharin-nv <abukharin@nvidia.com>
Co-authored-by: cmunley1 <cmunley@nvidia.com>
This PR adds new environments for SWE tasks. The environments can be used for single-step patch generation, test generation, and LLM-as-a-judge. They have been tested on instances from SWE-bench, SWE-Gym, and SWE-rebench. The patch and test generation environments run them against unit tests in a containerized environment (Singularity).
---------
Signed-off-by: Atefeh Sohrabizadeh <asohrabizade@nvidia.com>
Co-authored-by: Test User <test@example.com>
Integrating a new dataset using existing equivalency llm judge resource server.

Data source: https://huggingface.co/datasets/jiacheng-ye/nl2bash
License: https://github.com/TellinaTool/nl2bash/blob/3d1997669ac21c8e19fc1d12f60054d3142ef6c7/LICENSE
Train: 8040 unique samples
Validation: 50 unique, randomly sampled from train
Augmentation on the source (minimal): Added system prompt, output formatting requirement

Example of env validation:
- base model: `nemotron-nano-3-30b-a3b-bf16` (GA checkpoint)
- Step 30 -> 12.50% on Terminal Bench Core
- https://wandb.ai/nvidia/nl2bash/runs/mxp1c3mm

Train: nl2bash-super-train-0901.jsonl
Validation: nl2bash-super-validation-0901.jsonl
https://gitlab-master.nvidia.com/bxyu/nemo-gym/-/ml/models/152/versions/176#/
```
ng_download_dataset_from_gitlab \
    +dataset_name=nl2bash-equivalency-judge \
    +version=0.0.1 \
    +artifact_fpath=nl2bash-super-train-0901.jsonl \
    +output_fpath=Gym/data/nl2bash/nl2bash-super-train-0901.jsonl
```
---------
Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
# Make `agent_name` optional in CLI rollout collection

## Summary
Makes `agent_name` optional in `ng_collect_rollouts` CLI, allowing it to use `agent_ref` from each data row instead.

## Motivation
The NeMo-RL training code already respects per-row `agent_ref`, but the Gym CLI (`ng_collect_rollouts`) required a single hardcoded `agent_name`. This prevented multi-agent rollout collection via CLI.

## Changes
- `rollout_collection.py`: Made `agent_name` field optional with `default=None`
- Use `config.agent_name` if specified; otherwise fall back to `row["agent_ref"]["name"]`
- Added validation error if neither source provides an agent name

## Behavior
| Before | After |
|--------|-------|
| `+agent_name=...` required | `+agent_name=...` optional |
| All rows use same agent | Rows can use different agents via `agent_ref` |

---------
Signed-off-by: George Armstrong <georgea@nvidia.com>
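The fallback order described in this change can be sketched as below. This is a minimal illustration, not the code in `rollout_collection.py`: the function name `resolve_agent_name` and the error message wording are assumptions.

```python
from typing import Any, Mapping, Optional


def resolve_agent_name(config_agent_name: Optional[str], row: Mapping[str, Any]) -> str:
    """Prefer the CLI-level agent name; otherwise use the row's agent_ref."""
    if config_agent_name is not None:
        return config_agent_name
    agent_ref = row.get("agent_ref") or {}
    name = agent_ref.get("name")
    if name is None:
        # Neither source provided a name: fail loudly instead of guessing.
        raise ValueError("No agent name: pass +agent_name=... or set agent_ref.name on the row")
    return name


row = {"agent_ref": {"type": "responses_api_agents", "name": "ns_tools_simple_agent"}}
print(resolve_agent_name(None, row))
print(resolve_agent_name("simple_agent", row))
```

With this ordering, a single `+agent_name=...` still overrides every row, which preserves the previous behavior for existing invocations.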
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Inspired by https://github.com/NVIDIA-NeMo/Gym/pull/318/files#diff-b56c7f31b7793b3a4ac265f84f4c84216f1ed15a3fbee855da9674a7da8714ff by @pjin-nvidia --------- Signed-off-by: Brian Yu <bxyu@nvidia.com>
The default artifact paths for the math_with_judge resource server don't match the filenames of the provided dataset (nvidia/Nemotron-RL-math-OpenMathReasoning) [as saved on Hugging Face](https://huggingface.co/datasets/nvidia/Nemotron-RL-math-OpenMathReasoning/tree/main). This results in an error when attempting to download the files automatically from Hugging Face. The artifact paths for both training and validation need to be updated to the names shown on Hugging Face so the files download properly.
Signed-off-by: Robert Clark <roclark@nvidia.com>
The competitive coding resource config is missing a Hugging Face
identifier, which prevents its data from being downloaded from Hugging Face
using the data preparation tools.
To reproduce without the HF identifier, run the following:
```
config_paths="responses_api_models/vllm_model/configs/vllm_model_for_training.yaml,resources_servers/math_with_judge/configs/math_with_judge.yaml,resources_servers/code_gen/configs/code_gen.yaml,resources_servers/workplace_assistant/configs/workplace_assistant.yaml,resources_servers/mcqa/configs/mcqa.yaml,resources_servers/instruction_following/configs/instruction_following.yaml,resources_servers/structured_outputs/configs/structured_outputs_json.yaml"
ng_prepare_data "+config_paths=[${config_paths}]" +output_dirpath=data/ +mode=train_preparation +should_download=true +data_source=huggingface
```
This will throw a warning:
```
Dataset `livecodebench_v5_validation` missing huggingface_identifier for HuggingFace backend
```
And eventually this error:
```
Traceback (most recent call last):
  File "/opt/nemo_rl_venv/bin/ng_prepare_data", line 10, in <module>
    sys.exit(prepare_data())
             ^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 819, in prepare_data
    data_processor.run(global_config_dict)
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 350, in run
    dataset_type_to_aggregate_metrics = self.validate_samples_and_aggregate_metrics(server_instance_configs)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 657, in validate_samples_and_aggregate_metrics
    state = self._validate_samples_and_aggregate_metrics_single_dataset(d)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 553, in _validate_samples_and_aggregate_metrics_single_dataset
    for sample_idx, sample_dict_str in enumerate(self._iter_dataset_lines(dataset_config)):
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 542, in _iter_dataset_lines
    with open(dataset_config.jsonl_fpath) as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'resources_servers/code_gen/data/livecodebench_v5_2024-07-01_2025-02-01_validation.jsonl'
```
This fix will download the validation file as intended and resolve the
errors.
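The failure mode above (a dataset with neither a local file nor a `huggingface_identifier`) could be caught before any file is opened. The sketch below is only an illustration under stated assumptions: dataset configs are modeled as plain dicts, the `name` key, the identifier value, and the paths are hypothetical placeholders, and `find_undownloadable_datasets` is not a function in `train_data_utils.py`.

```python
import os


def find_undownloadable_datasets(dataset_configs):
    """Return names of datasets that have no local file and no Hugging Face identifier."""
    missing = []
    for cfg in dataset_configs:
        has_file = os.path.exists(cfg["jsonl_fpath"])
        has_hf_id = bool(cfg.get("huggingface_identifier"))
        if not has_file and not has_hf_id:
            missing.append(cfg["name"])
    return missing


# Hypothetical configs: the paths and the identifier value are placeholders.
configs = [
    {"name": "livecodebench_v5_validation", "jsonl_fpath": "/nonexistent/validation.jsonl"},
    {
        "name": "example_train",
        "jsonl_fpath": "/nonexistent/train.jsonl",
        "huggingface_identifier": "example-org/example-dataset",
    },
]
print(find_undownloadable_datasets(configs))
```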
Signed-off-by: Robert Clark <roclark@nvidia.com>
The train and val data paths are swapped in the config. This PR updates them. --------- Signed-off-by: Atefeh Sohrabizadeh <asohrabizade@nvidia.com> Co-authored-by: Test User <test@example.com>
# PR: Add ns_tools Resources Server
## Description
Adds a new resources server that integrates NeMo Skills tools (e.g.,
stateful Python code execution) with NeMo Gym's verification system.
**Key features:**
- Executes NeMo Skills tools via the ToolManager (e.g.,
`stateful_python_code_exec`)
- Delegates verification to other resources servers (e.g.,
`math_with_judge`)
## Verifier Delegation
The `ns_tools` server acts as a pass-through for verification. When
`verify()` is called, it delegates to the configured verifier (default:
`math_with_judge`):
```
ns_tools.verify(request)
    → POST to math_with_judge/verify
    → returns reward from math_with_judge
```
This allows using NeMo Skills tools while leveraging existing
verification infrastructure.
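The pass-through can be sketched as follows. This is a minimal illustration, not the server's implementation: `make_delegating_verify` is a hypothetical name, the HTTP call is abstracted behind an injected `post` callable, and reading the verifier from the request's `verifier_type` field is an assumption.

```python
from typing import Callable


def make_delegating_verify(post: Callable[[str, dict], dict], default_verifier: str = "math_with_judge"):
    """Build a verify() that forwards the request to another server's /verify."""

    def verify(request: dict) -> dict:
        # Assumption: the request may name its verifier; otherwise use the default.
        verifier = request.get("verifier_type", default_verifier)
        response = post(f"{verifier}/verify", request)
        # Return the delegated verifier's reward unchanged.
        return {"reward": response["reward"]}

    return verify


# Fake transport for illustration; a real server would issue an HTTP POST here.
calls = []


def fake_post(path: str, payload: dict) -> dict:
    calls.append(path)
    return {"reward": 1.0 if payload.get("answer") == payload.get("expected_answer") else 0.0}


verify = make_delegating_verify(fake_post)
result = verify({"answer": "70", "expected_answer": "70"})
print(result, calls)
```

Injecting the transport keeps the delegation logic testable without a running verifier.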
## Example Data Format
```json
{
  "id": "aime25-0",
  "question": "Find the sum of all integer bases $b>9$ for which $17_b$ is a divisor of $97_b$.",
  "expected_answer": "70",
  "verifier_type": "math_with_judge",
  "agent_ref": {"type": "responses_api_agents", "name": "ns_tools_simple_agent"},
  "responses_create_params": {
    "input": [
      {"role": "user", "content": "Solve the following math problem..."}
    ],
    "tools": [{
      "type": "function",
      "name": "stateful_python_code_exec",
      "description": "Execute Python code in a stateful environment.",
      "parameters": {
        "type": "object",
        "properties": {"code": {"type": "string"}},
        "required": ["code"]
      }
    }]
  }
}
```
---------
Signed-off-by: George Armstrong <georgea@nvidia.com>
## Summary
- Adds new `math_formal_lean` resource server for Lean4 formal theorem proving
- Implements `/verify` endpoint that compiles proofs via sandbox container and returns reward 1.0/0.0
- Includes MiniF2F dataset (244 test problems) with NeMo-Skills aligned prompt format
- Comprehensive test suite (31 tests)

## Components
| File | Description |
|------|-------------|
| `app.py` | Resource server with verify endpoint |
| `sandbox_client.py` | HTTP client for Lean4 sandbox |
| `proof_utils.py` | Proof extraction/building utilities |
| `prepare_minif2f.py` | Dataset preparation script |
| `README.md` | Documentation with licensing info |

## Test plan
- [x] Unit tests pass (31/31)
- [x] End-to-end test with `ng_collect_rollouts` (0.2 reward on 5 samples)
- [x] Tested with gpt-5.1-codex-max model
- [x] Pre-commit lint checks pass

🤖 Generated with [Claude Code](https://claude.ai/code)

---------
Signed-off-by: Stephen Ge <stepheng@nvidia.com>
Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Per title. This PR retains the current default of returning transitions, but it is reasonable to change that default to match the other Gym agents. Signed-off-by: Siddharth Narayanan <sid@edisonscientific.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com> Co-authored-by: Brian Yu <bxyu@nvidia.com>
Refactoring the equivalency LLM judge resource server into another judge-based resource server. Main changes include removing the regex logic and cleaning up the related configs.

Train data for this environment is still TBD, but a working version:
- Data source: Sliced terminus prompts from different sources
- train_jsonl_fpath: `/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/my-envs/terminus-sliced/char/nano3-ga-traindata-char-tokenlen-32768.jsonl`
- validation_jsonl_fpath: `/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/my-envs/terminus-sliced/char/nano3-ga-valdata-char-tokenlen-16384.jsonl`
- example train config: `/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/nemo-rl-internal-yifu/training_configs/grpo_nanov3-nickel-capybara-4-nodes-judge-roff-512-49k-seq-reasoning-off-char-data-64x16-temp1-iter-1600.yaml`

Example of env validation:
- base model: early sft checkpoint of nano v3 (`nano-v3-sft-64gbs-nickel-capybara-5e-5-constant-wd-0-load-bal-1e-4-lcx3-pretool-base-temp1-iter-0013600-hf`)
- Step 50 -> 21.25% on Terminal Bench Core
- https://wandb.ai/nvidia/terminus-sliced/runs/rs7c40hi

Next steps: Will expand this PR with configurable verification options including string matching, string similarity, and OpenAPI-based output schema validation.

---------
Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
Added new doc directories and article stubs for the topics identified in the 0.2.0 IA, and generated an initial pass of structure and some starter content. This lets contributors focus on the topic itself rather than on the site build and toctree elements. **Feel free to blow away any initial content in these pages**. All stubbed pages are marked with 🟡 in the toctree for easy discovery. Remove the 🟡 once the page is finished.
<img width="1800" height="1009" alt="image" src="https://github.com/user-attachments/assets/a0bbc63d-05ce-44a2-b31f-fe4b8e0d43db" />
---------
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Added a complete example of preparing a custom dataset for usage with NeMo Gym. The tutorial walks through downloading a dataset from Hugging Face or modifying from a different source, adding the "responses_create_params" field, writing a new resource server config, and preparing the data with "ng_prepare_data". This tutorial can be used as a guide for taking most arbitrary text-based datasets and modifying them to a format that is compatible with NeMo Gym for post-training. Signed-off-by: Robert Clark <roclark@nvidia.com>
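The step of adding the "responses_create_params" field can be sketched as below. This is a rough illustration rather than the tutorial's code: `to_gym_row` and the source row are hypothetical, and the `input` message shape mirrors the example data format shown earlier in this conversation.

```python
import json


def to_gym_row(source_row: dict, question_key: str = "question") -> dict:
    """Copy a source row and add a responses_create_params field built from its question."""
    row = dict(source_row)
    row["responses_create_params"] = {
        "input": [{"role": "user", "content": source_row[question_key]}]
    }
    return row


# Hypothetical source row, for illustration only.
source = {"question": "What is 2 + 2?", "expected_answer": "4"}
line = json.dumps(to_gym_row(source))  # one JSONL line, ready to be written to a file
print(line)
```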
Signed-off-by: Sugam Devare <sdevare@nvidia.com>
No description provided.