docs: aviary, verifiers, reasoning gym env integration docs #617
Changes from all commits
53ab5f4
5c3027c
eb05659
6394335
546c736
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,48 @@ | ||
| (environment-aviary)= | ||
|
|
||
| # Aviary | ||
|
|
||
| Integration with [Future-House/aviary](https://github.com/Future-House/aviary), a gymnasium for defining custom language agent RL environments. | ||
|
|
||
| Aviary is a framework for building custom RL environments with tool use and multi-step reasoning. Environments built in Aviary can be run through NeMo Gym for training and inference. The library includes pre-built environments for math, general knowledge, biological sequences, scientific literature search, and protein stability. | ||
|
|
||
| --- | ||
|
|
||
| ## Available Environments | ||
|
|
||
| The integration includes several pre-built Aviary environments: | ||
|
|
||
| - **GSM8K** (`gsm8k_app.py`) - Grade school math problems with calculator tool | ||
| - **HotPotQA** (`hotpotqa_app.py`) - Multi-hop question answering | ||
| - **BixBench** (`notebook_app.py`) - Jupyter notebook execution for scientific tasks | ||
| - **Client/Proxy** (`client_app.py`) - Generic interface to remote Aviary dataset servers | ||
|
|
||
| --- | ||
|
|
||
| ## Example Usage | ||
|
|
||
| ### GSM8K Environment | ||
|
|
||
| Run the GSM8K Aviary resources server with a model config: | ||
|
|
||
| ```bash | ||
| ng_run "+config_paths=[resources_servers/aviary/configs/gsm8k_aviary.yaml,responses_api_models/vllm_model/configs/vllm_model.yaml]" | ||
| ``` | ||
|
|
||
| Collect rollouts: | ||
|
|
||
| ```bash | ||
| ng_collect_rollouts \ | ||
| +agent_name=gsm8k_aviary_agent \ | ||
| +input_jsonl_fpath=resources_servers/aviary/data/example.jsonl \ | ||
| +output_jsonl_fpath=resources_servers/aviary/data/example_rollouts.jsonl | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Reference | ||
|
|
||
| - [Aviary GitHub](https://github.com/Future-House/aviary) - Official Aviary repository | ||
| - [Aviary Paper](https://arxiv.org/abs/2412.21154) - Training language agents on challenging scientific tasks | ||
| - `resources_servers/aviary/` - NeMo Gym resources server implementations | ||
| - `responses_api_agents/aviary_agent/` - NeMo Gym aviary agent integration | ||
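The "Collect rollouts" step in the Aviary doc writes its results to `example_rollouts.jsonl`. Assuming that file is standard JSON Lines (one JSON value per line, as the `.jsonl` extension suggests), a minimal Python sketch like the one below could serve as a sanity check after a run. It makes no assumption about the rollout schema, which the doc does not specify, and `summarize_jsonl` is a hypothetical helper name, not part of NeMo Gym or Aviary.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Return (record_count, Counter of top-level keys) for a JSON Lines file.

    Hypothetical helper for illustration; it only inspects structure and
    does not interpret any rollout fields.
    """
    count = 0
    keys = Counter()
    with Path(path).open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc.msg})") from exc
            count += 1
            if isinstance(record, dict):
                keys.update(record.keys())  # count how many records carry each key
    return count, keys
```

Calling `summarize_jsonl("resources_servers/aviary/data/example_rollouts.jsonl")` returns the record count and a per-key tally. Comparing that count with the number of lines in the input file is a cheap way to notice dropped rollouts.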
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,109 @@ | ||
| (environment-reasoning-gym)= | ||
|
|
||
| # Reasoning Gym | ||
|
|
||
| Integration with [open-thought/reasoning-gym](https://github.com/open-thought/reasoning-gym), a library of procedural dataset generators and algorithmically verifiable reasoning environments. | ||
|
|
||
| Reasoning Gym provides 100+ tasks across many domains, including algebra, arithmetic, computation, cognition, geometry, graph theory, logic, and common games. Tasks are procedurally generated with adjustable complexity and algorithmically verified. | ||
|
|
||
| --- | ||
|
|
||
| ## Dataset Preparation | ||
|
|
||
| The integration includes a helper script for creating datasets from reasoning gym tasks. | ||
|
|
||
| **Single task:** | ||
| ```bash | ||
| python resources_servers/reasoning_gym/scripts/create_dataset.py \ | ||
| --task knights_knaves \ | ||
| --size 500 \ | ||
| --seed 42 \ | ||
| --output resources_servers/reasoning_gym/data/train_knights_knaves.jsonl | ||
| ``` | ||
|
|
||
| **Multiple tasks (composite):** | ||
| ```bash | ||
| python resources_servers/reasoning_gym/scripts/create_dataset.py \ | ||
| --tasks knights_knaves,syllogisms,leg_counting \ | ||
| --size 1000 \ | ||
| --output resources_servers/reasoning_gym/data/train_composite.jsonl | ||
| ``` | ||
|
|
||
| **All tasks in a category:** | ||
| ```bash | ||
| python resources_servers/reasoning_gym/scripts/create_dataset.py \ | ||
| --category logic \ | ||
| --size 1000 \ | ||
| --output resources_servers/reasoning_gym/data/train_logic.jsonl | ||
| ``` | ||
|
|
||
| **All available tasks:** | ||
| ```bash | ||
| python resources_servers/reasoning_gym/scripts/create_dataset.py \ | ||
| --all-tasks \ | ||
| --size 1000 \ | ||
| --output resources_servers/reasoning_gym/data/train_all.jsonl | ||
| ``` | ||
|
|
||
| **With custom task configuration:** | ||
| ```bash | ||
| python resources_servers/reasoning_gym/scripts/create_dataset.py \ | ||
| --task knights_knaves \ | ||
| --size 500 \ | ||
| --config '{"n_people": 3, "depth_constraint": 3}' \ | ||
| --output resources_servers/reasoning_gym/data/train_hard.jsonl | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Rollout Collection | ||
|
|
||
| ### Start vLLM Server | ||
|
Contributor
Can we follow the pattern where we are using a hosted model to generate rollouts like the quickstart?
Contributor
Unnecessary burden to get a model and serve it with vLLM, right? |
||
|
|
||
| ```bash | ||
| pip install -U "vllm>=0.12.0" | ||
|
|
||
| wget https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/resolve/main/nano_v3_reasoning_parser.py | ||
|
|
||
| vllm serve nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \ | ||
| --max-num-seqs 8 \ | ||
| --tensor-parallel-size 1 \ | ||
| --max-model-len 262144 \ | ||
| --port 10240 \ | ||
| --trust-remote-code \ | ||
| --tool-call-parser qwen3_coder \ | ||
| --reasoning-parser-plugin nano_v3_reasoning_parser.py \ | ||
| --reasoning-parser nano_v3 | ||
| ``` | ||
|
|
||
| ### Create env.yaml | ||
|
|
||
| ```yaml | ||
| policy_base_url: http://localhost:10240/v1 | ||
| policy_api_key: EMPTY | ||
| policy_model_name: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | ||
| ``` | ||
|
|
||
| ### Launch NeMo Gym Servers | ||
|
|
||
| ```bash | ||
| ng_run "+config_paths=[resources_servers/reasoning_gym/configs/reasoning_gym.yaml,responses_api_models/vllm_model/configs/vllm_model.yaml]" | ||
| ``` | ||
|
|
||
| ### Collect Rollouts | ||
|
|
||
| ```bash | ||
| ng_collect_rollouts \ | ||
| +agent_name=reasoning_gym_simple_agent \ | ||
| +input_jsonl_fpath=resources_servers/reasoning_gym/data/example.jsonl \ | ||
| +output_jsonl_fpath=results/reasoning_gym_rollouts.jsonl \ | ||
| +limit=5 | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Reference | ||
|
|
||
| - [Reasoning Gym GitHub](https://github.com/open-thought/reasoning-gym) | ||
| - [Dataset Gallery](https://github.com/open-thought/reasoning-gym/blob/main/GALLERY.md) - Examples of all available tasks | ||
| - `resources_servers/reasoning_gym/` - NeMo Gym integration implementation | ||
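The "Create env.yaml" step in the Reasoning Gym doc defines three keys: `policy_base_url`, `policy_api_key`, and `policy_model_name`. A minimal Python sketch such as the following could confirm that all three are set before launching `ng_run`. It assumes the file stays a flat list of `key: value` lines, as in the doc. It is deliberately not a general YAML parser and does not handle trailing comments or nesting. The helper names `read_flat_yaml` and `missing_keys` are hypothetical, not part of NeMo Gym.

```python
# Keys taken from the env.yaml examples in the docs under review.
REQUIRED_KEYS = ("policy_base_url", "policy_api_key", "policy_model_name")


def read_flat_yaml(text):
    """Parse flat `key: value` lines into a dict (illustrative only)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        # Split on the first colon only, so URLs such as http://host:port survive.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {raw!r}")
        values[key.strip()] = value.strip().strip("\"'")  # drop optional quotes
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the given text."""
    values = read_flat_yaml(text)
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(open("env.yaml").read())` returns an empty list when the file is complete. The same check applies to the quoted form of `env.yaml` used in the Verifiers doc.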
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,111 @@ | ||
| (environment-verifiers)= | ||
|
|
||
| # Verifiers | ||
|
|
||
| Integration with [PrimeIntellect-ai/verifiers](https://github.com/PrimeIntellect-ai/verifiers), enabling environments from Prime Intellect's Environments Hub to run in NeMo Gym. | ||
|
|
||
| Verifiers provides 600+ environments across reasoning, math, and agent tasks. Environments built for Environments Hub can be deployed through NeMo Gym for training with NeMo RL. Unlike typical NeMo Gym environments, verifiers environments handle state management, verification, and tool execution internally without requiring a separate resource server. | ||
|
|
||
| :::{note} | ||
| **Multi-turn environments:** Currently require disabling `enforce_monotonicity` in training configuration until token propagation is fully patched. | ||
| ::: | ||
|
|
||
| --- | ||
|
|
||
| ## Install Dependencies | ||
|
|
||
| Install verifiers and prime tools: | ||
|
|
||
| ```bash | ||
| # From the Gym repository root | ||
| uv venv | ||
| source .venv/bin/activate | ||
| uv sync | ||
| uv add verifiers | ||
| uv tool install prime | ||
| ``` | ||
|
|
||
| Install an environment: | ||
|
|
||
| ```bash | ||
| prime env install primeintellect/acereason-math | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Create Dataset | ||
|
|
||
| Generate example tasks: | ||
|
|
||
| ```bash | ||
| python3 responses_api_agents/verifiers_agent/scripts/create_dataset.py \ | ||
| --env-id primeintellect/acereason-math \ | ||
| --size 5 \ | ||
| --output responses_api_agents/verifiers_agent/data/acereason-math-example.jsonl | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Update Agent Requirements | ||
|
|
||
| Add to `responses_api_agents/verifiers_agent/requirements.txt`: | ||
|
|
||
| ```txt | ||
| -e nemo-gym[dev] @ ../../ | ||
| verifiers>=0.1.9 | ||
| --extra-index-url https://hub.primeintellect.ai/primeintellect/simple/ | ||
| acereason-math | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Configure Model Server | ||
|
|
||
| Create `env.yaml` at repository root: | ||
|
|
||
| ```yaml | ||
| policy_base_url: "http://localhost:8000/v1" | ||
| policy_api_key: "dummy" | ||
| policy_model_name: "Qwen/Qwen3-4B-Instruct-2507" | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Start Model Server | ||
|
Contributor
Similar comment to the other env tutorials about a hosted model raising barrier to entry and consistency with quickstart. Also this one doesn't have the instruction to pull weights from HF |
||
|
|
||
| ```bash | ||
| uv add vllm | ||
|
Contributor
are we going with pip or uv? reasoning env has pip install https://github.com/NVIDIA-NeMo/Gym/pull/617/changes#diff-ada604f88b18e8dbff44f513c28f5aad984dc5e3bbbd213d4c1aadd9214350f9R64 |
||
| vllm serve Qwen/Qwen3-4B-Instruct-2507 \ | ||
| --max-model-len 32768 \ | ||
| --reasoning-parser qwen3 \ | ||
| --enable-auto-tool-choice \ | ||
| --tool-call-parser hermes | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Launch NeMo Gym Servers | ||
|
|
||
| ```bash | ||
| ng_run "+config_paths=[responses_api_agents/verifiers_agent/configs/verifiers_acereason-math.yaml,responses_api_models/vllm_model/configs/vllm_model.yaml]" | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Collect Rollouts | ||
|
|
||
| ```bash | ||
| ng_collect_rollouts \ | ||
| +agent_name=verifiers_agent \ | ||
| +input_jsonl_fpath=responses_api_agents/verifiers_agent/data/acereason-math-example.jsonl \ | ||
| +output_jsonl_fpath=responses_api_agents/verifiers_agent/data/acereason-math-example-rollouts.jsonl \ | ||
| +limit=5 | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Reference | ||
|
|
||
| - [Prime Intellect Environments Hub](https://app.primeintellect.ai/dashboard/environments) - Browse 600+ available environments | ||
| - [Verifiers GitHub](https://github.com/PrimeIntellect-ai/verifiers) - Verifiers library | ||
| - `responses_api_agents/verifiers_agent/` - NeMo Gym agent integration | ||
Then unlike my comment here on Reasoning gym https://github.com/NVIDIA-NeMo/Gym/pull/617/changes#r2800137030 we do not have the "setup steps" before running `ng_run`. Need one pattern and to follow it.