21 changes: 9 additions & 12 deletions docs/about/architecture.md
@@ -25,7 +25,7 @@ Model servers expose OpenAI-compatible inference endpoints for chat and response
- `POST /v1/chat/completions`
- `POST /v1/responses`

The base model server class defines these endpoints. Concrete model servers implement them (for example, the OpenAI-backed model server). Agents call these endpoints through the shared server client.
The base model server class defines these endpoints. Concrete model servers implement them (for example, the OpenAI or vLLM model server). Agents call these endpoints through the shared server client.
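A request against these endpoints follows the OpenAI-compatible chat completions schema. A minimal sketch of the payload an agent might send (the helper name and model ID are hypothetical, not part of NeMo Gym's API):

```python
def build_chat_request(model: str, messages: list) -> dict:
    """Build a minimal OpenAI-compatible chat completions payload."""
    return {"model": model, "messages": messages}

# An agent would POST this dict to {model_server}/v1/chat/completions
# through the shared server client.
payload = build_chat_request(
    "my-model",  # hypothetical model ID
    [{"role": "user", "content": "Hello"}],
)
```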

### Resources servers (environment + verification)

@@ -36,14 +36,15 @@ Resources servers expose environment lifecycle endpoints:

Individual resources servers can add domain-specific endpoints for tools or environment steps. For example:

- A resources server can register a catch-all tool route like `POST /{path}` for tool execution.
- Aviary-based resources servers add `POST /step` and `POST /close` for multi-step environments.
- Individual tools such as `POST /get_weather` or `POST /search`.
- A resources server can register a catch-all tool route like `POST /{path}` for dynamic environments.
- Support for `POST /step` and `POST /close` for Gymnasium-style environments.
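The catch-all pattern above can be pictured as a dispatch table keyed by tool name; the tool names and return values here are hypothetical, for illustration only:

```python
# Hypothetical tools a resources server might register.
TOOLS = {
    "get_weather": lambda args: {"temp_c": 21},  # stand-in weather tool
    "search": lambda args: {"results": []},      # stand-in search tool
}

def handle_tool_call(path: str, body: dict) -> dict:
    """Dispatch a catch-all POST /{path} request to the matching tool."""
    if path not in TOOLS:
        return {"error": f"unknown tool: {path}"}
    return TOOLS[path](body)
```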

### Agent servers (rollout orchestration)

Agent servers expose two primary endpoints:

- `POST /v1/responses` for multi-step interaction
- `POST /v1/responses` for individual generations
- `POST /run` for full rollout execution and verification

The base agent server class wires these routes, while each agent implementation defines how to call model and resources servers.
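One way to picture that wiring (class and method names are assumptions for illustration, not NeMo Gym's actual API):

```python
class BaseAgentServer:
    """Sketch of a base class that wires the two agent routes."""

    def __init__(self):
        self.routes = {
            "POST /v1/responses": self.respond,  # single generation
            "POST /run": self.run,               # full rollout + verification
        }

    def respond(self, request: dict) -> dict:
        # Concrete agents decide how to call model and resources servers.
        raise NotImplementedError

    def run(self, request: dict) -> dict:
        raise NotImplementedError
```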
@@ -63,19 +64,15 @@ The shared server client fetches the resolved configuration from the head server

The `SimpleAgent` implementation orchestrates a complete rollout and verification sequence:

1. Call the resources server `POST /seed_session` to initialize session state.
2. Call the agent `POST /v1/responses`. The agent calls the model server `POST /v1/responses` and issues tool calls to the resources server via `POST /{tool_name}`.
3. Call the resources server `POST /verify` and return the verified rollout response.
1. Call the resources server `POST /seed_session` to initialize environment state.
2. Call the agent `POST /v1/responses`. The agent calls the model server `POST /v1/responses` and issues tool calls to the resources server via `POST /{tool_name}` to interact with the environment.
3. Call the resources server `POST /verify` and return the rollout and reward.
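The three steps above can be sketched against a stand-in client. The endpoint paths come from the sequence in the docs; the client class and response shapes are assumptions:

```python
class FakeClient:
    """Stand-in for the shared server client; returns canned responses."""

    def post(self, path, json=None):
        if path == "/seed_session":
            return {"session_id": "abc"}
        if path == "/v1/responses":
            return {"output": "final answer"}
        if path == "/verify":
            return {"reward": 1.0}
        return {}

def run_rollout(resources, agent):
    resources.post("/seed_session", json={})            # 1. init environment state
    response = agent.post("/v1/responses", json={})     # 2. agent loop (model + tools)
    verdict = resources.post("/verify", json=response)  # 3. score the rollout
    return response, verdict["reward"]
```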

The rollout collection flow uses the agent `POST /run` endpoint and writes the returned metrics to JSONL output.
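JSONL output here means one JSON object per rollout per line. A minimal sketch (field names are hypothetical):

```python
import json
import os
import tempfile

def write_rollout_metrics(path, rollouts):
    """Append one JSON object per line (JSONL), one per rollout."""
    with open(path, "a") as f:
        for rollout in rollouts:
            f.write(json.dumps(rollout) + "\n")

# Usage with hypothetical metric fields:
path = os.path.join(tempfile.mkdtemp(), "rollouts.jsonl")
write_rollout_metrics(path, [{"session_id": "abc", "reward": 1.0}])
loaded = [json.loads(line) for line in open(path)]
```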

### Multi-step environments (Aviary example)

Some resources servers model environments with explicit step and close endpoints. Aviary-based resources servers accept `POST /step` for environment transitions and `POST /close` to release an environment instance.

## Session and State

All servers add session handling that assigns a session ID when one is not present. Agents propagate cookies between model and resources servers, which lets resources servers store per-session state. Several resources servers keep in-memory maps keyed by session ID (for example, counters or tool environments) to track environment state across steps.
All servers add session handling that assigns a session ID on initialization. Agents propagate cookies between model and resources servers, which lets resources servers store per-session state. Several resources servers keep in-memory maps keyed by session ID (for example, counters or tool environments) to track environment state across steps.
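The in-memory, session-keyed state described above can be sketched as follows; the names and state fields are illustrative, not the actual implementation:

```python
import uuid

SESSIONS = {}  # per-session state, keyed by session ID

def get_or_create_session(session_id=None):
    """Assign a session ID when one is not present and seed its state."""
    sid = session_id or str(uuid.uuid4())
    SESSIONS.setdefault(sid, {"step_count": 0})
    return sid
```

Agents forwarding the session cookie between requests is what lets the same `SESSIONS` entry be found again on later steps.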

## Configuration and Port Resolution

15 changes: 14 additions & 1 deletion docs/contribute/rl-framework-integration/index.md
@@ -8,9 +8,22 @@ These guides cover how to integrate NeMo Gym into a new RL training framework. U
- Contributing NeMo Gym integration for a training framework that does not have one yet

:::{tip}
Just want to train models? Use {ref}`NeMo RL <training-nemo-rl-grpo-index>` instead.
Just want to train models? See existing integrations:
- {ref}`NeMo RL <training-nemo-rl-grpo-index>` - Multi-step and multi-turn RL training at scale
- {doc}`TRL (Hugging Face) <../training-tutorials/trl>` - GRPO with distributed training support
- {doc}`Unsloth <../training-tutorials/unsloth>` - Fast, memory-efficient training for single-step tasks
:::

## Existing Integrations

NeMo Gym currently integrates with the following RL training frameworks:

**[NeMo RL](https://github.com/NVIDIA-NeMo/RL)**: NVIDIA's RL training framework, purpose-built for large-scale frontier model training. Provides full support for multi-step and multi-turn environments with production-grade distributed training capabilities.

**[TRL](https://github.com/huggingface/trl)**: Hugging Face's transformer reinforcement learning library. Supports GRPO with single- and multi-turn NeMo Gym environments using vLLM generation, multi-environment training, and distributed training via Accelerate and DeepSpeed. See the {doc}`TRL tutorial <../training-tutorials/trl>` for usage examples.

**[Unsloth](https://github.com/unslothai/unsloth)**: Fast, memory-efficient fine-tuning library. Supports optimized GRPO with single-step NeMo Gym environments, including low-precision and parameter-efficient fine-tuning and training in notebook environments. See the {doc}`Unsloth tutorial <../training-tutorials/unsloth>` for getting started.

## Prerequisites

Before integrating Gym into your training framework, ensure you have:
2 changes: 1 addition & 1 deletion docs/index.md
@@ -418,8 +418,8 @@ Rollout Collection <get-started/rollout-collection.md>
🟡 Nemotron Nano <training-tutorials/nemotron-nano>
🟡 Nemotron Super <training-tutorials/nemotron-super>
NeMo RL GRPO <tutorials/nemo-rl-grpo/index.md>
Unsloth Training <tutorials/unsloth-training>
🟡 TRL <training-tutorials/trl>
🟡 Unsloth <training-tutorials/unsloth>
🟡 VERL <training-tutorials/verl>
🟡 NeMo Customizer <training-tutorials/nemo-customizer>
Offline Training <tutorials/offline-training-w-rollouts>