NVIDIA NeMo Agent Toolkit Finetuning Harness for Reinforcement Learning

:::{warning}
Experimental Feature: The Finetuning Harness is experimental, and future releases may introduce breaking changes without notice.
:::

The NeMo Agent Toolkit provides a finetuning harness for in-situ reinforcement learning of agentic LLM workflows. The harness lets agents improve iteratively through experience: models learn from their interactions with environments, tools, and users.

Overview

The finetuning harness is built on four foundational principles:

| Principle | Description |
| --- | --- |
| Decoupled Architecture | Training logic is separated from backends, so you can use any RL framework (for example OpenPipe ART, NeMo Aligner, or a custom implementation). |
| In-Situ Training | Train agents with the same workflow you run in production, without moving to a different development environment. |
| Flexible Targeting | Finetune specific functions or entire workflows, enabling targeted improvements in complex agentic systems. |
| Composable Components | Three pluggable components (`TrajectoryBuilder`, `TrainerAdapter`, `Trainer`) can be mixed, matched, and customized. |

Architecture

┌────────────────────────────────────────────────────────────────────────┐
│                              Trainer                                   │
│  (Orchestrates the finetuning loop across epochs)                      │
│                                                                        │
│  ┌───────────────────────┐         ┌───────────────────────────┐       │
│  │  TrajectoryBuilder    │         │    TrainerAdapter         │       │
│  │                       │         │                           │       │
│  │  - Runs evaluations   │ ──────► │  - Validates trajectories │       │
│  │  - Collects episodes  │         │  - Submits to backend     │       │
│  │  - Computes rewards   │         │  - Monitors training      │       │
│  │  - Groups trajectories│         │  - Reports status         │       │
│  └───────────────────────┘         └───────────────────────────┘       │
└────────────────────────────────────────────────────────────────────────┘
                                         │
                                         ▼
                            ┌─────────────────────────┐
                            │   Remote Training       │
                            │      Backend            │
                            └─────────────────────────┘
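The division of labor in the diagram can be sketched in a few lines of Python. This is an illustrative sketch only: the class names mirror the three components, but the `Trajectory` fields, the method names `build`, `submit`, and `wait`, and the in-memory stand-ins are assumptions made for this example, not the toolkit's actual interfaces.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Trajectory:
    """One recorded episode and its scalar reward (hypothetical shape)."""
    example_id: str
    messages: list[str]
    reward: float


class TrajectoryBuilder(Protocol):
    def build(self, epoch: int) -> list[list[Trajectory]]:
        """Run the workflow and return trajectories grouped by example."""
        ...


class TrainerAdapter(Protocol):
    def submit(self, groups: list[list[Trajectory]]) -> str:
        """Validate the groups, send them to the backend, return a job id."""
        ...

    def wait(self, job_id: str) -> str:
        """Block until the backend job finishes and return its status."""
        ...


class Trainer:
    """Orchestrates collect -> submit -> wait once per epoch."""

    def __init__(self, builder: TrajectoryBuilder, adapter: TrainerAdapter) -> None:
        self.builder = builder
        self.adapter = adapter

    def run(self, num_epochs: int) -> list[str]:
        statuses = []
        for epoch in range(num_epochs):
            groups = self.builder.build(epoch)
            job_id = self.adapter.submit(groups)
            statuses.append(self.adapter.wait(job_id))
        return statuses


class StaticBuilder:
    """Stand-in builder that returns one fixed group of two trajectories."""

    def build(self, epoch: int) -> list[list[Trajectory]]:
        return [[
            Trajectory("ex-0", ["attempt A"], reward=1.0),
            Trajectory("ex-0", ["attempt B"], reward=0.0),
        ]]


class InMemoryAdapter:
    """Stand-in adapter that records submissions instead of calling a backend."""

    def __init__(self) -> None:
        self.submitted: list[list[list[Trajectory]]] = []

    def submit(self, groups: list[list[Trajectory]]) -> str:
        if not groups or not all(groups):
            raise ValueError("every trajectory group must be non-empty")
        self.submitted.append(groups)
        return f"job-{len(self.submitted)}"

    def wait(self, job_id: str) -> str:
        return "completed"
```

Because `Trainer` depends only on the two protocols, either side can be replaced independently, which is the point of the decoupled design: `Trainer(StaticBuilder(), InMemoryAdapter()).run(num_epochs=2)` exercises the loop without any remote backend.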

Documentation

| Guide | Description |
| --- | --- |
| Concepts | Core concepts, RL fundamentals, curriculum learning, and architecture details |
| Extending | How to implement custom TrajectoryBuilders, TrainerAdapters, and Trainers |
| OpenPipe ART | Using the OpenPipe ART backend for GRPO training |

Supported Backends

| Backend | Plugin Package | Description |
| --- | --- | --- |
| OpenPipe ART | `nvidia-nat-openpipe-art` | GRPO-based training with vLLM and TorchTune |
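GRPO scores each sampled response relative to the other responses generated for the same prompt, rather than against a learned value function. The following is a minimal sketch of that group-relative normalization, not the ART backend's implementation. The function name, the use of the population standard deviation, and the `eps` stabilizer are choices made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize the rewards of one prompt's responses to zero mean and unit scale."""
    if not rewards:
        raise ValueError("a group needs at least one reward")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]
```

A group whose responses all receive the same reward produces advantages of zero, so it contributes no learning signal. This is one reason to collect several generations per example.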

Key Features

- Curriculum Learning: Progressively introduce harder examples during training
- Multi-Generation Trajectories: Collect multiple responses per example for GRPO optimization
- Validation Monitoring: Periodic evaluation on held-out data to track generalization
- Progress Visualization: Automatic reward plots and metrics logging
- Flexible Targeting: Train specific functions or models in complex workflows
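As an illustration of the curriculum learning idea, the sketch below exposes a growing, easiest-first slice of the dataset as training progresses. It is one simple linear schedule written for this example. The function name, the `difficulty` callback, and the `start_fraction` parameter are hypothetical and do not describe how the harness schedules examples.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def curriculum_subset(
    examples: Sequence[T],
    difficulty: Callable[[T], float],
    epoch: int,
    num_epochs: int,
    start_fraction: float = 0.25,
) -> list[T]:
    """Return the easiest examples, growing linearly to the full set by the last epoch."""
    if num_epochs < 1:
        raise ValueError("num_epochs must be at least 1")
    if not 0.0 < start_fraction <= 1.0:
        raise ValueError("start_fraction must be in (0, 1]")

    ordered = sorted(examples, key=difficulty)  # easiest first
    if num_epochs == 1:
        fraction = 1.0
    else:
        progress = min(max(epoch, 0), num_epochs - 1) / (num_epochs - 1)
        fraction = start_fraction + (1.0 - start_fraction) * progress

    count = max(1, math.ceil(fraction * len(ordered)))
    return ordered[:count]
```

With eight examples, four epochs, and the default `start_fraction`, the first epoch trains on the two easiest examples and the final epoch trains on all eight.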

Requirements

- Training backend (e.g., OpenPipe ART server with GPU)
- LLM inference endpoint with log probability support
- Training dataset in JSON/JSONL format
- Custom evaluator for computing rewards
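To make the last two requirements concrete, the sketch below parses a JSONL dataset and scores a model output with an exact-match reward. The `question` and `answer` field names and both function names are assumptions made for illustration. The dataset schema and the evaluator interface the toolkit expects are not shown here.

```python
import json


def load_jsonl(text: str) -> list[dict]:
    """Parse JSONL content: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def exact_match_reward(example: dict, model_output: str) -> float:
    """Return 1.0 when the output matches the reference answer, else 0.0."""
    expected = str(example["answer"]).strip().lower()
    return 1.0 if model_output.strip().lower() == expected else 0.0
```

Real evaluators typically return graded rather than binary rewards, which gives group-based methods such as GRPO more signal to separate good responses from bad ones.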

:::{toctree}
:hidden:
:caption: Finetuning

Concepts <./concepts.md>
OpenPipe ART <./rl_with_openpipe.md>
DPO With NeMo Customizer <./dpo_with_nemo_customizer.md>
:::
