:::{warning}
Experimental Feature: The Finetuning Harness is experimental; future releases may introduce breaking changes without notice.
:::
The NeMo Agent Toolkit provides a powerful finetuning harness designed for in-situ reinforcement learning of agentic LLM workflows. This enables iterative improvement of agents through experience, allowing models to learn from their interactions with environments, tools, and users.
The finetuning harness is built on four foundational principles:
| Principle | Description |
|---|---|
| Decoupled Architecture | Training logic is separated from backends, allowing you to use any RL framework (OpenPipe ART, NeMo Aligner, or custom implementations). |
| In-Situ Training | Train agents with the same workflow you run in production, without moving to a different development environment. |
| Flexible Targeting | Finetune specific functions or entire workflows, enabling targeted improvements in complex agentic systems. |
| Composable Components | Three pluggable components (TrajectoryBuilder, TrainerAdapter, Trainer) can be mixed, matched, and customized (see the sketch below). |
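To give a feel for how the three components compose, here is an illustrative sketch. The class names match the components above, but the method names, signatures, and the `Trajectory` type are assumptions for exposition, not the toolkit's actual interfaces:

```python
# Illustrative sketch only: method names, signatures, and the Trajectory
# type are assumptions, not the toolkit's actual interfaces.
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Trajectory:
    messages: list[dict]  # prompt/response turns collected from one episode
    reward: float         # scalar score assigned by the evaluator


class TrajectoryBuilder(ABC):
    """Runs evaluations, collects episodes, computes rewards, groups trajectories."""

    @abstractmethod
    def build(self, dataset: list[dict]) -> list[Trajectory]: ...


class TrainerAdapter(ABC):
    """Validates trajectories, submits them to a backend, monitors training."""

    @abstractmethod
    def submit(self, trajectories: list[Trajectory]) -> None: ...


class Trainer:
    """Orchestrates the finetuning loop by composing the two components."""

    def __init__(self, builder: TrajectoryBuilder, adapter: TrainerAdapter):
        self.builder = builder
        self.adapter = adapter
```

Because each component is swappable behind a small interface, a custom `TrajectoryBuilder` can be paired with an off-the-shelf backend adapter, or vice versa.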
```
┌───────────────────────────────────────────────────────────────────┐
│                              Trainer                              │
│         (Orchestrates the finetuning loop across epochs)          │
│                                                                   │
│  ┌───────────────────────┐         ┌───────────────────────────┐  │
│  │   TrajectoryBuilder   │         │      TrainerAdapter       │  │
│  │                       │         │                           │  │
│  │ - Runs evaluations    │ ──────► │ - Validates trajectories  │  │
│  │ - Collects episodes   │         │ - Submits to backend      │  │
│  │ - Computes rewards    │         │ - Monitors training       │  │
│  │ - Groups trajectories │         │ - Reports status          │  │
│  └───────────────────────┘         └───────────────────────────┘  │
└───────────────────────────────────────────────────────────────────┘
                                                   │
                                                   ▼
                                      ┌─────────────────────────┐
                                      │     Remote Training     │
                                      │         Backend         │
                                      └─────────────────────────┘
```
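The control flow in the diagram can be summarized in a few lines. This is a minimal sketch reusing the interfaces sketched above; the `run_finetuning` function and its signature are hypothetical, and the real Trainer's loop may differ:

```python
# Hypothetical orchestration loop mirroring the diagram above; the real
# Trainer's control flow and method names may differ.
def run_finetuning(trainer: Trainer, dataset: list[dict], num_epochs: int = 3) -> None:
    for epoch in range(num_epochs):
        # 1. TrajectoryBuilder runs the workflow and scores each episode.
        trajectories = trainer.builder.build(dataset)
        # 2. TrainerAdapter validates trajectories and ships them to the
        #    remote training backend.
        trainer.adapter.submit(trajectories)
        # 3. Track progress while the adapter monitors backend-side training.
        mean_reward = sum(t.reward for t in trajectories) / len(trajectories)
        print(f"epoch {epoch}: mean reward = {mean_reward:.3f}")
```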
The following guides describe the harness in more detail:

| Guide | Description |
|---|---|
| Concepts | Core concepts, RL fundamentals, curriculum learning, and architecture details |
| Extending | How to implement custom TrajectoryBuilders, TrainerAdapters, and Trainers |
| OpenPipe ART | Using the OpenPipe ART backend for GRPO training |
Training backends are provided as plugin packages:

| Backend | Plugin Package | Description |
|---|---|---|
| OpenPipe ART | nvidia-nat-openpipe-art | GRPO-based training with vLLM and TorchTune |
The harness provides the following capabilities:

- Curriculum Learning: Progressively introduce harder examples during training
- Multi-Generation Trajectories: Collect multiple responses per example for GRPO optimization (see the sketch after this list)
- Validation Monitoring: Periodic evaluation on held-out data to track generalization
- Progress Visualization: Automatic reward plots and metrics logging
- Flexible Targeting: Train specific functions or models in complex workflows
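To make multi-generation trajectories concrete, here is a minimal, hypothetical sketch of GRPO-style grouping: several responses are sampled for each example, and each reward is normalized against its group's statistics. The `generate_response` and `score` callables are placeholders, not toolkit APIs:

```python
import statistics

# Hypothetical sketch of GRPO-style grouping; generate_response and score
# are placeholder callables, not toolkit APIs.
def grouped_advantages(example: dict, generate_response, score,
                       num_generations: int = 4) -> list[tuple[str, float]]:
    """Sample several responses for one example and compute group-relative
    advantages (reward minus group mean, scaled by group std)."""
    responses = [generate_response(example["prompt"]) for _ in range(num_generations)]
    rewards = [score(example, r) for r in responses]
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(resp, (rw - mean) / std) for resp, rw in zip(responses, rewards)]
```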
To use the finetuning harness, you will need:

- A training backend (e.g., an OpenPipe ART server with a GPU)
- An LLM inference endpoint with log probability support
- A training dataset in JSON/JSONL format
- A custom evaluator for computing rewards (a minimal example follows this list)
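As a starting point for the last item, here is a minimal reward evaluator sketch: exact-match scoring with a small partial-credit fallback. The function signature is an assumption for illustration; the toolkit's actual evaluator interface may differ:

```python
# Minimal hypothetical reward evaluator; the signature is an assumption,
# not the toolkit's evaluator interface.
def exact_match_reward(example: dict, response: str) -> float:
    expected = example["answer"].strip().lower()
    actual = response.strip().lower()
    if actual == expected:
        return 1.0
    # Partial credit when the expected answer appears inside the response.
    return 0.5 if expected in actual else 0.0
```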
:::{toctree}
:hidden:
:caption: Finetuning

Concepts <./concepts.md>
OpenPipe ART <./rl_with_openpipe.md>
DPO With NeMo Customizer <./dpo_with_nemo_customizer.md>
:::