diff --git a/README.md b/README.md
index 162ca6b80..41eafe13f 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 
 NeMo Gym is a library for building reinforcement learning (RL) training environments for large language models (LLMs). It provides infrastructure to develop environments, scale rollout collection, and integrate seamlessly with your preferred training framework.
 
-NeMo Gym is a component of the [NVIDIA NeMo Framework](https://docs.nvidia.com/nemo-framework/), NVIDIA’s GPU-accelerated platform for building and training generative AI models.
+NeMo Gym is a component of the [NVIDIA NeMo Framework](https://docs.nvidia.com/nemo-framework/). For details on how NeMo Gym fits within the NeMo ecosystem and integrates with other RL frameworks, see the [Ecosystem](https://docs.nvidia.com/nemo/gym/latest/about/ecosystem.html) documentation.
 
 ## 🏆 Why NeMo Gym?
 
@@ -16,6 +16,34 @@ NeMo Gym is a component of the [NVIDIA NeMo Framework](https://docs.nvidia.com/n
 > [!IMPORTANT]
 > NeMo Gym is currently in early development. You should expect evolving APIs, incomplete documentation, and occasional bugs. We welcome contributions and feedback - for any changes, please open an issue first to kick off discussion!
 
+## 🔗 Ecosystem Integrations
+
+NeMo Gym is designed to integrate seamlessly with the broader RL ecosystem. For detailed documentation, see the [Ecosystem](https://docs.nvidia.com/nemo/gym/latest/about/ecosystem.html) page.
+
+### Training Frameworks
+
+NeMo Gym provides rollout collection infrastructure that integrates with various RL training frameworks:
+
+| Framework | Status | Description |
+|-----------|--------|-------------|
+| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | ✅ Supported | NVIDIA's scalable post-training library with GRPO, DPO, and SFT |
+| [Unsloth](https://github.com/unslothai/unsloth) | ✅ Supported | Fast fine-tuning framework with memory optimization |
+| [TRL](https://github.com/huggingface/trl) | ✅ Supported | Hugging Face Transformer Reinforcement Learning |
+| [veRL](https://github.com/volcengine/verl) | 🔜 In Progress | Volcano Engine's scalable RL framework |
+
+### Environment Libraries
+
+NeMo Gym integrates with environment libraries for diverse training scenarios. All integrations are compatible with Gymnasium (the successor to OpenAI Gym) standards.
+
+| Library | Status | Description |
+|---------|--------|-------------|
+| [reasoning-gym](https://github.com/open-thought/reasoning-gym) | ✅ Supported | Procedurally generated reasoning tasks (see `reasoning_gym` resource server) |
+| [Aviary](https://github.com/Future-House/aviary) | ✅ Supported | Multi-environment framework for tool-using agents (see `aviary` resource server) |
+| [Verifiers](https://github.com/PrimeIntellect-ai/verifiers) | 🔜 In Progress | Environment hub for coding, data & ML, science & reasoning, tool use and more |
+| [BrowserGym](https://github.com/ServiceNow/BrowserGym) | 🔜 In Progress | Web browsing and automation environments |
+
+> 💡 **Want to add an integration?** We welcome contributions! See our [Contributing Guide](https://docs.nvidia.com/nemo/gym/latest/contribute/index.html) or [open an issue](https://github.com/NVIDIA-NeMo/Gym/issues) to discuss.
+
 ## 📋 Requirements
 
 ### Hardware Requirements
diff --git a/docs/about/ecosystem.md b/docs/about/ecosystem.md
index 6abf9dd3a..6e2542e27 100644
--- a/docs/about/ecosystem.md
+++ b/docs/about/ecosystem.md
@@ -1,27 +1,51 @@
 (about-ecosystem)=
-# NeMo Gym in the NVIDIA Ecosystem
+# Agentic RL Ecosystem
 
-NeMo Gym is a component of the [NVIDIA NeMo Framework](https://docs.nvidia.com/nemo-framework/), NVIDIA's GPU-accelerated platform for building and training generative AI models.
+We're building NeMo Gym to integrate with a broad set of RL training frameworks and environment libraries.
 
-:::{tip}
-For details on NeMo Gym capabilities, refer to the
-{ref}`Overview `.
-:::
+We would love your contribution! Open a PR to add an integration, or [file an issue](https://github.com/NVIDIA-NeMo/Gym/issues/new/choose) to share what would be valuable for you.
 
 ---
 
-## NeMo Gym Within the NeMo Framework
+## Training Framework Integrations
 
-NeMo Framework includes modular libraries for end-to-end model training:
+- **{doc}`NeMo RL <../tutorials/nemo-rl-grpo/index>`** - GRPO training to improve multi-step tool calling on the Workplace Assistant environment
+- **[OpenRLHF](https://github.com/OpenRLHF/OpenRLHF/blob/main/examples/python/agent_func_nemogym_executor.py)** - example agent executor for RL training
+- **{doc}`TRL <../training-tutorials/trl>`** - GRPO training on Workplace Assistant and Reasoning Gym environments
+- **{doc}`Unsloth <../tutorials/unsloth-training>`** - GRPO training on the Sudoku environment, with a [multi-environment notebook](https://github.com/unslothai/notebooks/blob/main/nb/NeMo-Gym-Multi-Environment.ipynb) for instruction following and Reasoning Gym
+- **NeMo Customizer** - *(In progress)*
+- **veRL** - *(In progress)*
 
-* **[NeMo Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge)**: Pretraining and fine-tuning with Megatron-Core
-* **[NeMo AutoModel](https://github.com/NVIDIA-NeMo/Automodel)**: PyTorch native training for Hugging Face models
-* **[NeMo RL](https://github.com/NVIDIA-NeMo/RL)**: Scalable and efficient post-training
-* **[NeMo Gym](https://github.com/NVIDIA-NeMo/Gym)**: RL environment infrastructure and rollout collection (this project)
-* **[NeMo Curator](https://github.com/NVIDIA-NeMo/Curator)**: Data preprocessing and curation
-* **[NeMo Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner)**: Synthetic data generation from scratch or seed datasets
-* **[NeMo Evaluator](https://github.com/NVIDIA-NeMo/Evaluator)**: Model evaluation and benchmarking
-* **[NeMo Guardrails](https://github.com/NVIDIA-NeMo/Guardrails)**: Programmable safety guardrails
-* And more...
+To integrate another training framework, see the {doc}`Training Framework Integration Guide <../contribute/rl-framework-integration/index>`.
 
-**NeMo Gym's Role**: Within this ecosystem, Gym focuses on standardizing scalable rollout collection for RL training. It provides unified interfaces to heterogeneous RL environments and curated resource servers with verification logic. This makes it practical to generate large-scale, high-quality training data for NeMo RL and other training frameworks.
+---
+
+## Environment Library Integrations
+
+NeMo Gym integrates with external environment libraries and benchmarks. See the [README](https://github.com/NVIDIA-NeMo/Gym?tab=readme-ov-file#table-2-resource-servers-for-training) for the full list. Here are a few examples:
+
+- **[Reasoning Gym](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/reasoning_gym)** - reasoning environments spanning computation, cognition, logic and more
+- **[Aviary](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/aviary)** - environments spanning math, knowledge, biological sequences, scientific literature search, and protein stability
+- **[Verifiers](https://github.com/PrimeIntellect-ai/verifiers)** - *(In progress)* - environment hub for coding, data & ML, science & reasoning, tool use and more
+- **[BrowserGym](https://github.com/ServiceNow/BrowserGym)** - *(In progress)* - environments for web task automation
+
+
+---
+
+## Related NeMo Libraries
+
+NeMo Gym is a component of NVIDIA NeMo, a GPU-accelerated platform for building and training generative AI models.
+
+Depending on your workflow, you may also find these libraries useful:
+
+| Library | Purpose |
+|---------|---------|
+| [NeMo Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | Pretraining and fine-tuning with Megatron-Core |
+| [NeMo AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | PyTorch native training for Hugging Face models |
+| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | Scalable post-training with GRPO, DPO, and SFT |
+| **[NeMo Gym](https://github.com/NVIDIA-NeMo/Gym)** | RL environment infrastructure and rollout collection *(this project)* |
+| [NeMo Curator](https://github.com/NVIDIA-NeMo/Curator) | Data preprocessing and curation |
+| [NeMo Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner) | Synthetic data generation |
+| [NeMo Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) | Model evaluation and benchmarking |
+| [NeMo Guardrails](https://github.com/NVIDIA-NeMo/Guardrails) | Programmable safety guardrails |
+| [NeMo Skills](https://github.com/NVIDIA-NeMo/NeMo-Skills) | Skills framework for code generation and reasoning |
\ No newline at end of file