ecosystem pg verbiage update #612
base: main
Conversation
Signed-off-by: Lawrence Lane <[email protected]>

Force-pushed from 6fa4689 to 71f0756.

Signed-off-by: Lawrence Lane <[email protected]>
| Library | Status | Description |
|---------|--------|-------------|
| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | ✅ Supported | NVIDIA's scalable post-training library with GRPO, DPO, SFT |
| [Unsloth](https://github.com/unslothai/unsloth) | ✅ Supported | Fast fine-tuning framework with memory optimization |
| [veRL](https://github.com/volcengine/verl) | 🔜 In Progress | Volcano Engine's scalable RL framework |
I don't think veRL is in progress, but maybe someone is working on it?
And I think we can change TRL to say supported now; we are just fixing a minor last-minute change and working on additional docs, e.g. a sample reward/step or a potential blog post.
It's been in progress for a while but is paused; we need to resume this effort and support it in the next release.
| Library | Status | Description |
|---------|--------|-------------|
| [reasoning-gym](https://github.com/open-thought/reasoning-gym) | ✅ Supported | Procedurally generated reasoning tasks (see `reasoning_gym` resource server) |
| [Aviary](https://github.com/Future-House/aviary) | ✅ Supported | Multi-environment framework for tool-using agents (see `aviary` resource server) |
May be worth saying it's OpenAI Gymnasium compatible (but we should double-confirm that statement).
Prime Intellect - the library is named verifiers (or Environments Hub), not Prime Intellect itself, imo.
BrowserGym - not sure if anyone is working on this? @cwing-nvidia?
I renamed it to verifiers in the latest commits. BrowserGym integration is being worked on by Marc Cuevas.
| Name | Demonstrates | Config | README |
|------|--------------|--------|--------|
| Multi Step | Multi-step tool calling | <a href='resources_servers/example_multi_step/configs/example_multi_step.yaml'>example_multi_step.yaml</a> | <a href='resources_servers/example_multi_step/README.md'>README</a> |
| Reasoning Gym | External environment library integration | <a href='resources_servers/reasoning_gym/configs/reasoning_gym.yaml'>reasoning_gym.yaml</a> | <a href='resources_servers/reasoning_gym/README.md'>README</a> |
I thought these don't go in the README because they don't have an HF dataset link; I thought this README table was built automatically based on that somehow.
We want all environments to be discoverable from the README.
| Resource Server | Domain | Dataset | Description | Value | Config | Train | Validation | License |
|-----------------|--------|---------|-------------|-------|--------|-------|------------|---------|
| Aviary (GSM8K) | agent | <a href='https://arxiv.org/abs/2110.14168'>GSM8K</a> | Grade school math with calculator tool via Aviary integration | Improve math reasoning with tool use | <a href='resources_servers/aviary/configs/gsm8k_aviary.yaml'>config</a> | ✓ | - | MIT |
| Aviary (HotPotQA) | agent | <a href='https://aclanthology.org/D18-1259/'>HotPotQA</a> | Multi-hop question answering via Aviary integration | Improve multi-hop reasoning capabilities | <a href='resources_servers/aviary/configs/hotpotqa_aviary.yaml'>config</a> | ✓ | - | Apache 2.0 |
Are we starting to enumerate multiple datasets / env implementations in the README now too? Then we should do the same for math, for example? @bxyu-nvidia
We should be consistent
docs/about/ecosystem.md (outdated)
Any framework that can read this format can use NeMo Gym rollouts—no native integration required. The following frameworks have documented patterns.
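As a rough illustration of that framework-agnostic idea, here is a minimal sketch of loading exported rollouts offline. It assumes rollouts are written as JSONL, and the field names, file name, and downstream handling are hypothetical stand-ins rather than the actual NeMo Gym schema; as the review comments below note, real training integrations have additional requirements.

```python
# Minimal sketch of consuming exported rollouts offline.
# Assumes a JSONL export; field names like "messages" and "reward" are
# hypothetical placeholders for whatever the actual NeMo Gym schema uses.
import json
from pathlib import Path


def load_rollouts(path: str):
    """Yield one rollout record (a dict) per JSONL line."""
    with Path(path).open() as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == "__main__":
    rollouts = list(load_rollouts("rollouts.jsonl"))  # hypothetical file name
    # A downstream trainer would map these records into its own batch format.
    print(f"loaded {len(rollouts)} rollouts")
    if rollouts:
        print("keys in first record:", sorted(rollouts[0].keys()))
```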
I think it's more complex than this. Don't we already have a training framework integration guide with various requirements, e.g. async OpenAI compatibility, retokenization correction, etc.?
Agree. I removed this in later commits and linked to the guide.
Simplify intro to emphasize goal of supporting broad set of RL training frameworks and environment libraries. Add contribution invite with link to issue template. Remove unnecessary tip box for new users.

Signed-off-by: Chris Wing <[email protected]>

- Move model recipes (Nemotron Nano, Super) to new docs/model-recipes/ section
- Simplify training framework integrations list in ecosystem page
- Rename "Unsloth Training" to "Unsloth" for consistency
- Update toctree to add Model Recipes section after Training Tutorials

Signed-off-by: Chris Wing <[email protected]>

- Remove verl.md and nemo-customizer.md pages (not ready yet)
- Reorganize training-tutorials/index.md with cleaner card layout
- Add OpenRLHF card linking to external integration
- Mark VeRL and NeMo Customizer as "Coming soon" with in-progress badges
- Remove card descriptions for consistency, add SFT & DPO section
- Reorder cards to match ecosystem page

Signed-off-by: Chris Wing <[email protected]>

- Rename page to "Agentic RL Ecosystem"
- Simplify training framework list with specific tutorial descriptions
- Condense environment library integrations with README link
- Reframe NeMo libraries section as "related tools for your workflow"
- Remove redundant sections (community, building custom environments)
- Update RL framework integration guide to link to training tutorials index

Signed-off-by: Chris Wing <[email protected]>

- Move nemo-rl-grpo/, unsloth-training.md, offline-training-w-rollouts.md from tutorials/ to training-tutorials/
- Update all cross-references across docs

Signed-off-by: Chris Wing <[email protected]>
- **{doc}`NeMo RL <../training-tutorials/nemo-rl-grpo/index>`** - GRPO training to improve multi-step tool calling on the Workplace Assistant environment
- **[OpenRLHF](https://github.com/OpenRLHF/OpenRLHF/blob/main/examples/python/agent_func_nemogym_executor.py)** - example agent executor for RL training
- **{doc}`TRL <../training-tutorials/trl>`** - GRPO training on Workplace Assistant and Reasoning Gym environments
- **{doc}`Unsloth <../training-tutorials/unsloth-training>`** - GRPO training on Sudoku environment
There is also currently a multi-environment notebook that does instruction following and Reasoning Gym (https://github.com/unslothai/notebooks/blob/main/nb/NeMo-Gym-Multi-Environment.ipynb), if you want to mention it. Fine with me either way.
Depending on your workflow, you may also find these libraries useful:

| Library | Purpose |
I would add NeMo Skills, as there is some work ongoing between the two.
**Common Requirements**:

- NeMo RL v0.4.0+ installed ([setup instructions](../tutorials/nemo-rl-grpo/setup))
- NeMo RL v0.4.0+ installed ([setup instructions](../training-tutorials/nemo-rl-grpo/setup))
v5 is released, right? Should we update to that?
| DPO | ✅ Stable | 🔜 Planned |
| ORPO | ✅ Stable | 🔜 Planned |
| GRPO | ❌ Not in TRL | ✅ Use {doc}`NeMo RL <../tutorials/nemo-rl-grpo/index>` |
| GRPO | ❌ Not in TRL | ✅ Use {doc}`NeMo RL <nemo-rl-grpo/index>` |
? Actually, the only algorithm that will be supported in TRL / NeMo Gym is GRPO - I don't think PPO, DPO, etc. will work as of now.
Hmm, I guess I see that this is the old docs stub and the other TRL docs PR will overwrite this, nvm.
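For reference, a minimal GRPO sketch following TRL's GRPOTrainer quickstart pattern. It is illustrative only: the toy prompts, placeholder reward function, model name, and config values are stand-ins, and the NeMo Gym environment wiring discussed in this thread is not shown.

```python
# Minimal GRPO sketch with TRL (illustrative; not wired to NeMo Gym).
# Assumes a TRL version that ships GRPOTrainer/GRPOConfig.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy prompt-only dataset; in practice prompts would come from the environment.
train_dataset = Dataset.from_dict(
    {"prompt": ["What is 2 + 2?", "Name a prime number greater than 10."]}
)


def reward_short(completions, **kwargs):
    # Placeholder reward: prefer shorter completions.
    return [-float(len(c)) for c in completions]


trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model
    reward_funcs=reward_short,
    args=GRPOConfig(
        output_dir="grpo-demo",
        num_generations=2,
        per_device_train_batch_size=2,
        max_completion_length=32,
        max_steps=5,
    ),
    train_dataset=train_dataset,
)
trainer.train()
```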