SkyRL: A Modular Full-stack RL Library for LLMs


Overview of this fork

This is a fork of SkyRL for the OpenThoughts-Agent project.

We plan to merge these changes into the main SkyRL repository soon.

Until then, the steps below run SkyRL + Harbor to reproduce the RL training of our first release, which:

- starts from open-thoughts/OpenThinker-Agent-v1-SFT as the base model,
- trains with GRPO on open-thoughts/OpenThoughts-Agent-v1-RL,
- evaluates on open-thoughts/OpenThoughts-TB-dev, and
- produces the final open-thoughts/OpenThinker-Agent-v1 model.

Environment

Install SkyRL

conda create -n otagent python=3.12
conda activate otagent
pip install --index-url https://download.pytorch.org/whl/cu128 torch==2.7.1 torchvision
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.0.post2/flash_attn-2.8.0.post2+cu12torch2.7cxx11abiFALSE-cp312-cp312-linux_x86_64.whl

git clone https://github.com/mlfoundations/SkyRL
cd SkyRL/skyrl-train/
pip install -e .
pip install "vllm==0.10.1.1"
cd ../..

Install Harbor

git clone https://github.com/CharlieFRuan/harbor
cd harbor
git checkout 112425-terminus2-messages
pip install -e .

Remaining dependencies

pip install fastapi uvicorn

We plan to make the environment installable with uv sync soon.
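The environment is built from several separate pip installs, and a later install can silently change a package that an earlier step pinned. A quick check after all the installs can catch this before training starts. The sketch below rests on a few assumptions: check_pkg is a hypothetical helper, it assumes the activated otagent env provides `python`, and it reads versions through Python's importlib.metadata. The pins it checks are the ones from the install commands above.

```shell
#!/usr/bin/env bash
# Sketch: verify that pinned packages survived all the installs above.
# check_pkg is a hypothetical helper; run it inside the activated otagent env.
PYTHON="${PYTHON:-python}"

check_pkg() {
  local pkg="$1" want="$2" have
  # importlib.metadata reports the installed distribution version.
  if ! have="$("$PYTHON" -c 'import sys, importlib.metadata as m; print(m.version(sys.argv[1]))' "$pkg" 2>/dev/null)"; then
    echo "MISSING  $pkg (expected $want)"
    return 1
  fi
  # Accept an exact match or a local build suffix, e.g. 2.7.1+cu128.
  if [[ "$have" == "$want" || "$have" == "$want+"* ]]; then
    echo "OK       $pkg $have"
  else
    echo "MISMATCH $pkg: have $have, expected $want"
    return 1
  fi
}

# Example usage (pins taken from the install commands above):
# check_pkg torch 2.7.1 && check_pkg flash_attn 2.8.0.post2 && check_pkg vllm 0.10.1.1
```

The helper compares the full version string, or the string followed by a "+" suffix. A prefix-only match would wrongly accept 2.7.10 for a 2.7.1 pin.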

Data preparation

conda activate otagent
# Download the eval dataset (OTTB-dev)
hf download open-thoughts/OpenThoughts-TB-dev --repo-type=dataset
# Download the train dataset
hf download open-thoughts/OpenThoughts-Agent-v1-RL --repo-type=dataset
# cd into the downloaded snapshot folder, e.g. /path/to/.cache/huggingface/hub/datasets--open-thoughts--OpenThoughts-Agent-v1-RL/snapshots/hash_code
cd /path/to/.cache/huggingface/hub/datasets--open-thoughts--OpenThoughts-Agent-v1-RL/snapshots/hash_code
# Extract the training tasks from the parquet file into ./extracted_tasks
python extract_parquet_tasks.py tasks_new.parquet ./extracted_tasks
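You do not have to copy the hash-named snapshot path by hand. It can be derived from the standard Hugging Face hub cache layout:

- Each downloaded dataset lives under datasets--<org>--<name>.
- The file refs/main inside that folder holds the commit hash of the snapshot.

The sketch below assumes the dataset was downloaded at its default main revision, into the default cache or one set through HF_HUB_CACHE or HF_HOME. The function name hf_dataset_snapshot is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: resolve the local snapshot directory of a Hugging Face dataset
# from the hub cache layout:
#   <cache>/datasets--<org>--<name>/refs/main   (holds the commit hash)
#   <cache>/datasets--<org>--<name>/snapshots/<hash>/
# Assumes the default "main" revision was downloaded.
hf_dataset_snapshot() {
  local repo_id="$1"
  local cache="${HF_HUB_CACHE:-${HF_HOME:-${XDG_CACHE_HOME:-$HOME/.cache}/huggingface}/hub}"
  # "org/name" -> "datasets--org--name"
  local repo_dir="$cache/datasets--${repo_id//\//--}"
  local ref_file="$repo_dir/refs/main"
  if [[ ! -f "$ref_file" ]]; then
    echo "not found: $ref_file" >&2
    return 1
  fi
  echo "$repo_dir/snapshots/$(cat "$ref_file")"
}

# Example usage (hypothetical helper, same steps as above):
# cd "$(hf_dataset_snapshot open-thoughts/OpenThoughts-Agent-v1-RL)"
# python extract_parquet_tasks.py tasks_new.parquet ./extracted_tasks
```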

Launch

Configure the paths and API keys at the top of run_otagent.sh, then run:

cd SkyRL/skyrl-train
bash run_otagent.sh
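Paths and API keys are set as variables at the top of the script, so an empty value may only surface partway through startup. A small pre-flight check, placed right after those settings, fails fast instead. This is a sketch only: require_vars is a hypothetical helper, and the variable names in the example comment are placeholders, not the names run_otagent.sh actually uses.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: fail if any named variable is unset or empty.
# Uses bash indirect expansion (${!name}) to read each variable by name.
require_vars() {
  local missing=0 name
  for name in "$@"; do
    if [[ -z "${!name:-}" ]]; then
      echo "error: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (placeholder names; substitute the ones defined in run_otagent.sh):
# require_vars TRAIN_DATA_DIR EVAL_DATA_DIR WANDB_API_KEY || exit 1
```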

The script is configured for a single node with 8 GPUs. For a different setup, adjust these overrides accordingly:

  trainer.placement.policy_num_nodes=1 \
  trainer.placement.ref_num_nodes=1 \
  trainer.placement.policy_num_gpus_per_node=8 \
  trainer.placement.ref_num_gpus_per_node=8 \
  generator.num_inference_engines=8 \
  generator.inference_engine_tensor_parallel_size=1 \
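In the defaults above, all 8 GPUs go to the policy, to the reference model, and to 8 single-GPU inference engines. This suggests that training and generation are colocated on the same GPUs. This README does not confirm that, so treat it as an assumption. Under it, the overrides for another cluster shape follow mechanically: the number of inference engines is nodes × GPUs per node ÷ tensor-parallel size. The helper gpu_overrides below is hypothetical and prints lines in the same form as above.

```shell
#!/usr/bin/env bash
# Sketch: print placement/inference overrides for NODES x GPUS_PER_NODE GPUs.
# Assumption: policy, reference model, and inference engines are colocated
# on every GPU, so engines = nodes * gpus_per_node / tp. Requires tp >= 1.
gpu_overrides() {
  local nodes="$1" gpus="$2" tp="${3:-1}"
  if (( gpus % tp != 0 )); then
    echo "error: $gpus GPUs per node not divisible by TP=$tp" >&2
    return 1
  fi
  cat <<EOF
  trainer.placement.policy_num_nodes=$nodes \\
  trainer.placement.ref_num_nodes=$nodes \\
  trainer.placement.policy_num_gpus_per_node=$gpus \\
  trainer.placement.ref_num_gpus_per_node=$gpus \\
  generator.num_inference_engines=$(( nodes * gpus / tp )) \\
  generator.inference_engine_tensor_parallel_size=$tp \\
EOF
}
```

For example, gpu_overrides 2 8 2 describes two 8-GPU nodes with tensor parallelism 2 and yields 8 inference engines. Paste the printed lines over the corresponding ones in run_otagent.sh.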
