THUQiXuan/Ours-Minimal-RL

Environment Setup

  1. Create a new environment.
    conda create -n raftpp python==3.10
    conda activate raftpp
  2. Install dependencies.
    pip install pip --upgrade
    pip install uv
    python -m uv pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
    python -m uv pip install flash-attn --no-build-isolation
    git clone https://github.com/RLHFlow/Minimal-RL.git
    cd Minimal-RL/
    python -m uv pip install -e .
    python -m uv pip install vllm==0.6.3
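
Once the installation finishes, a quick check can confirm that the pinned packages are visible inside the `raftpp` environment. The snippet below is a minimal sketch, not part of the repository: the helper name `check_pkg` is hypothetical, and it assumes `python` resolves to the interpreter of the active environment.

```shell
# Hypothetical helper (not part of the repository): print the installed
# version of a Python package, or flag it as missing.
check_pkg() {
    python -c "import importlib.metadata as m; print('$1', m.version('$1'))" 2>/dev/null \
        || { echo "$1 MISSING"; return 1; }
}

# Packages installed in the steps above; keep going even if one is missing.
for pkg in torch flash-attn vllm; do
    check_pkg "$pkg" || true
done
```

If any line reports `MISSING`, rerunning the corresponding install command is a reasonable first step.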

Running Experiments

  1. Prepare the training and test datasets. (Already done)

    python scripts/data_preprocess/math_dataset.py
    python scripts/data_preprocess/numina_math.py
  2. Set model_name_or_path in scripts/run_grpo8.sh to the path of your model.

  3. Start the training loop.

    bash scripts/run_grpo8.sh
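
For long runs it can help to keep a copy of the console output. The wrapper below is a minimal sketch under stated assumptions: the function name `run_training` and the default log file `train.log` are hypothetical, not something the repository provides. It refuses to start if the script is missing, and otherwise runs it while mirroring the output to a log file.

```shell
# Hypothetical wrapper (not part of the repository).
# Usage: run_training [script] [logfile]
run_training() {
    script="${1:-scripts/run_grpo8.sh}"
    log="${2:-train.log}"   # log file name is an assumption
    if [ ! -f "$script" ]; then
        echo "missing script: $script" >&2
        return 1
    fi
    # Note: the exit status of this pipeline is that of tee, not of the
    # training script, so inspect the log to confirm the run succeeded.
    bash "$script" 2>&1 | tee "$log"
}
```

Called without arguments from the `Minimal-RL/` directory, `run_training` launches `scripts/run_grpo8.sh` and writes `train.log` alongside it.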
