Skip to content

jianghoucheng/ConG

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 

Repository files navigation

1. Installation

pip install -e ./transformers-main
cd LLaMA-Factory
pip install -e ".[torch,metrics]"

2. Quick Start

Step 1: Preference Data Construction

Align preference data by running the pipeline:

cd on_policy_data_gen
sh run_pipline.sh   # You can adjust model, path, sampling parameters, etc.

python convert_data_to_dpo.py \
  --input_path datasets/Llama3.2-3B-Instruct/all_outputs_bin.json \
  --output_path ../data/ultrafeedback_Llama3.2_3B.json

Note: Both the model and output path can be modified as needed.

Step 2: DPO Alignment

Run DPO alignment with the processed dataset. Make sure to configure the model, dataset path, and hyperparameters according to your setup.

llamafactory-cli train examples/train_full/llama3.2_3B_full_dpo_ds3.yaml

Step 3: Contrastive Decoding

cd experiments
sh launch_parallel_cd.sh
sh merge_parallel_cd.sh

Note: Contrastive decoding is not compatible with vLLM acceleration. On large datasets, the process can be very slow. To address this, parallel execution is used.

Step 4: ConG

cd ../
cd on_policy_data_gen
sh run_llama3_8B_w2s_3B_8B.sh

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors