Download the following dataset files into the data/ directory. Then generate the K-shot samples for the training and validation sets using the script provided by prior work.
python generate_k_shot.py --k 8 --dataset tacrev --data_file train.txt
python generate_k_shot.py --k 8 --dataset tacrev --data_file val.txt
python generate_k_shot.py --k 16 --dataset tacrev --data_file train.txt
python generate_k_shot.py --k 16 --dataset tacrev --data_file val.txt
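Equivalently, the four commands above can be issued in a single loop:
for k in 8 16; do
  for split in train.txt val.txt; do
    python generate_k_shot.py --k $k --dataset tacrev --data_file $split
  done
done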
conda create -n CLMA python=3.7.16
conda activate CLMA
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 -c pytorch
pip install transformers datasets tenacity openai wandb scikit-learn pandas
pip install accelerate -U
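To check that the installation succeeded, you can print the installed PyTorch version and whether CUDA is available (a quick sanity check, not required by the pipeline):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"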
1. Obtaining rules from the training set
Change the EXP_METHOD to run_induction in run.sh and set your OpenAI API key, then execute the script:
DATASET_NAME="tacrev"
EXP_METHOD="run_induction"
LRE_SETTING="k-shot"
K_SHOT_LRE="8"
K_SHOT_ID="1"
python main.py ... --api_key Your_API_Key
bash run.sh
The induced rules will be saved in the outputs/${DATASET_NAME}/${K_SHOT_LRE}/k-shot/${K_SHOT_ID}/pr/ directory.
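For example, with the settings above (DATASET_NAME=tacrev, K_SHOT_LRE=8, K_SHOT_ID=1), the induced rules can be listed with:
ls outputs/tacrev/8/k-shot/1/pr/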
The rules are saved in a {Your experiment name}.json file. Assign the name of the generated rule file to the corresponding entry in src/configs/rule_trian_config.json, where the entry key (e.g. 8-1) corresponds to your K_SHOT_LRE and K_SHOT_ID settings, e.g.
{
    "tacrev": {
        "k-shot": {
            "8-1": "{Your experiment name}.json"
        }
    }
}
2. Obtaining rules from the validation set
Set the split hyperparameter to val:
python main.py ... --split val
Then, run the script.
bash run.sh
Similarly, assign the name of the generated rule file to the corresponding entry in src/configs/rule_val_config.json.
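Assuming rule_val_config.json uses the same layout as the training-set config above, the entry would look like:
{
    "tacrev": {
        "k-shot": {
            "8-1": "{Your experiment name}.json"
        }
    }
}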
Then, run the script to fine-tune the SLM:
cd slm/
bash run.sh
The SLM checkpoint is saved in src/slm/ckpt/${DATASET_NAME}/${K_SHOT_LRE}/k-shot/${K_SHOT_ID}/.
Set the path of the SLM checkpoint in src/configs/slm_ckpt_config.json.
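The exact format of slm_ckpt_config.json is not shown here; assuming it mirrors the rule configs, with the checkpoint directory as the value, an entry might look like:
{
    "tacrev": {
        "k-shot": {
            "8-1": "src/slm/ckpt/tacrev/8/k-shot/1/"
        }
    }
}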
Finally, change the EXP_METHOD first to collaborative_da and then to llm_inference, running run.sh after each change:
EXP_METHOD="collaborative_da"
bash run.sh
EXP_METHOD="llm_inference"
bash run.sh