RAVEN

This repository contains code for pretraining and evaluating RAVEN (https://arxiv.org/abs/2308.07922).

Installation

The codebase uses the following dependencies:

  • python 3 (tested with 3.8)
  • fairscale (tested with 0.4.6)
  • transformers (tested with 4.18.0)
  • numpy (tested with 1.22.4)
  • faiss (tested with 1.7.2)
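If you prefer to install the dependencies directly rather than using Docker, a pip command along the following lines should work (the faiss-gpu package name is an assumption; pick the faiss build that matches your CUDA setup):

pip install fairscale==0.4.6 transformers==4.18.0 numpy==1.22.4 faiss-gpu==1.7.2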

Recommended: you can use the provided Dockerfile to build a Docker image that satisfies all of the above requirements.
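For example, a typical build-and-run flow would look like the following (the image tag raven and the container mount point are illustrative, not taken from the repository):

docker build -t raven .
docker run --gpus all -v ${DATA_DIR}:/data -it raven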

Training RAVEN

Download corpora

python preprocessing/download_corpus.py --corpus enwiki-dec2021 --output_directory ${DATA_DIR}

Download pretrained Atlas checkpoint

python preprocessing/download_model.py --model {model download key} --output_directory ${DATA_DIR} 

The model download key is one of the following (see the example after this list):

  • models/atlas/xl for Atlas 3B
  • models/atlas/xxl for Atlas 11B
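For example, to download the Atlas 3B checkpoint:

python preprocessing/download_model.py --model models/atlas/xl --output_directory ${DATA_DIR}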

Train RAVEN

xl (3B)

sbatch example_scripts/train-raven-xl.sh

xxl (11B)

sbatch example_scripts/train-raven-xxl.sh

Note:

  • For the first run, set PRETRAINED_MODEL=${DATA_DIR}/models/atlas/${size} to initialize the model weights from the Atlas checkpoint. For subsequent runs, set PRETRAINED_MODEL=none so that the latest RAVEN checkpoint is loaded.
  • If there is no prebuilt index, don't use the load_index_path option, and an index will be built automatically. You can use the save_index_path option to save the index for later use.
  • We use a batch size of 64 for both 3B and 11B model training, where batch_size = #nodes × ntasks-per-node × per_gpu_batch_size (see the worked example after this list).
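As a worked example, the batch size of 64 can be reached with 8 nodes × 8 tasks per node × a per-GPU batch size of 1; this split is only one possible configuration and should be matched to your cluster. A typical pair of runs would then set, in the training script (the xl path is shown; this is an illustration of the note above, not an exact excerpt from the script):

# first run: initialize from the downloaded Atlas checkpoint
PRETRAINED_MODEL=${DATA_DIR}/models/atlas/xl
# subsequent runs: resume from the latest RAVEN checkpoint
PRETRAINED_MODEL=none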

Training time:

  • Training RAVEN (xl, 5k steps) takes about 1 day on 8 A100 GPUs
  • Training RAVEN (xxl, 5k steps) takes about 2-3 days on 32 A100 GPUs

Checkpoints

Pretrained RAVEN checkpoints are available via Google Drive.

In-Context Learning with RAVEN

NQ

sbatch example_scripts/evaluate-nq.sh

TQA

sbatch example_scripts/evaluate-tqa.sh

MMLU

sbatch example_scripts/evaluate-mmlu.sh

Note:

  • You may refer to the Atlas repository to download and preprocess the evaluation datasets (NQ, TriviaQA, MMLU).
  • If there is no prebuilt index, don't use the load_index_path option, and an index will be built automatically. You can use the save_index_path option to save the index for later use (see the sketch after this list).
  • The default batch_size in the scripts is for xl. Please set a smaller batch_size, e.g., 2, for xxl.
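As a sketch of the index handling described above (the --save_index_path/--load_index_path flag form and the index path are assumptions; the exact placement inside the sbatch scripts may differ), the first evaluation run can save the index and later runs can reuse it:

# first run: build the index automatically and save it
... --save_index_path ${DATA_DIR}/saved_index
# later runs: load the prebuilt index instead of rebuilding it
... --load_index_path ${DATA_DIR}/saved_index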

Prompting Strategies:

  • Standard In-Context Learning

Set n_shots and fusion=0, e.g., n_shots=0 for 0-shot, n_shots=5 for 5-shot.

  • Fusion-In-Context Learning

Set n_shots and fusion, e.g., n_shots=64, fusion=5 for [64-shot, 5-fusion] (see the example after this list).
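As an illustration of the two strategies (these settings follow the notes above; treating them as shell variables in the evaluation scripts is an assumption):

# Standard In-Context Learning, 5-shot
n_shots=5
fusion=0

# Fusion-In-Context Learning, [64-shot, 5-fusion]
n_shots=64
fusion=5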

Acknowledgements and License

This repository is adapted from Meta's Atlas, a retrieval-augmented encoder-decoder language model on which RAVEN is based.

Code License

The majority of the code is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: Hugging Face Transformers is licensed under the Apache 2.0 license, which covers src/modeling_bert.py and src/modeling_t5.py.

Data License

The Wikipedia-derived data used in this repository, such as the corpora and indices available from download_corpus.py and download_index.py, are licensed under CC-BY-SA.
