```
conda create -n viper python=3.9.0
conda activate viper
cd alfworld/TextWorld
conda install cython numpy
pip install --no-build-isolation -e .[full]
```
```
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu118
cd ..
pip install -e .[full]
cd lamorel/lamorel
pip install -e .
pip install wandb gym peft bitsandbytes pyvirtualdisplay
```

Download and generate the ALFWorld data:

```
export ALFWORLD_DATA=<storage_path>
alfworld-download
alfworld-generate
```
Change the data paths in the ALFWorld configs to your custom path.
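For reference, in recent ALFWorld versions the data paths live under the `dataset` section of `configs/base_config.yaml` and expand the `ALFWORLD_DATA` variable; the exact keys may differ in your version, so treat this fragment as illustrative:

```yaml
dataset:
  data_path: '$ALFWORLD_DATA/json_2.1.1/train'
  eval_id_data_path: '$ALFWORLD_DATA/json_2.1.1/valid_seen'
  eval_ood_data_path: '$ALFWORLD_DATA/json_2.1.1/valid_unseen'
```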
Find and run the scripts in the `scripts` folder.

The current code is based on the Llama 1B model.
Two types of novelty are rewarded:

- Action novelty (horizontal): to reduce action repetition from the LLM, actions that occur less frequently within a trajectory receive a larger reward.
- Action-pattern novelty (vertical): a novel sequence of actions is rewarded based on the loss of an auxiliary model, the temporal predictor. A T5 model serves as the temporal predictor; it is trained on the PPO buffer to predict the next action given the preceding sequence of actions in the current trajectory. Its loss on new trajectories is used as the action-pattern novelty reward.
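As an illustration, the horizontal (action-novelty) reward can be computed from per-trajectory action counts. This is a minimal sketch: the function name and the `1 / count` weighting are assumptions for exposition, not necessarily the repository's exact formulation.

```python
from collections import Counter

def action_novelty_rewards(actions, scale=1.0):
    """Count-based action novelty: actions that occur less often
    within the trajectory receive a larger reward.
    Hypothetical weighting: scale / count(action)."""
    counts = Counter(actions)
    return [scale / counts[a] for a in actions]

rewards = action_novelty_rewards(["go north", "go north", "open door"])
# "go north" appears twice -> 0.5 each; "open door" appears once -> 1.0
```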
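The vertical (action-pattern) reward can be sketched with a toy stand-in for the temporal predictor. Here a bigram count model replaces the T5 model purely to keep the example self-contained and runnable; in the actual setup the prediction loss comes from the T5 model trained on the PPO buffer.

```python
import math
from collections import defaultdict

class BigramPredictor:
    """Toy stand-in for the T5 temporal predictor: predicts the next
    action from the previous action via bigram counts (a deliberate
    simplification of next-action prediction)."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, trajectory):
        # Record (previous action -> next action) transitions.
        for prev, nxt in zip(trajectory, trajectory[1:]):
            self.counts[prev][nxt] += 1

    def loss(self, trajectory, eps=1e-3):
        """Mean negative log-likelihood of the next actions; a higher
        loss signals a less predictable, i.e. more novel, pattern."""
        nll, steps = 0.0, 0
        for prev, nxt in zip(trajectory, trajectory[1:]):
            total = sum(self.counts[prev].values())
            p = (self.counts[prev][nxt] + eps) / (total + eps)
            nll -= math.log(p)
            steps += 1
        return nll / max(steps, 1)

pred = BigramPredictor()
pred.train(["open fridge", "take apple", "close fridge"])
seen_loss = pred.loss(["open fridge", "take apple"])
novel_loss = pred.loss(["open fridge", "take knife"])
# The unseen action pattern yields a higher loss, hence a larger
# action-pattern novelty reward.
```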
