Official implementation of the ICLR 2026 paper "ReFORM: Reflected Flows for On-support Offline RL via Noise Manipulation" by Songyuan Zhang, Oswin So, H. M. Sabbir Ahmad, Eric Yang Yu, Matthew Cleaveland, Mitchell Black, and Chuchu Fan.

[Demo videos: antmaze-large, cube-single, cube-double, scene]

[Figure: ReFORM framework overview]

Dependencies

We recommend using Conda to create the environment:

conda create -n reform python=3.12
conda activate reform

Then install the dependencies:

pip install -r requirements.txt

Installation

Install ReFORM:

pip install -e .
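
To sanity-check the installation, the following minimal sketch can be run from a Python shell. It assumes the JAX-based dependencies of the FQL codebase this repository builds on, and that ogbench was installed via requirements.txt:

import jax
import ogbench  # OGBench provides the benchmark environments used below.

# Confirm that JAX can see an accelerator (it falls back to CPU otherwise).
print(jax.devices())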

Quickstart

To train a model on the cube-single-noisy-singletask-task1-v0 environment, run:

python scripts/train.py reform --env-name cube-single-noisy-singletask-task1-v0 --steps 3000000 --seed 0

To evaluate a model, run:

python scripts/test.py --path ./logs/cube-single-noisy-singletask-task1-v0/reform/seed0_xxxxxxxxxx

Environments

We support the OGBench benchmark environments. Since ReFORM does not perform goal-conditioned RL, make sure to use the singletask versions of the environments; see the sketch below for how such an environment is loaded.
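
As an illustration (a minimal sketch assuming the standard ogbench Python API; the dataset name is taken from the commands below), a singletask environment and its offline dataset can be loaded with:

import ogbench

# Load a singletask OGBench environment together with its offline dataset.
env, train_dataset, val_dataset = ogbench.make_env_and_datasets(
    'cube-single-play-singletask-task1-v0'
)
print(train_dataset['observations'].shape)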

Algorithms

We provide the following algorithms:

Usage

Train

To train the <algo> algorithm on the <env> environment, run:

python scripts/train.py <algo> --env-name <env>

The training logs will be saved in logs/<env>/<algo>/seed<seed>_<timestamp>. Use the following command to check the available options:

python scripts/train.py <algo> -h

Below is the complete list of the exact command-line flags used to produce the main results of ReFORM in the paper. ReFORM has no environment-specific or dataset-specific hyperparameters, so the same set of hyperparameters is used across all environments, except for the number of training steps. (--q-agg is a minor exception whose effect has not been studied in depth; we use the same option as the FQL implementation.)

Click to expand the full list of commands

Change task1 to task2/task3/task4/task5 to run on different tasks.

# ReFORM in antmaze-large environments with clean datasets.
python scripts/train.py reform --env-name antmaze-large-navigate-singletask-task1-v0 --q-agg min --steps 10000000 --seed 0
# ReFORM in antmaze-large environments with noisy datasets.
python scripts/train.py reform --env-name antmaze-large-explore-singletask-task1-v0 --q-agg min --steps 8000000 --seed 0
# ReFORM in cube-single environments with clean datasets.
python scripts/train.py reform --env-name cube-single-play-singletask-task1-v0 --steps 2000000 --seed 0
# ReFORM in cube-single environments with noisy datasets.
python scripts/train.py reform --env-name cube-single-noisy-singletask-task1-v0 --steps 3000000 --seed 0
# ReFORM in cube-double environments with clean datasets.
python scripts/train.py reform --env-name cube-double-play-singletask-task1-v0 --steps 2000000 --seed 0 
# ReFORM in cube-double environments with noisy datasets.
python scripts/train.py reform --env-name cube-double-noisy-singletask-task1-v0 --steps 1000000 --save-interval 50000 --seed 0 
# ReFORM in scene environments with clean datasets.
python scripts/train.py reform --env-name scene-play-singletask-task1-v0 --steps 2000000 --seed 0
# ReFORM in scene environments with noisy datasets.
python scripts/train.py reform --env-name scene-noisy-singletask-task1-v0 --steps 1000000 --seed 0
# ReFORM in visual-cube-single environments with clean datasets.
python scripts/train.py reform --env-name visual-cube-single-play-singletask-task1-v0 --steps 1000000 --encoder impala_small --p_aug 0.5 --frame_stack 3 --seed 0
# ReFORM in visual-cube-single environments with noisy datasets.
python scripts/train.py reform --env-name visual-cube-single-noisy-singletask-task1-v0 --steps 1000000 --encoder impala_small --p_aug 0.5 --frame_stack 3 --seed 0

Test

To test the learned model, use:

python scripts/test.py --path <path-to-log>

This reports the mean reward and the safety rate of the learned model, and generates videos of the learned policy in <path-to-log>/videos. Use the following command to check the available options:

python scripts/test.py -h

Acknowledgements

This codebase is built upon the FQL implementation.

Citation

@inproceedings{zhang2026reform,
      title={Re{FORM}: Reflected Flows for On-support Offline {RL} via Noise Manipulation},
      author={Zhang, Songyuan and So, Oswin and Ahmad, H M Sabbir and Yu, Eric Yang and Cleaveland, Matthew and Black, Mitchell and Fan, Chuchu},
      booktitle={The Fourteenth International Conference on Learning Representations},
      year={2026},
}
