Official JAX implementation of the ICLR 2026 paper: Songyuan Zhang, Oswin So, H. M. Sabbir Ahmad, Eric Yang Yu, Matthew Cleaveland, Mitchell Black, and Chuchu Fan, "ReFORM: Reflected Flows for On-support Offline RL via Noise Manipulation".
Dependencies • Installation • Quickstart • Environments • Algorithms • Usage • Citation
## Dependencies

We recommend using Conda to install the requirements:
```bash
conda create -n reform python=3.12
conda activate reform
```

Then install the dependencies:
```bash
pip install -r requirements.txt
```

## Installation

Install ReFORM:
```bash
pip install -e .
```

## Quickstart

To train a model on the `cube-single-noisy-singletask-task1-v0` environment, run:
```bash
python scripts/train.py reform --env-name cube-single-noisy-singletask-task1-v0 --steps 3000000 --seed 0
```

To evaluate a model, run:
```bash
python scripts/test.py --path ./logs/cube-single-noisy-singletask-task1-v0/reform/seed0_xxxxxxxxxx
```

## Environments

We support the OGBench benchmark environments. Since we are not doing goal-conditioned RL, make sure to use the `singletask` versions of the environments.
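For example, a singletask environment and its offline datasets can be loaded as follows (a minimal sketch assuming the standard OGBench API, `ogbench.make_env_and_datasets`; the environment name is the one from the quickstart above):

```python
# Minimal sketch (assumes the standard OGBench API): load a singletask
# environment together with its offline train/validation datasets.
import ogbench

env, train_dataset, val_dataset = ogbench.make_env_and_datasets(
    'cube-single-noisy-singletask-task1-v0'
)

obs, info = env.reset(seed=0)  # OGBench environments follow the Gymnasium API
print(env.action_space, train_dataset['observations'].shape)
```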
## Algorithms

We provide the following algorithms:

- `reform`: our method, Reflected Flows for On-support Offline RL via Noise Manipulation.
- `fql`: Flow Q-Learning.
- `ifql`: a flow-based version of IDQL.
- `dsrl`: Diffusion Steering via Reinforcement Learning.
## Usage

To train the `<algo>` algorithm on the `<env>` environment, run:
```bash
python scripts/train.py <algo> --env-name <env>
```

The training logs will be saved in `logs/<env>/<algo>/seed<seed>_<timestamp>`.
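Since each run directory ends with a timestamp, a small helper like the following (illustrative, not part of this repo; it only assumes the log layout above and that timestamps sort lexicographically) can pick the newest run for evaluation:

```python
# Illustrative helper (not part of this repo): find the newest run directory
# under logs/<env>/<algo>/ for a given seed, following the layout above.
from pathlib import Path

def latest_run(env_name: str, algo: str, seed: int = 0) -> Path:
    runs = sorted(Path('logs', env_name, algo).glob(f'seed{seed}_*'))
    if not runs:
        raise FileNotFoundError(f'no runs for {env_name}/{algo} with seed {seed}')
    return runs[-1]  # timestamps sort lexicographically, so the last is newest

print(latest_run('cube-single-noisy-singletask-task1-v0', 'reform'))
```

The printed path can be passed to `scripts/test.py --path` (see below).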
Use the following command to check the available options:

```bash
python scripts/train.py <algo> -h
```

We provide the complete list of the exact command-line flags used to produce the main results of ReFORM in the paper below.
ReFORM does not have environment-specific or dataset-specific hyperparameters, so the same set of hyperparameters is used across all environments except for the number of training steps.
(`--q-agg` is a minor exception whose effect has not been studied in depth yet; we use the same setting as the FQL implementation.)
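For intuition, `--q-agg` selects how the critic ensemble's Q-values are aggregated, as in FQL. A minimal sketch of the two options (illustrative names, not the repo's actual code):

```python
import jax.numpy as jnp

def aggregate_q(qs: jnp.ndarray, q_agg: str = 'mean') -> jnp.ndarray:
    """Aggregate Q-values of shape (num_critics, batch_size).

    'min' takes the pessimistic minimum over the ensemble (clipped double-Q
    style), while 'mean' averages and is less conservative.
    """
    return qs.min(axis=0) if q_agg == 'min' else qs.mean(axis=0)

qs = jnp.array([[1.0, 2.0],
                [0.5, 3.0]])          # two critics, batch of two
print(aggregate_q(qs, 'min'))         # [0.5  2. ]
print(aggregate_q(qs, 'mean'))        # [0.75 2.5]
```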
<details>
<summary>Click to expand the full list of commands</summary>
Change `task1` to `task2`/`task3`/`task4`/`task5` to run on different tasks.
```bash
# ReFORM in antmaze-large environments with clean datasets.
python scripts/train.py reform --env-name antmaze-large-navigate-singletask-task1-v0 --q-agg min --steps 10000000 --seed 0
# ReFORM in antmaze-large environments with noisy datasets.
python scripts/train.py reform --env-name antmaze-large-explore-singletask-task1-v0 --q-agg min --steps 8000000 --seed 0
# ReFORM in cube-single environments with clean datasets.
python scripts/train.py reform --env-name cube-single-play-singletask-task1-v0 --steps 2000000 --seed 0
# ReFORM in cube-single environments with noisy datasets.
python scripts/train.py reform --env-name cube-single-noisy-singletask-task1-v0 --steps 3000000 --seed 0
# ReFORM in cube-double environments with clean datasets.
python scripts/train.py reform --env-name cube-double-play-singletask-task1-v0 --steps 2000000 --seed 0
# ReFORM in cube-double environments with noisy datasets.
python scripts/train.py reform --env-name cube-double-noisy-singletask-task1-v0 --steps 1000000 --save-interval 50000 --seed 0
# ReFORM in scene environments with clean datasets.
python scripts/train.py reform --env-name scene-play-singletask-task1-v0 --steps 2000000 --seed 0
# ReFORM in scene environments with noisy datasets.
python scripts/train.py reform --env-name scene-noisy-singletask-task1-v0 --steps 1000000 --seed 0
# ReFORM in visual-cube-single environments with clean datasets.
python scripts/train.py reform --env-name visual-cube-single-play-singletask-task1-v0 --steps 1000000 --encoder impala_small --p_aug 0.5 --frame_stack 3 --seed 0
# ReFORM in visual-cube-single environments with noisy datasets.
python scripts/train.py reform --env-name visual-cube-single-noisy-singletask-task1-v0 --steps 1000000 --encoder impala_small --p_aug 0.5 --frame_stack 3 --seed 0
```

</details>

To test the learned model, use:
```bash
python scripts/test.py --path <path-to-log>
```

This reports the mean reward and the safety rate of the learned model, and generates videos of the learned policy in `<path-to-log>/videos`. Use the following flag to check the available options:
```bash
python scripts/test.py -h
```

This codebase is built upon the FQL implementation.
## Citation

```bibtex
@inproceedings{zhang2026reform,
  title={Re{FORM}: Reflected Flows for On-support Offline {RL} via Noise Manipulation},
  author={Zhang, Songyuan and So, Oswin and Ahmad, H M Sabbir and Yu, Eric Yang and Cleaveland, Matthew and Black, Mitchell and Fan, Chuchu},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
}
```