Official JAX implementation of the ICLR 2026 paper: Songyuan Zhang, Oswin So, H. M. Sabbir Ahmad, Eric Yang Yu, Matthew Cleaveland, Mitchell Black, and Chuchu Fan, "ReFORM: Reflected Flows for On-support Offline RL via Noise Manipulation".
Dependencies • Installation • Quickstart • Environments • Algorithms • Usage • Citation
## Dependencies

We recommend using Conda to install the requirements:
```bash
conda create -n reform python=3.12
conda activate reform
```

Then install the dependencies:
```bash
pip install -r requirements.txt
```

## Installation

Install ReFORM:
```bash
pip install -e .
```

## Quickstart

To train a model on the `cube-single-noisy-singletask-task1-v0` environment, run:
```bash
python scripts/train.py reform --env-name cube-single-noisy-singletask-task1-v0 --steps 3000000 --seed 0
```

To evaluate a model, run:
```bash
python scripts/test.py --path ./logs/cube-single-noisy-singletask-task1-v0/reform/seed0_xxxxxxxxxx
```

## Environments

We support the OGBench benchmark environments. Since we are not doing goal-conditioned RL, make sure to use the `singletask` versions of the environments.
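For example, a singletask environment and its offline datasets can be loaded as follows (a minimal sketch assuming the standard OGBench API, `ogbench.make_env_and_datasets`; the environment name is the one from the quickstart above):

```python
# Minimal sketch (assumes the standard OGBench API): load a singletask
# environment together with its offline train/validation datasets.
import ogbench

env, train_dataset, val_dataset = ogbench.make_env_and_datasets(
    'cube-single-noisy-singletask-task1-v0'
)

obs, info = env.reset(seed=0)  # OGBench environments follow the Gymnasium API
print(env.action_space, train_dataset['observations'].shape)
```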
## Algorithms

We provide the following algorithms:

- `reform`: our method, Reflected Flows for On-support Offline RL via Noise Manipulation.
- `fql`: Flow Q-Learning.
- `ifql`: a flow-based version of IDQL.
- `dsrl`: Diffusion Steering via Reinforcement Learning.
## Usage

To train the `<algo>` algorithm on the `<env>` environment, run:
```bash
python scripts/train.py <algo> --env-name <env>
```

The training logs will be saved in `logs/<env>/<algo>/seed<seed>_<timestamp>`.
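Since each run directory ends with a timestamp, a small helper like the following (illustrative, not part of this repo; it only assumes the log layout above and that timestamps sort lexicographically) can pick the newest run for evaluation:

```python
# Illustrative helper (not part of this repo): find the newest run directory
# under logs/<env>/<algo>/ for a given seed, following the layout above.
from pathlib import Path

def latest_run(env_name: str, algo: str, seed: int = 0) -> Path:
    runs = sorted(Path('logs', env_name, algo).glob(f'seed{seed}_*'))
    if not runs:
        raise FileNotFoundError(f'no runs for {env_name}/{algo} with seed {seed}')
    return runs[-1]  # timestamps sort lexicographically, so the last is newest

print(latest_run('cube-single-noisy-singletask-task1-v0', 'reform'))
```

The printed path can be passed to `scripts/test.py --path` (see below).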
Use the following command to check the available options:

```bash
python scripts/train.py <algo> -h
```

We provide the complete list of the exact command-line flags used to produce the main results of ReFORM in the paper below.
ReFORM does not have environment-specific or dataset-specific hyperparameters, so the same set of hyperparameters is used across all environments except for the number of training steps.
(`--q-agg` is a minor exception whose effect has not been studied in depth yet; we use the same setting as the FQL implementation.)
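For intuition, `--q-agg` selects how the critic ensemble's Q-values are aggregated, as in FQL. A minimal sketch of the two options (illustrative names, not the repo's actual code):

```python
import jax.numpy as jnp

def aggregate_q(qs: jnp.ndarray, q_agg: str = 'mean') -> jnp.ndarray:
    """Aggregate Q-values of shape (num_critics, batch_size).

    'min' takes the pessimistic minimum over the ensemble (clipped double-Q
    style), while 'mean' averages and is less conservative.
    """
    return qs.min(axis=0) if q_agg == 'min' else qs.mean(axis=0)

qs = jnp.array([[1.0, 2.0],
                [0.5, 3.0]])          # two critics, batch of two
print(aggregate_q(qs, 'min'))         # [0.5  2. ]
print(aggregate_q(qs, 'mean'))        # [0.75 2.5]
```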
<details>
<summary>Click to expand the full list of commands</summary>
Change `task1` to `task2`/`task3`/`task4`/`task5` to run on different tasks.
```bash
# ReFORM in antmaze-large environments with clean datasets.
python scripts/train.py reform --env-name antmaze-large-navigate-singletask-task1-v0 --q-agg min --steps 10000000 --seed 0
# ReFORM in antmaze-large environments with noisy datasets.
python scripts/train.py reform --env-name antmaze-large-explore-singletask-task1-v0 --q-agg min --steps 8000000 --seed 0
# ReFORM in cube-single environments with clean datasets.
python scripts/train.py reform --env-name cube-single-play-singletask-task1-v0 --steps 2000000 --seed 0
# ReFORM in cube-single environments with noisy datasets.
python scripts/train.py reform --env-name cube-single-noisy-singletask-task1-v0 --steps 3000000 --seed 0
# ReFORM in cube-double environments with clean datasets.
python scripts/train.py reform --env-name cube-double-play-singletask-task1-v0 --steps 2000000 --seed 0
# ReFORM in cube-double environments with noisy datasets.
python scripts/train.py reform --env-name cube-double-noisy-singletask-task1-v0 --steps 1000000 --save-interval 50000 --seed 0
# ReFORM in scene environments with clean datasets.
python scripts/train.py reform --env-name scene-play-singletask-task1-v0 --steps 2000000 --seed 0
# ReFORM in scene environments with noisy datasets.
python scripts/train.py reform --env-name scene-noisy-singletask-task1-v0 --steps 1000000 --seed 0
# ReFORM in visual-cube-single environments with clean datasets.
python scripts/train.py reform --env-name visual-cube-single-play-singletask-task1-v0 --steps 1000000 --encoder impala_small --p_aug 0.5 --frame_stack 3 --seed 0
# ReFORM in visual-cube-single environments with noisy datasets.
python scripts/train.py reform --env-name visual-cube-single-noisy-singletask-task1-v0 --steps 1000000 --encoder impala_small --p_aug 0.5 --frame_stack 3 --seed 0
```

</details>

To test the learned model, use:
```bash
python scripts/test.py --path <path-to-log>
```

This reports the mean reward and the safety rate of the learned model, and generates videos of the learned policy in `<path-to-log>/videos`. Use the following flag to check the available options:
```bash
python scripts/test.py -h
```

This codebase is built upon the FQL implementation.
## Citation

```bibtex
@inproceedings{zhang2026reform,
  title={Re{FORM}: Reflected Flows for On-support Offline {RL} via Noise Manipulation},
  author={Zhang, Songyuan and So, Oswin and Ahmad, H M Sabbir and Yu, Eric Yang and Cleaveland, Matthew and Black, Mitchell and Fan, Chuchu},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
}
```