A companion repository for Gradient Routing: Masking Gradients to Localize Computation in Neural Networks.
- `factored_representations` is for shared functionality, although in practice, code for different subprojects is mostly siloed
  - `masklib.py` and `model_expansion.py` implement Expand, Route, Ablate for any TransformerLens model
    - Has some tests
- `projects` contains the code to reproduce the results in the paper
  - `minigrid` - localizing behavioral tendencies in a gridworld reinforcement learning agent
  - `mnist` - splitting representations of an MNIST autoencoder
  - `nanoGPT-factrep` - training a model with a steering scalar, and unlearning virology
  - `tinystories` - unlearning a subset of TinyStories
- `shared_configs` is for commonly-used configurations, e.g. model definitions and standard training config options
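The core idea behind Route (gradient routing) is to mask gradients on the backward pass so that designated data only updates a designated subset of the network. Here is a minimal, self-contained sketch in plain PyTorch; the function and variable names are illustrative assumptions, not the repo's `masklib` API:

```python
import torch

# Toy illustration of gradient routing (NOT the repo's masklib API):
# a "forbidden" batch is only allowed to update hidden units 0-3.
torch.manual_seed(0)
W1 = torch.randn(4, 8, requires_grad=True)  # input -> hidden weights
W2 = torch.randn(8, 1, requires_grad=True)  # hidden -> output weights

def forward(x, grad_mask):
    h = x @ W1  # hidden activations, shape (batch, 8)
    # Route gradients: on the backward pass, zero the gradient flowing
    # into hidden units outside the mask, so only masked units learn
    # from this batch. The forward pass is unchanged.
    h.register_hook(lambda g: g * grad_mask)
    return h @ W2

mask = torch.tensor([1.0] * 4 + [0.0] * 4)  # units 0-3 learn; 4-7 frozen
x = torch.randn(2, 4)
loss = forward(x, mask).pow(2).mean()
loss.backward()

# Columns 4-7 of W1.grad are exactly zero: no update outside the mask.
print(W1.grad[:, 4:].abs().sum().item())  # -> 0.0
```

The repo's `masklib.py` generalizes this pattern to arbitrary masks over TransformerLens model components; see that file for the actual interface.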
- Install PDM
- Install the PDM project (i.e., install the dependencies)

  ```shell
  pdm install
  ```
- Install the recommended VSCode extensions
- Install the pre-commit git hooks

  ```shell
  pdm run pre-commit install
  ```
You can then run Python scripts with `pdm run python <script.py>` or by activating the virtual environment specified by `pdm info`. E.g.:

```shell
source /pdm-venvs/factored-representations-Dp430888-3.12/bin/activate
```

`.vscode/settings.json` is configured to automatically format and lint the code with Ruff (using the Ruff VSCode extension) on save.
Run the tests with:
```shell
pdm run pytest
```

Cite this work as:

```bibtex
@article{cloud2024gradient,
  title={Gradient Routing: Masking Gradients to Localize Computation in Neural Networks},
  url={https://arxiv.org/abs/2410.04332v1},
  journal={arXiv.org},
  author={Cloud, Alex and Goldman-Wetzler, Jacob and Wybitul, Evžen and Miller, Joseph and Turner, Alexander Matt},
  year={2024},
}
```