Docs: https://cardiffnlp.github.io/dialz/
Steering vectors allow users to modify activations at inference time to amplify or weaken a 'concept', e.g. honesty or positivity.
Dialz supports a diverse set of tasks, including creating contrastive pair datasets, computing and applying steering vectors, and producing visualizations.
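To make the idea concrete, here is a minimal sketch of one common way to build and apply a steering vector: take the difference between the mean activations of the two sides of a contrastive pair dataset, then add a scaled copy of that difference to an activation at inference time. This is not the Dialz API. The function names (`mean_vector`, `steering_vector`, `apply_steering`), the difference-of-means method, and the toy three-dimensional activations are all assumptions made for illustration; see the documentation for how Dialz itself computes and applies vectors.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_vector(positive: List[Vector], negative: List[Vector]) -> Vector:
    """Hypothetical helper: mean 'positive' activation minus mean 'negative' one."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


def apply_steering(activation: Vector, vector: Vector, coeff: float) -> Vector:
    """Shift an activation along the steering vector.

    A positive coeff amplifies the concept, a negative coeff weakens it.
    """
    return [a + coeff * v for a, v in zip(activation, vector)]


# Toy activations standing in for a model's hidden states (made-up numbers).
positive_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]  # e.g. prompts showing the concept
negative_acts = [[0.0, 2.0, 1.0], [0.0, 2.0, 3.0]]  # e.g. prompts showing its opposite

vec = steering_vector(positive_acts, negative_acts)
amplified = apply_steering([0.5, 0.5, 0.5], vec, coeff=1.0)
weakened = apply_steering([0.5, 0.5, 0.5], vec, coeff=-1.0)

print(vec)
print(amplified)
print(weakened)
```

In a real model the activations would be hidden states read from a chosen layer, and the shift would be applied to that layer's output during generation.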
A basic tutorial can be found here.
To install Dialz, run:
pip install dialz
See the full documentation (https://cardiffnlp.github.io/dialz/) for usage information.
Contributions to improve this project are welcome! Please open an issue or pull request in this repo with your proposed changes.
This code is released under an MIT license.