Code accompanying "Distribution-Free Statistical Dispersion Control for Societal Applications", presented as a Spotlight paper at NeurIPS 2023.
Requires Python >= 3.10 and the crossing-probability library.
- CivilComments: data can be downloaded using the WILDS repo: https://github.com/p-lambda/wilds
- CivilComments: the model is sourced from Detoxify: https://github.com/unitaryai/detoxify
- RxRx1: data can be downloaded using the WILDS repo: https://github.com/p-lambda/wilds
- RxRx1: we trained an ERM model using the code in the WILDS repo
- MovieLens: data can be downloaded here: https://grouplens.org/datasets/movielens/
- MovieLens: the LightFM model is sourced from this repo: https://github.com/lyst/lightfm
The commands necessary to reproduce all of our experiments are listed below. Our experiment-ready data can be found under zipped_data/ and includes:
- CivilComments: Logits, labels, group labels
- RxRx1: Logits, labels
- MovieLens: User/item score matrix, group labels
Before running any experiment, unzip the archives, create a data/ folder, and move each dataset into its corresponding folder:
data/civil_comments
data/rxrx1
data/ml-1m
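The data preparation step above can be sketched in Python. This is a minimal sketch, not part of the repository: it assumes the archives under zipped_data/ are .zip files, and the helper names `extract_archives` and `missing_folders` are hypothetical. Whether each archive already unpacks into the expected folder name is also an assumption, which is why the second helper reports any folders that still have to be created or moved by hand.

```python
import zipfile
from pathlib import Path


def extract_archives(zipped_dir="zipped_data", data_dir="data"):
    """Extract every .zip archive in zipped_dir into data_dir.

    Returns the names of the archives that were extracted.
    Assumes the archives are ordinary .zip files (not confirmed by the repo).
    """
    zipped_dir, data_dir = Path(zipped_dir), Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)  # create data/ if needed
    extracted = []
    for archive in sorted(zipped_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)
        extracted.append(archive.name)
    return extracted


def missing_folders(data_dir="data",
                    expected=("civil_comments", "rxrx1", "ml-1m")):
    """Return the expected dataset folders that do not exist under data_dir."""
    return [name for name in expected if not (Path(data_dir) / name).is_dir()]
```

Calling `extract_archives()` from the repository root and then `missing_folders()` shows which of the three folders still need to be set up manually.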
For CivilComments, run:
cd scripts/
python civil_comments.py --max_per_group=100
python civil_comments.py --max_per_group=200
To include the optimized bounds, first run the commands above, then run these notebooks:
notebooks/num_opt_civil_comments-dual_opt-delta-100.ipynb
notebooks/num_opt_civil_comments-dual_opt-delta-200.ipynb
Then run the scripts once more:
cd scripts/
python civil_comments.py --max_per_group=100
python civil_comments.py --max_per_group=200
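The three-stage CivilComments sequence above (scripts, then notebooks, then scripts again) can be written down as an ordered list of commands. The sketch below is an illustration only: the function names are hypothetical, and executing the notebooks non-interactively with `jupyter nbconvert --execute` is an assumption, since the notebooks may be intended to be run by hand.

```python
import subprocess


def civil_comments_pipeline(sizes=(100, 200)):
    """Return (argv, cwd) pairs for the script -> notebook -> script sequence."""
    script_runs = [
        (["python", "civil_comments.py", f"--max_per_group={n}"], "scripts")
        for n in sizes
    ]
    # Assumption: the notebooks can be executed headlessly via nbconvert.
    notebook_runs = [
        (["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
          f"notebooks/num_opt_civil_comments-dual_opt-delta-{n}.ipynb"], ".")
        for n in sizes
    ]
    # The scripts are rerun after the notebooks to pick up the optimized bounds.
    return script_runs + notebook_runs + script_runs


def run_pipeline(steps, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for argv, cwd in steps:
        print(f"[{cwd}] {' '.join(argv)}")
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)
```

`run_pipeline(civil_comments_pipeline())` only prints the six steps; passing `dry_run=False` from the repository root would execute them in order and stop at the first failure.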
Run:
notebooks/num_opt_civil_comments-prod.ipynb
For RxRx1, run:
cd scripts/
python rxrx1.py
For MovieLens, run:
cd scripts/
python ml-1m.py