This repository accompanies the manuscript on EVFI/DeepEVFI for directed evolution campaigns. It bundles:
- public and newly released multi-round NGS datasets,
- our deepfitness implementation (EVFI and DeepEVFI),
- third-party baselines (ACIDES, Enrich2),
- benchmarking scripts, notebooks, and result CSVs used in the paper.

To get started:

- Clone this repository.
- Create a separate environment for each method (conda + mamba recommended):
  - `deepfitness/env.yml`
  - `ACIDES/env.yml`
  - `Enrich2/env.yml`
- Install `deepfitness` for local development:

  ```bash
  conda env create -f deepfitness/env.yml
  conda activate deepfitness
  pip install hackerargs
  pip install -e deepfitness
  ```
- Run EVFI on a sample subset:

  ```bash
  python -m deepfitness.scripts.train_simplefitness \
    --csv deepfitness/example/TEAD_subset500.csv \
    --genotype_col HELMnolinker \
    --round_cols '[0,1,2,3,4,5,6]' \
    --output_folder deepfitness/example/output_evfi
  ```

- Run DeepEVFI using the provided configs (requires a GPU-ready environment):

  ```bash
  python -m deepfitness.scripts.train_deep_latent \
    --config run-benchmarks/gt/filtzero_without_lastround/config_files/final_deep_latent_tead_1fc_p2tl_filtzero.yaml \
    --project_output_folder /path/to/output
  ```
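
Before pointing `--csv`, `--genotype_col`, and `--round_cols` at your own data, a quick pre-flight check can catch malformed count tables. The sketch below is a hypothetical helper, not part of `deepfitness`. It assumes the round columns are CSV headers named after the round labels (e.g. `"0"` through `"6"`), which may not be how `deepfitness` actually interprets `--round_cols`; the `check_count_table_sanity.py` script is the authoritative check.

```python
import csv
import io


def check_count_table(handle, genotype_col, round_cols):
    """Return a list of problems found in a count table (empty list = no problems).

    Hypothetical helper. Assumes round columns are header names such as "0".."6".
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [c for c in [genotype_col, *round_cols] if c not in header]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    seen = set()
    # Line 1 is the header; assumes no field spans multiple lines.
    for line_no, row in enumerate(reader, start=2):
        genotype = row[genotype_col]
        if genotype in seen:
            problems.append(f"line {line_no}: duplicate genotype {genotype!r}")
        seen.add(genotype)
        for col in round_cols:
            value = row[col]
            # str.isdigit() rejects "", "-1", and "2.5": counts must be non-negative integers.
            if not value.isdigit():
                problems.append(
                    f"line {line_no}: count {value!r} in column {col!r} "
                    "is not a non-negative integer"
                )
    return problems
```

For example, `check_count_table(open("deepfitness/example/TEAD_subset500.csv"), "HELMnolinker", [str(r) for r in range(7)])` would list any problems under the stated header assumption.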

Repository layout:

- `deepfitness/` – EVFI & DeepEVFI library, CLI scripts, example data, and environment spec.
- `ACIDES/`, `Enrich2/` – vendor code and wrappers to reproduce the baselines.
- `datasets/` – filtered count tables and preprocessing notebooks.
- `data-exp/` – SPR KD measurement data used in figures in the paper.
- `results-data/` – CSV exports of some of the manuscript figures/metrics.
- `run-benchmarks/`, `run-alltime/` – command templates for running benchmarks and fitness inference as used in the paper.
- `notebooks/` – figure notebooks and rendered outputs.
- `utils/` – helper functions shared across scripts.

Data:

- `datasets/__raw/` contains pre-filtered count tables; see `datasets/README.md` for the filtering steps.
- `datasets/filter.ipynb` documents the additional filtering applied for benchmarking, which generates `datasets/filtzero_without_lastround` and `datasets/filtzero_without_2ndtolastround`.
- `data-exp/` holds the SPR measurements: `tead3_spr_v3.csv` and `exp_merged_efh.csv`.
Ensure you respect any data usage agreements before redistribution.
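
The benchmark folder names suggest two operations: holding out one round (the last, or the second to last) and removing genotypes with zero counts. The sketch below is a hypothetical illustration of that reading only; the actual criteria live in `datasets/filter.ipynb` and may differ. It assumes each row is a dict mapping round labels to integer counts, and the function name `split_and_filter` is invented for this example.

```python
def split_and_filter(rows, round_cols, held_out_index=-1):
    """Hold out one round and drop genotypes with zero counts in all retained rounds.

    Hypothetical illustration; see datasets/filter.ipynb for the real filtering.
    held_out_index=-1 mimics "without_lastround", -2 mimics "without_2ndtolastround".
    """
    held_out_col = round_cols[held_out_index]
    # Assumes round labels are unique, so filtering by label removes exactly one column.
    train_cols = [c for c in round_cols if c != held_out_col]
    kept = [row for row in rows if any(row[c] > 0 for c in train_cols)]
    return kept, train_cols, held_out_col
```

Keeping the held-out column name separate from `train_cols` makes it easy to use that round later as an evaluation target without it leaking into inference.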

To reproduce the benchmarks:

- Prepare environments for each method as above.
- Generate commands:
  - `run-benchmarks/gt/run.sh` and `run-alltime/gt/run.sh` list example invocations.
  - Update absolute paths (`/evfi-manuscript-public/...`) to match your workspace.
- Run methods:
  - DeepEVFI/EVFI: `python -m deepfitness.scripts.train_deep_latent` or `train_simplefitness`.
  - ACIDES baseline: `run_acides.py` after installing `ACIDES/env.yml`.
  - Enrich2 baseline: `run_enrich2.py` within the `Enrich2` environment.
- Collect outputs into the `results-data/` layout to compare against the provided CSVs.
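
Updating the absolute paths by hand is error-prone, so a small script can write a path-adjusted copy of each template. This is a minimal sketch that assumes the templates use `/evfi-manuscript-public/` as their root prefix; the helper names are hypothetical. It writes to a new file so the originals stay untouched for comparison.

```python
import re
from pathlib import Path

# Absolute root used in the provided run.sh templates.
OLD_ROOT = "/evfi-manuscript-public/"
# Match the prefix only where it starts a path, not when nested in a longer
# one such as /data/evfi-manuscript-public/.
_OLD_ROOT_RE = re.compile(r"(?<![\w.-])" + re.escape(OLD_ROOT))


def rewrite_paths(script_text, new_root):
    """Replace the manuscript's absolute root with your own checkout path."""
    replacement = new_root.rstrip("/") + "/"
    # A function replacement stops re.sub from interpreting backslashes in new_root.
    return _OLD_ROOT_RE.sub(lambda _match: replacement, script_text)


def rewrite_file(src, dst, new_root):
    """Write a path-adjusted copy of src to dst, leaving src unchanged."""
    text = Path(src).read_text()
    Path(dst).write_text(rewrite_paths(text, new_root))
```

For example, `rewrite_file("run-benchmarks/gt/run.sh", "run-benchmarks/gt/run.local.sh", "/path/to/your/checkout")` produces a local copy you can inspect before running.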
The notebooks/ folder reads from these output directories to regenerate manuscript plots.

Core entry points are under `deepfitness/deepfitness/scripts/`:

- Preprocessing: `filter_count_table.py`, `check_count_table_sanity.py`, `check_genotype_schema.py`.
- Inference: `train_simplefitness.py`, `train_simple_latent.py`, `train_deepfitness.py`, `train_deep_latent.py`.
- Post-processing: `predict_deepfitness.py`, `compute_evidence_scores.py`, `compute_uncertainty_profile_likelihood.py`, `merge_*_fitness_csvs.py`.
Configuration can be supplied via CLI flags or YAML files (see deepfitness/deepfitness/options/).
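
When running many configurations, it can help to drive the CLI scripts from Python rather than hand-editing shell commands. The sketch below assumes only the module path pattern and flags shown in this README; `build_command` and `run_script` are hypothetical helpers. Flag names are passed through verbatim, so use only flags the target script accepts.

```python
import shlex
import subprocess
import sys


def build_command(script, **flags):
    """Return an argv list for `python -m deepfitness.scripts.<script>`.

    Hypothetical helper: flag names are passed through unchanged.
    """
    argv = [sys.executable, "-m", f"deepfitness.scripts.{script}"]
    for name, value in flags.items():
        if isinstance(value, (list, tuple)):
            # Render lists in the compact form used above, e.g. [0,1,2].
            value = "[" + ",".join(str(v) for v in value) + "]"
        argv += [f"--{name}", str(value)]
    return argv


def run_script(script, dry_run=True, **flags):
    """Print the command, and execute it only when dry_run is False."""
    argv = build_command(script, **flags)
    print(shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)
    return argv
```

For example, `run_script("train_simplefitness", dry_run=False, csv="deepfitness/example/TEAD_subset500.csv", genotype_col="HELMnolinker", round_cols=list(range(7)), output_folder="deepfitness/example/output_evfi")` mirrors the EVFI quick-start command. Passing an argv list (rather than a shell string) also sidesteps shell interpretation of the brackets in `--round_cols`.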

`ACIDES/` packages the ACIDES codebase with our runner script; follow `ACIDES/README.md` for setup. `Enrich2/` includes the Enrich2 release plus a driver script for batch experiments.
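
For intuition about the Enrich2 baseline: Enrich2's regression scoring fits a weighted linear regression of each variant's log ratio against time point. The sketch below is a simplified, unweighted version that normalizes to wild type. It is not the Enrich2 implementation and not EVFI or DeepEVFI. The 0.5 pseudocount and the function name are assumptions for this illustration.

```python
import math


def log_ratio_slope(variant_counts, wt_counts, pseudocount=0.5):
    """Unweighted least-squares slope of log((c_v + p) / (c_wt + p)) vs. round index.

    Simplified illustration of log-ratio regression scoring; Enrich2 itself
    uses a weighted fit. A positive slope means the variant is enriched
    relative to wild type over the rounds.
    """
    if len(variant_counts) != len(wt_counts) or len(variant_counts) < 2:
        raise ValueError("need matching count series with at least two rounds")
    ys = [
        math.log((v + pseudocount) / (w + pseudocount))
        for v, w in zip(variant_counts, wt_counts)
    ]
    ts = range(len(ys))
    t_mean = sum(ts) / len(ys)
    y_mean = sum(ys) / len(ys)
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    den = sum((t - t_mean) ** 2 for t in ts)
    return num / den
```

A variant whose count doubles every round against a constant wild type gets a slope of `ln 2` (with the pseudocount set to 0). The pseudocount keeps zero counts from producing `log(0)`.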

`results-data/benchmark-filtzero-without-lastround/` contains the published CSV metrics. `notebooks/*.ipynb` regenerate the SPR comparison figures; PDFs/PNGs are exported alongside.
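
One common way to summarize agreement between inferred fitness and SPR measurements is a rank correlation. Because a lower KD means tighter binding, a negative Spearman's rho is expected if higher fitness tracks stronger binding. Rank correlation is also unchanged by monotone transforms such as taking log KD. This is a minimal stdlib sketch over hypothetical `{genotype: value}` dicts and is not necessarily the metric used in the notebooks. `scipy.stats.spearmanr` computes the same statistic if SciPy is available.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (needs non-constant inputs)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def fitness_vs_kd(fitness_by_genotype, kd_by_genotype):
    """Spearman's rho over genotypes present in both hypothetical dicts."""
    shared = sorted(set(fitness_by_genotype) & set(kd_by_genotype))
    return spearman(
        [fitness_by_genotype[g] for g in shared],
        [kd_by_genotype[g] for g in shared],
    )
```

Joining on genotype first matters because SPR data typically cover only a small subset of the sequenced genotypes.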
If you use this code or datasets, please cite our manuscript.