Deep Evolutionary Fitness Inference for Variant Nomination from Directed Evolution

This repository accompanies the manuscript on EVFI/DeepEVFI for directed evolution campaigns. It bundles:

public and newly released multi-round NGS datasets,
our deepfitness implementation (EVFI and DeepEVFI),
third-party baselines (ACIDES, Enrich2),
benchmarking scripts, notebooks, and result CSVs used in the paper.

Quick Start

Clone this repository.
Create environments for each method (recommended via conda + mamba).
- deepfitness/env.yml
- ACIDES/env.yml
- Enrich2/env.yml

Install deepfitness for local development:

conda env create -f deepfitness/env.yml
conda activate deepfitness
pip install hackerargs
pip install -e deepfitness

Run EVFI on a sample subset:

python -m deepfitness.scripts.train_simplefitness \
    --csv deepfitness/example/TEAD_subset500.csv \
    --genotype_col HELMnolinker \
    --round_cols [0,1,2,3,4,5,6] \
    --output_folder deepfitness/example/output_evfi

Run DeepEVFI using provided configs (requires GPU-ready env):

python -m deepfitness.scripts.train_deep_latent \
    --config run-benchmarks/gt/filtzero_without_lastround/config_files/final_deep_latent_tead_1fc_p2tl_filtzero.yaml \
    --project_output_folder /path/to/output

Repository Structure

deepfitness/ – EVFI & DeepEVFI library, CLI scripts, example data, and env spec.
ACIDES/, Enrich2/ – vendor code and wrappers to reproduce baselines.
datasets/ – filtered count tables and preprocessing notebooks.
data-exp/ – SPR KD measurement data, used in figures in the paper.
results-data/ – CSV exports of some of the manuscript figures/metrics.
run-benchmarks/, run-alltime/ – command templates for running benchmarks and fitness inference as used in the paper.
notebooks/ – figure notebooks and rendered outputs.
utils/ – helper functions shared across scripts.

Data Sources

datasets/__raw/ contains pre-filtered count tables; see datasets/README.md for filtering steps.
datasets/filter.ipynb documents the additional filtering for running benchmarking, generating datasets/filtzero_without_lastround and datasets/filtzero_without_2ndtolastround.
data-exp/ holds SPR measurements: tead3_spr_v3.csv, exp_merged_efh.csv.

Ensure you respect any data usage agreements before redistribution.

Reproducing Manuscript Results

Prepare environments for each method as above.
Generate commands:
- run-benchmarks/gt/run.sh and run-alltime/gt/run.sh list example invocations.
- Update absolute paths (/evfi-manuscript-public/...) to match your workspace.
Run methods:
- DeepEVFI/EVFI: python -m deepfitness.scripts.train_deep_latent or train_simplefitness.
- ACIDES baseline: run_acides.py after installing ACIDES/env.yml.
- Enrich2 baseline: run_enrich2.py within Enrich2 env.
Collect outputs into results-data/ layout to compare against provided CSVs.

The notebooks/ folder reads from these output directories to regenerate manuscript plots.

deepfitness Overview

Core entry points are under deepfitness/deepfitness/scripts/:

Preprocessing: filter_count_table.py, check_count_table_sanity.py, check_genotype_schema.py.
Inference: train_simplefitness.py, train_simple_latent.py, train_deepfitness.py, train_deep_latent.py.
Post-processing: predict_deepfitness.py, compute_evidence_scores.py, compute_uncertainty_profile_likelihood.py, merge_*_fitness_csvs.py.

Configuration can be supplied via CLI flags or YAML files (see deepfitness/deepfitness/options/).

Baseline Methods

ACIDES/ packages the ACIDES codebase with our runner script. Follow ACIDES/README.md for setup.
Enrich2/ includes the Enrich2 release plus a driver script for batch experiments.

Results and Figures

results-data/benchmark-filtzero-without-lastround/ contains published CSV metrics.
notebooks/*.ipynb regenerate the SPR comparison figures; PDFs/PNGs are exported alongside.

Support

If you use this code or datasets, please cite our manuscript.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Deep Evolutionary Fitness Inference for Variant Nomination from Directed Evolution

Quick Start

Repository Structure

Data Sources

Reproducing Manuscript Results

deepfitness Overview

Baseline Methods

Results and Figures

Support

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
ACIDES		ACIDES
Enrich2		Enrich2
data-exp		data-exp
datasets		datasets
deepfitness		deepfitness
notebooks		notebooks
results-data		results-data
run-alltime		run-alltime
run-benchmarks		run-benchmarks
utils		utils
LICENSE		LICENSE
README.md		README.md

License

Genentech/deepevfi

Folders and files

Latest commit

History

Repository files navigation

Deep Evolutionary Fitness Inference for Variant Nomination from Directed Evolution

Quick Start

Repository Structure

Data Sources

Reproducing Manuscript Results

deepfitness Overview

Baseline Methods

Results and Figures

Support

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages