Phyllochron is an ILP method that solves the Maximum Likelihood Longitudinal Assignment Problem under the Longitudinally Observed Perfect Phylogeny Model.

raphael-group/Phyllochron

Phyllochron (Maximum Likelihood Assignment for Longitudinal Reconstruction)

Phyllochron employs an integer linear program (ILP) to solve the Maximum Likelihood Longitudinal Assignment Problem and infer a Longitudinally Observed Perfect Phylogeny Time-Labelled Matrix.

Phyllochron takes as input:

  • Variant and total read counts associated with each cell
  • Discrete timepoints associated with each cell
  • A perfect phylogeny clone tree
  • A fractional threshold representing the minimum proportion of cells at a sample assigned to a clone for that clone to be present in that sample

Contents

  1. Pre-requisites
  2. Usage instructions

Pre-requisites (see .yaml file for versions)

Usage instructions

I/O formats

The input for Phyllochron is

  • A comma-delimited file, which has on each line the variant read counts associated with a single cell for each mutation locus.

    • Example: data/AML/input_data/AML-63_variant_readcounts.csv
  • A comma-delimited file, which has on each line the total read counts associated with a single cell for each mutation locus.

    • Example: data/AML/input_data/AML-63_total_readcounts.csv

Alternatively, instead of read count information, Phyllochron can take a character matrix as input.

  • A comma-delimited file, which has on each line the character state associated with a single cell for each mutation locus.

    • Example: data/AML/input_data/AML-63_character_matrix.csv
  • A comma-delimited file, which has on each line the timepoint associated with each cell.

    • Example: data/AML/input_data/AML-63_timepoints.csv
  • A comma-delimited file, which has on each line a binary clone profile giving the mutation profile of one of the clones present.

    • Example: data/AML/input_data/AML-63_mutation_tree.csv
  • A fractional threshold z representing the minimum proportion of cells at a sample assigned to a clone for that clone to be present in that sample. For example, z = 0.10 means that at least 10% of cells in a sample must be assigned to a clone for it to be present in that sample.
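The presence threshold z can be illustrated with a short sketch. It only demonstrates the definition above on an existing cell-to-clone assignment; it does not show how Phyllochron enforces the threshold inside the ILP. The function name `clones_present` and the toy data are hypothetical and are not part of the Phyllochron code base.

```python
from collections import Counter


def clones_present(assignments, timepoints, z=0.05):
    """Return {timepoint: set of clones holding at least a fraction z of its cells}.

    assignments[i] is the clone assigned to cell i, and timepoints[i] is the
    timepoint (sample) of cell i. Both names are illustrative assumptions.
    """
    cells_per_timepoint = Counter(timepoints)
    cells_per_clone = Counter(zip(timepoints, assignments))
    present = {t: set() for t in cells_per_timepoint}
    for (t, clone), n in cells_per_clone.items():
        # "At least z" is read as an inclusive bound, as in the z = 0.10 example.
        if n / cells_per_timepoint[t] >= z:
            present[t].add(clone)
    return present


# Hypothetical toy data: 20 cells at timepoint 0 and 10 cells at timepoint 1.
assignments = [0] * 16 + [1] * 3 + [2] * 1 + [1] * 8 + [2] * 2
timepoints = [0] * 20 + [1] * 10

# Clone 2 holds 1/20 = 5% of the cells at timepoint 0, so with z = 0.10
# it is not counted as present there.
print(clones_present(assignments, timepoints, z=0.10))
```

With z = 0.10, clone 2 is present only at timepoint 1, where it holds 2 of 10 cells.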

Phyllochron

usage: phyllochron.py [-i CHARACTER_MATRIX] [-t TIMEPOINTS] [--mutation-tree MUTATION_TREE] [-o OUTPUT_PREFIX] [-z Z] [-a FP] [-b FN] [--ado ADO] [--time-limit TIME_LIMIT]

       phyllochron.py [-r TOTAL_READS] [-v VARIANT_READS] [-t TIMEPOINTS] [--mutation-tree MUTATION_TREE] [-o OUTPUT_PREFIX] [-z Z] [-a FP] [-b FN] [--ado ADO] [--time-limit TIME_LIMIT]

required arguments:
  -i CHARACTER_MATRIX   filepath for the character matrix csv file     
  or
  -r TOTAL_READS   filepath for the total read counts csv file     
  -v VARIANT_READS   filepath for the variant read counts csv file     
  and
  --mutation-tree MUTATION_TREE filepath for the mutation tree csv file
  -t TIMEPOINTS   filepath for the timepoints file  
  -o OUTPUT_PREFIX filepath indicating the output prefix for all output files
optional arguments:
  -z Z  fractional clonal presence threshold. Default is z = 0.05 
  -a FP false positive error rate. Default is a = 0.001
  -b FN false negative error rate. Default is b = 0.001
  --ado ADO precision parameter for ADO. Default is ado = 15
  --time-limit TIME_LIMIT time limit for solver in seconds. Default is 1800 seconds

An example of usage is as follows. This command can be run from the directory that contains this README file.

python src/phyllochron.py -r data/AML/input_data/AML-63_total_readcounts.csv -v data/AML/input_data/AML-63_variant_readcounts.csv -t data/AML/input_data/AML-63_timepoints.csv --mutation-tree data/AML/input_data/AML-63_mutation_tree.csv -o data/AML/output_data/AML-63 -z 0.05 -a 0.01 -b 0.038 --ado 15 --time-limit 1000
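When running several samples, the command above can be assembled programmatically. The following is a minimal sketch that uses only the flags documented in this README. The helper name `build_command` is hypothetical, and it assumes every sample follows the `AML-63_*` file naming pattern shown in the examples.

```python
import shlex
import sys


def build_command(input_prefix, output_prefix, z=0.05, fp=0.001, fn=0.001,
                  ado=15, time_limit=1800):
    """Build the argument list for the read-count mode of phyllochron.py.

    Keyword defaults mirror the defaults listed under "optional arguments".
    File suffixes follow the AML-63 example files (an assumption for other data).
    """
    return [
        sys.executable, "src/phyllochron.py",
        "-r", f"{input_prefix}_total_readcounts.csv",
        "-v", f"{input_prefix}_variant_readcounts.csv",
        "-t", f"{input_prefix}_timepoints.csv",
        "--mutation-tree", f"{input_prefix}_mutation_tree.csv",
        "-o", output_prefix,
        "-z", str(z),
        "-a", str(fp),
        "-b", str(fn),
        "--ado", str(ado),
        "--time-limit", str(time_limit),
    ]


cmd = build_command(
    "data/AML/input_data/AML-63",
    "data/AML/output_data/AML-63",
    z=0.05, fp=0.01, fn=0.038, ado=15, time_limit=1000,
)
# Print the command for inspection. To execute it from the repository root,
# pass the list to subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting issues if a path contains spaces.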

Data

Currently, the CSV files encoding the AML-63 and AML-97 read count data are stored in data/AML/input_data, and the cell assignments inferred by Phyllochron are stored in data/AML/output_data.
