Skip to content

CahanLab/compforce

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

compforce

Computational tools for cell fate engineering

Introduction

CompForce ("COMPutational tools FOR Cell fate Engineering") is a platform for comparing results of computational fate determinant prediction tools. CompForce adds flexibility to existing tools, allowing users to provide their own transcriptomic data and network structures, and optionally aggregates results.

Additionally, CompForce includes various functions to aid in benchmarking results. In the latter part of this tutorial, we provide various datasets (real and synthetic) and give an example of how to take advantage of the Dyngen platform (see dyngen.dynverse.org) to simulate synthetic predictions.

  1. CcompForce
  2. Adding in new methods
  3. Synthetic Simulations

CompForce

At the core of CompForce is the "CompForce" class. This object stores raw (counts) expression data, corresponding metadata, optional user provided network, and the raw and pruned results. Other information optionally stored includes: species, TFs, methods run, parameters for specific methods, transitions analyzed.

Structure

  • cf_initialize : Initializes the CompForce object

  • add_transition : Specifies the transition of interest

  • pm_diffexp : Runs differential expression. Parameter 'method' can be specified as either 'ttest' or 'deseq'

  • pm_cellnet : Runs CellNet. If parameter use_cfobj_network=FALSE, will reconstruct the network based on the stored expression data

  • pm_mogrify : Runs a flexible version of Mogrify. If parameter use_cfobj_network=FALSE, will use the network from STRINGdb, based on species.

  • pm_dalessio : Runs JSD portion of the method by d'Alessio et al. 2015. If parameter all_background=FALSE, will use source samples/cells as background. Otherwise, will default to using all non-target data as background.

  • cf_RRA : Aggregates the results stored in cfobj@results. Uses Robust Rank Aggregation.

Data

To initialize a CompForce object, provide expression data, metadata, and TFs:

  • expCounts : raw counts expression matrix, genes-by-samples/cells
  • sampTab : sample table (metadata), with row names matching the column names of expCounts
  • TFs : TFs -- see /data for mouse and human TFs

Also provide (optional):

  • network : GRN dataframe. Columns should be ordered "TF","TG","weight","type". 'type' is either 1 or -1 corresponding to activation and repression respectively

Example 0: B Cell to Macrophage

Load Data

# Expression data and corresponding metadata
expDat<-utils_loadObject("data/dataset_01_expX_counts.rda")
sampTab<-utils_loadObject("data/dataset_01_sampTab.rda")

# List of mouse TFs
mmTFs<-utils_loadObject("data/mmTFs.rda")
mmTFs<-intersect(rownames(expDat),mmTFs)

# A Network
grn<-utils_loadObject("data/mouse_chipX.rda")

Initialize the CompForce Object

# Initialize
cforce<-cf_initialize(expDat,sampTab,TFs=mmTFs,description_column="celltype",network=grn)

# Add transition of interest
cforce<-add_transition(cforce,"bcell","macrophage")

Run methods of interest

# You can run any combination of the following:
cforce<-pm_diffexp(cforce,method="ttest")
cforce<-pm_diffexp(cforce,method="deseq")
cforce<-pm_cellnet(cforce)
cforce<-pm_mogrify(cforce)
cforce<-pm_dalessio(cforce)


# Alternatively, you can do this in one step like so:
# cforce<-cf_onerun(cforce,methods="all")

# To run a subset of methods: methods is any combination of c("diffexp_ttest","diffexp_deseq","cellnet","mogrify","dalessio")

Aggregate results

cforce<-cf_RRA(cforce)

Looking at the results

cforce@results_raw stores the raw results from running each method. cforce@results stores the cleaned results, which is presented as individual dataframes specifying 'TF' and 'rank'.

For example:

cforce@results$bcell..macrophage$CellNet

#          TF rank
# Fli1   Fli1    1
# Spi1   Spi1    2
# Irf8   Irf8    3
# Nfe2   Nfe2    4
# Gata6 Gata6    5
# Rarg   Rarg    6
# Rarb   Rarb    7
# Lyl1   Lyl1    8
# Mef2a Mef2a    9

cforce@results$bcell..macrophage$Mogrify[1:10,]

#              TF rank
# Nr0b1     Nr0b1    1
# Sox6       Sox6    2
# Gata4     Gata4    3
# Sox18     Sox18    4
# Zfp365   Zfp365    5
# Lbx1       Lbx1    6
# Neurod2 Neurod2    7
# Foxa3     Foxa3    8
# Esrrb     Esrrb    9
# Hoxa11   Hoxa11   10

cforce@results$bcell..macrophage$aggregate[1:10,]

#            TF    RRA_Score rank
# Nfe2     Nfe2 1.074761e-06    1
# Gata6   Gata6 2.097536e-06    2
# Rarb     Rarb 2.223193e-05    3
# Bmp2     Bmp2 3.952438e-04    4
# Epas1   Epas1 8.868822e-04    5
# Zbtb16 Zbtb16 1.205503e-03    6
# Nfia     Nfia 2.450168e-03    7
# Nlrp3   Nlrp3 3.471862e-03    8
# Tcf7l2 Tcf7l2 3.518621e-03    9
# Cebpa   Cebpa 5.702262e-03   10

Adding in a new method

The following describes steps to add in a new method to the CompForce platform locally.

  1. Create a new method file: Akin to "pm_cellnet.R","pm_delassio.R", and others-- this is a file that contains functions required to run the method. At the center of this file should be a main wrapper function that takes in a compforce object and a parameter specifying the transition row of interest (as defined in the compforce object). This main function should return an updated compforce object. See "pm_cellnet", "pm_dalessio", etc. for details.
  2. Update cf_predict.R: Update the function cf_onerun in cf_predict.R such that it calls the new method if methods=="all", or if the new method is specified.

Synthetic Simulations

We include in this package some tools to aid in benchmarking.

Datasets

As part of the CompForce project, we benchmarked existing prediction methods on real and synthetic datasets. A description of these datasets and where to access them are as follows:

Synthetic Data

These are 15 synthetic datasets generated using the Dyngen platform. For each dataset, the following files exist:

  1. "model_common.." : The common Dyngen model (no simulations run yet)
  2. "model_wt..": The WT Dyngen model with simulations corresponding to the WT populations
  3. "sc_expX..": The expression data
  4. "sc_sampTab..": The corresponding metadata
  5. "grn..": The GRN used to simulate the system
  6. "TFs..": A list of TFs in the dataset

For nets01-05 and 11-15, a dynamic version of the dataset also exists for use in predicting and simulating dynamic, step-wise differentiation trajectories:

  1. "dynamic_sc_expX..": The expression data for step-wise predictions
  2. "dynamic_sc_sampTab..": The corresponding metadata

Finally, should users be interested in applying SingleCellNet to assess simulations, a pre-trained classifier for each model is provided in "classifiers_scexp..".

Real Bulk Data

These are 3 real curated bulk datasets:

  1. dataset_01 : a broad dataset spanning several cell types
  2. dataset_02 : a blood-centric dataset
  3. dataset_03 : a neural-centric dataset

Each dataset has an expression matrix ("..expX..") and corresponding metadata ("..sampTab.."). Additionally, a sample GRN and list of TFs are included:

  1. "mousePKN..."
  2. "mouseTFs..."

Real Single-Cell Data

These are 2 real curated single-cell datasets:

  1. dataset01 : a broad dataset spanning several cell types
  2. dataset02 : a neural-centric dataset

Each dataset has an expression matris ("..expX.."), corresponding metadata ("..sampTab.."), and list of TFs ("..TFs.."). Additionally each dataset has reconstructed GRNs at different edge densities.

Real Gold Standards

Raw, individual gold standards for real data can be found here. These are labeled as "source_target_type.csv". Source refers to the starting cell type, target refers to the target cell type, type refers to if it is a list of positives (reported in a cocktail) or negatives (tested, and not in any cocktails).

Importantly, TF names in these lists need to be harmonized with any datasets prior to assessment, such that naming systems are in agreement.

Synthetic Prediction Simulation

To assess performance on synthetic datasets, we can directly simulate the results. This requires Dyngen and Tidyverse. Installation instructions can be found: https://dyngen.dynverse.org/

Here is an example of how to simulate the overexpression of a set of TFs.

Setup and Load in data

Using net02 as an example (see above for the data), load in the common and WT models.

require(dyngen)
require(tidyverse)

net02_model_common<-utils_loadObject("net02_model_common_20220317.rda")
net02_model_wt<-utils_loadObject("net02_model_wt_20220317.rda")
Simulate overexpression

Simulate overexpression of "B3_TF1" and "C3_TF1" for the sEndC-->sEndD transition. Three steps are involved:

# Set the perturbation type
model_oe<-set_simulation_type_overexpression(net02_model_common,c("B3_TF1","C3_TF1"))

# Set the starting state
model_oe<-set_simulation_start_state(model_oe,net02_model_wt,"sEndC")

# Generate the simulation using Dyngen
model_oe<-generate_cells(model_oe)

# plot
plot_gold_mappings(model_oe,do_facet=FALSE)
plot_gold_simulations(model_oe)

The simulations looks like:

Simulate a dynamic, step-wise overexpression strategy

Simulate overexpression of a series of TF sets for the sA-->sEndD transition.

# This will require assertthat
require(assertthat)

# Set up the reporter for the intermediate (or multiple intermediate states)
# Here our reporter for the intermediate state is positive expression of B1_TF1
reporters<-c(1)
names(reporters)<-c("B1_TF1")
reporters<-list(epoch2..epoch2=reporters)


# Set up the perturbation
to_perturb<-list(epoch1..epoch1=c("A4_TF1","A2_TF1"),epoch2..epoch2=c("B3_TF1","B8_TF1","D2_TF1"))


# Simulate
model_oe<-ds_simulate_steps2(net02_model_common,net02_model_wt,"sA",to_perturb,reporters,num_simulations=100L)

# plot
plot_gold_mappings(model_oe,do_facet=FALSE)
plot_gold_simulations(model_oe)

The simulations looks like:

About

Computational tools for cell fate engineering

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages