In this repository we present *CyGNAL*, a pipeline for analysing mass cytometry data similar to that used in our *Nature Methods* paper: [Cell-type-specific signaling networks in heterocellular organoids](https://www.nature.com/articles/s41592-020-0737-8). With code in both Python and R, CyGNAL assumes some preliminary and inter-step processing through the platform [Cytobank](https://cytobank.org/) (although the user could in theory use any other solution for this and the gating steps).
6
+
In this repository we present *CyGNAL*, a pipeline for analysing mass cytometry
7
+
data similar to that used in our *Nature Methods* paper: [Cell-type-specific signaling networks in heterocellular organoids](https://www.nature.com/articles/s41592-020-0737-8).
8
+
With code in both Python and R, CyGNAL assumes some preliminary and inter-step
9
+
processing through the platform [Cytobank](https://cytobank.org/) (although the
10
+
user could use any other solution for this and the gating steps).
7
11
8
12
Overview of CyGNAL (dashed blue line) within a standard mass cytometry analysis:
9
13
![alt text][Overview]
10
14
11
15
[Overview]: https://github.com/TAPE-Lab/CyGNAL/blob/master/figs/flowchart_v1.2.png "Overview of CyGNAL"
12
16
13
-
##Using CyGNAL
17
+
### Table of contents
14
18
15
-
CyGNAL is distributed as a multilevel directory. The 'code' folder contains the main steps, with other utility scripts found in 'code/utils/'. Input data should be added to 'Raw_Data' for pre-processing and processed datasets are stored in 'Preprocessed_Data'.
16
-
Input and output directories for the analysis and visualisation steps are found in the 'Analysis' directory.
Raw data contains sample dataset files. Pipeline can take in both FCS and .txt files (as tab-separated dataframes).
21
-
22
-
*NOTE*: The toy dataset used in this tutorial is a down-sampled version (5,000 cells per time point, EpCAM/Pan-CK gated) of the small intestinal organoid time-course experiment described in Figure 4 of our [paper](https://www.nature.com/articles/s41592-020-0737-8). The full dataset is available through [Cytobank Community](https://community.cytobank.org/cytobank/experiments/81059). Users will need to register for a free Cytobank Community account to access the project and are encouraged to clone the experiments and explore the data in further detail.
23
-
24
-
### A Brief Step-by-Step Tutorial
25
-
26
-
Brief tutorial to run all the main steps in CyGNAL in sequential order.
27
-
All console commands given assume the user is in the tool's root directory (.../CyGNAL/) and moves the relevant data from the output folder of the previous step to the input of the current.
28
-
<!-- (Refer to the Nature Protocols paper for more in-depth instructions) -->
29
26
30
-
0.**(SETUP):** Clone (or download) the repository and ensure you have all necessary software and dependencies.
31
-
* We strongly encourage using [conda](https://docs.conda.io/en/latest/miniconda.html) to set up an environment with `conda env create -f conda_env.yml`.
27
+
<!-- ### WIP section -> Checklist of README contents
32
28
33
-
1.**Pre-process:** Copy all the data files to the 'Raw_Data' folder and run `1-data_preprocess.py`. The output files with their antibody panel processed (i.e. measured channels decluttered, empty channels deleted, cell-index assigned) will be saved in the 'Preprocessed_Data' folder, together with a *'panel_markers.csv'* file listing all the markers measured in the given experiment.
34
-
*`python 1-data_preprocess.py`
29
+
* System requirements:
30
+
* Dependencies
31
+
* OS (version?) support
32
+
* Installation guide:
33
+
* Instructions
34
+
* Typical install time
35
+
* Demo:
36
+
* Instructions to run
37
+
* Expected output
38
+
* Expected run-time
39
+
* Instructions for use (regarding data in Nat. Protocols)
40
+
* How to run software -->
35
41
36
-
*Optional (if exporting .txt datasets from Cytobank):* Go to the working illustration page (Illustrations - My working illustration), highlight the population(s) of interest, and export events as untransformed text files (Actions - Export - Export events, with *'Include header with FCS filename'* unchecked).
42
+
## 1. System requirements
37
43
38
-
*Note:* This step is essential for getting the dataset compatible with downstream analysis and has to be performed as the first step in our workflow.
44
+
CyGNAL has been tested on both macOS (from Catalina onwards) and Debian-based
45
+
Linux distributions (including Ubuntu on [WSL](https://github.com/Microsoft/WSL)).
39
46
40
-
2.**UMAP:** Move the processed data file(s) and *'panel_markers.csv'* to 'Analysis/UMAP_input'. Edit *'panel_markers.csv'* to set all the markers used for UMAP analysis from 'N' to 'Y'. Run `2-umap.py`, and the output files will be saved within the 'Analysis/UMAP_output' folder. The markers and the indices of the cells used in the analysis will also be saved in the new folder.
41
-
*`python 2-umap.py`
42
-
43
-
*Note:* When there is more than one data file used as input of the analysis, each data file can be downsampled to the lowest number of the input (i.e. 'equal' sampling) and concatenated prior to UMAP calculation. After the calculation is complete, the concatenated dataset as well as each individual condition are saved with their UMAP coordinates attached.
47
+
### Dependencies
44
48
45
-
3.**EMD:** To perform EMD calculation (using the tools available in the [scprep](https://github.com/KrishnaswamyLab/scprep) library), copy the input data files to 'Analysis/EMD_input'. Run `3-emd.py` and follow the instructions. By default, the denominator of the EMD calculation will be the concatenation of all the input data files, but the user is given the option to provide a specific denominator data file. While EMD scores of all channels can be calculated, by default the user should place the *'panel_markers.csv'* in the input folder to specify which markers are to be used. The calculated EMD scores will be saved in 'Analysis/EMD_output', within the 'EMD_arc_no_norm' column in the saved file.
46
-
*`python 3-emd.py`
47
-
48
-
4.**DREMI:** To perform DREMI calculation (using the tools available in the [scprep](https://github.com/KrishnaswamyLab/scprep) library) copy the input data files to 'Analysis/DREMI_input'. Run `4-dremi.py` and follow the instructions. As with EMD, DREMI scores of all permutations of marker combinations can be calculated, but we suggest specifying the markers of interest by modifying the *'panel_markers.csv'* file. The calculated DREMI scores will be saved in 'Analysis/DREMI_output'.
49
-
*`python 4-dremi.py`
50
-
51
-
*Optional:* The user is given the option to save the density-resampled plots for data inspection and to perform a standard deviation-based outlier removal step prior to DREMI calculation.
52
-
53
-
5.**Heatmap:** To visualise EMD/DREMI scores in heatmaps, copy the EMD/DREMI calculation outputs to the 'Analysis/Vis_Heatmap' folder. Run `5v1-htmp.py` and follow the instructions in the GUI. The script accepts only one EMD data file and one DREMI data file (with 'EMD' and 'DREMI' in their file names respectively) to be visualised.
54
-
*`python 5v1-htmp.py`
55
-
56
-
6.**Principal component analysis (PCA):** To perform PCA and visualise the results, copy the EMD/DREMI calculation outputs to the 'Analysis/Vis_PCA' folder. Run `5v2-pca.py` and follow the instructions in the GUI.
57
-
*`python 5v2-pca.py`
58
-
59
-
60
-
## Dependencies
61
-
62
-
* Python: Tested with Python v3.6, v3.7, and v3.8. Used in the backbone of the workflow and most computational steps.
49
+
* Python: Tested with Python v3.6, v3.7, and v3.8. Used in the backbone of the
50
+
workflow and most computational steps.
63
51
* `fcsparser`
64
52
* `fcswrite`
65
53
* `numpy`
@@ -71,13 +59,15 @@ All console commands given assume the user is in the tool's root directory (.../
71
59
* `sklearn`
72
60
* `umap-learn`
73
61
74
-
* R: Tested with v3.6.1 < R <= v4. Mostly used for visualisation, but also for computing the PCA.
62
+
* R: Tested with v3.6 < R <= v4.0. Mostly used for visualisation, but also for
63
+
computing the PCA.
75
64
* `ComplexHeatmap`
76
65
* `DT`
77
66
* `factoextra`
78
67
* `FactoMineR`
79
68
* `flowCore`
80
69
* `GGally`
70
+
* `ggrepel`
81
71
* `Hmisc`
82
72
* `MASS`
83
73
* `matrixStats`
@@ -90,13 +80,130 @@ All console commands given assume the user is in the tool's root directory (.../
90
80
* Bourne shell:
91
81
* `Rscript`
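
The Python requirements listed above can be checked before running the pipeline. The following is a minimal sketch, not part of CyGNAL: the function names are hypothetical, and it covers only the packages visible in this diff (the hunk boundary hides part of the list). Note that the `umap-learn` package is imported under the name `umap`.

```python
import importlib.util
import sys

# Import names for the Python packages visible in the dependency list.
# `umap-learn` is installed under that name but imported as `umap`.
PYTHON_MODULES = ["fcsparser", "fcswrite", "numpy", "sklearn", "umap"]


def python_version_tested(version_info=sys.version_info):
    """Return True if the interpreter is one of the tested versions (3.6 to 3.8)."""
    return (3, 6) <= tuple(version_info[:2]) <= (3, 8)


def find_missing(modules):
    """Return the module names that the import system cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_version_tested():
        print("Warning: this Python version is outside the tested range (3.6 to 3.8).")
    missing = find_missing(PYTHON_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

The check only confirms that the modules can be located; it does not verify package versions or the R dependencies.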
92
82
93
-
## Authors
94
83
95
-
The work here is actively being developed by Ferran Cardoso ([@FerranC96](https://github.com/FerranC96)) and Dr. Xiao Qin ([@qinxiao1990](https://github.com/qinxiao1990)).
84
+
## 2. Using CyGNAL
85
+
86
+
CyGNAL is distributed as a set of directories. The 'code' folder contains the
87
+
main steps, with other utility scripts found in 'code/utils/', to be run as `python` scripts.
88
+
Input data should be added to 'Raw_Data' for pre-processing, and processed
89
+
datasets are stored in 'Preprocessed_Data'. Input and output directories for
90
+
the analysis and visualisation steps are found in the 'Analysis' directory.
91
+
92
+
### Input data
93
+
94
+
CyGNAL can take in both FCS and .txt files (as tab-separated dataframes and
95
+
without a header). The 'Raw_Data' directory contains sample dataset files.
96
+
97
+
*NOTE*: The toy dataset used in this tutorial is a down-sampled version
98
+
(5,000 cells per time point, EpCAM/Pan-CK gated) of the small intestinal
99
+
organoid time-course experiment described in Figure 4 of our [paper](https://www.nature.com/articles/s41592-020-0737-8).
100
+
The full dataset is available through [Cytobank Community](https://community.cytobank.org/cytobank/experiments/81059).
101
+
Users will need to register for a free Cytobank Community account to access
102
+
the project and are encouraged to clone the experiments and explore the data in
103
+
further detail.
104
+
105
+
### A brief step-by-step tutorial
106
+
107
+
Brief tutorial to run all the main steps in CyGNAL in sequential order.
108
+
All console commands given assume the user is in the tool's root directory
109
+
(.../CyGNAL/) and moves the relevant data from the output folder of the previous
110
+
step to the input of the current.
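
Moving data between steps can be scripted. The sketch below is an illustration only and is not shipped with CyGNAL: `stage_files` is a hypothetical helper, and the file patterns are assumptions based on the input formats mentioned in this README. It copies files rather than moving them, so the previous step's output stays in place.

```python
import shutil
from pathlib import Path


def stage_files(src_dir, dst_dir, patterns=("*.txt", "*.fcs", "*.csv")):
    """Copy files matching `patterns` from one step's output to the next step's input.

    Returns the names of the copied files, grouped in pattern order.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in patterns:
        for path in sorted(src.glob(pattern)):
            if path.is_file():
                shutil.copy2(path, dst / path.name)
                copied.append(path.name)
    return copied
```

For example, calling `stage_files("Preprocessed_Data", "Analysis/UMAP_input")` from the root directory would stage the pre-processed files and the panel file for the UMAP step.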
111
+
112
+
With the toy datasets present by default in the 'Raw_Data' folder, running the
113
+
full set of steps within CyGNAL should take less than 15 minutes in total.
114
+
Keep in mind, however, that runtimes will increase with larger or multiple datasets.
115
+
<!-- (Refer to the Nature Protocols paper for more in-depth instructions) -->
116
+
117
+
0.**(SETUP):** Clone (or download) the repository and ensure you have all
118
+
necessary software and dependencies.
119
+
* We strongly encourage using [conda](https://docs.conda.io/en/latest/miniconda.html)
120
+
to set up an environment with `conda env create -f conda_env.yml`.
121
+
122
+
1.**Pre-process:** Copy all the data files to the 'Raw_Data' folder and run
123
+
`1-data_preprocess.py`. The output files with their antibody panel processed
(i.e. measured channels decluttered, empty channels deleted, cell-index assigned)
will be saved in the 'Preprocessed_Data' folder, together with a *'panel_markers.csv'*
126
+
file listing all the markers measured in the given experiment.
127
+
* `python 1-data_preprocess.py`
128
+
129
+
*Optional (if exporting .txt datasets from Cytobank):* Go to the working
130
+
illustration page (Illustrations - My working illustration), highlight the
131
+
population(s) of interest, and export events as untransformed text files
132
+
(Actions - Export - Export events, with *'Include header with FCS filename'* unchecked).
133
+
134
+
*Note:* This step is essential for getting the dataset compatible with
135
+
downstream analysis and has to be performed as the first step in our workflow.
136
+
137
+
2.**UMAP:** Move the processed data file(s) and *'panel_markers.csv'* to 'Analysis/UMAP_input'.
138
+
Edit *'panel_markers.csv'* to set all the markers used for UMAP analysis from 'N' to 'Y'.
139
+
Run `2-umap.py`, and the output files will be saved within the 'Analysis/UMAP_output' folder.
140
+
The markers and the indices of the cells used in the analysis will also be saved in the new folder.
141
+
* `python 2-umap.py`
142
+
143
+
*Note:* When there is more than one data file used as input of the analysis,
144
+
each data file can be downsampled to the lowest number of the input
145
+
(i.e. 'equal' sampling) and concatenated prior to UMAP calculation.
146
+
After the calculation is complete, the concatenated dataset as well as each
147
+
individual condition are saved with their UMAP coordinates attached.
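
The 'equal' sampling described in the note above can be sketched as follows. This is an illustration under stated assumptions, not the code in `2-umap.py`: the function name, the seed handling, and the plain-list data layout are hypothetical, and the UMAP calculation itself is not shown.

```python
import random


def equal_downsample(datasets, seed=0):
    """Downsample every dataset to the size of the smallest one, then concatenate.

    `datasets` maps a file name to a list of rows (one row per cell).
    Returns a list of (file_name, cell_index, row) tuples so that each cell
    can be traced back to its source file after concatenation.
    """
    rng = random.Random(seed)
    n_min = min(len(rows) for rows in datasets.values())
    combined = []
    for name, rows in datasets.items():
        # Sample without replacement and keep the original cell order.
        keep = sorted(rng.sample(range(len(rows)), n_min))
        combined.extend((name, i, rows[i]) for i in keep)
    return combined
```

Keeping the file name and cell index alongside each row is what allows the concatenated result to be split back into individual conditions once the embedding coordinates are attached.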
148
+
149
+
3.**EMD:** To perform EMD calculation (using the tools available in the
150
+
[scprep](https://github.com/KrishnaswamyLab/scprep) library), copy the input
151
+
data files to 'Analysis/EMD_input'. Run `3-emd.py` and follow the instructions.
152
+
By default, the reference of the EMD calculation will be the concatenation
153
+
of all the input data files, but the user is given the option to provide a
154
+
specific reference data file. While EMD scores of all channels can be
155
+
calculated, the default behaviour requires the user to place the *'panel_markers.csv'*
156
+
in the input folder to specify which markers are to be used.
157
+
The calculated EMD scores will be saved in 'Analysis/EMD_output', within the
158
+
'EMD_arc_no_norm' column in the saved file.
159
+
* `python 3-emd.py`
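
To illustrate what the EMD step measures, here is a minimal pure-Python sketch of a one-dimensional EMD between each sample and a reference. It is not the scprep implementation that CyGNAL calls. The function names are hypothetical, and the arcsinh transform with a cofactor of 5 is an assumption suggested by the 'EMD_arc_no_norm' column name rather than something stated in this README.

```python
import math
from bisect import bisect_right


def arcsinh_transform(values, cofactor=5.0):
    """Arcsinh-transform raw intensities (the cofactor of 5 is an assumption)."""
    return [math.asinh(v / cofactor) for v in values]


def emd_1d(sample, reference):
    """1D earth mover's distance: area between the two empirical CDFs."""
    xs, ys = sorted(sample), sorted(reference)
    points = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_x = bisect_right(xs, left) / len(xs)
        cdf_y = bisect_right(ys, left) / len(ys)
        total += abs(cdf_x - cdf_y) * (right - left)
    return total


def emd_against_reference(samples, reference=None, cofactor=5.0):
    """Score one marker per sample against a reference distribution.

    `samples` maps a file name to that marker's raw values. When no reference
    is given, the concatenation of all samples is used, as described above.
    """
    transformed = {k: arcsinh_transform(v, cofactor) for k, v in samples.items()}
    if reference is None:
        ref = [v for values in transformed.values() for v in values]
    else:
        ref = arcsinh_transform(reference, cofactor)
    return {name: emd_1d(values, ref) for name, values in transformed.items()}
```

A score of zero means the sample's distribution for that marker matches the reference; larger values mean more signal has to be shifted to make the two distributions agree.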
160
+
161
+
4.**DREMI:** To perform DREMI calculation (using the tools available in the
162
+
[scprep](https://github.com/KrishnaswamyLab/scprep) library) copy the input
163
+
data files to 'Analysis/DREMI_input'. Run `4-dremi.py` and follow the
164
+
instructions. As with EMD, DREMI scores of all permutations of marker
165
+
combinations can be calculated, but we suggest specifying the markers of
166
+
interest by modifying the *'panel_markers.csv'* file.
167
+
The calculated DREMI scores will be saved in 'Analysis/DREMI_output'.
168
+
* `python 4-dremi.py`
169
+
170
+
*Optional:* The user is given the option to save the density-resampled
171
+
plots for data inspection and to perform a standard deviation-based outlier
172
+
removal step prior to DREMI calculation.
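
The standard deviation-based outlier removal mentioned above can be pictured with the sketch below. It is an assumption-laden illustration, not the logic in `4-dremi.py`: the function name, the default cut-off of three standard deviations, and the choice to filter on both markers of a pair are all hypothetical.

```python
import statistics


def remove_outliers(pairs, n_sd=3.0):
    """Drop (x, y) cells where either value lies more than `n_sd` SDs from its mean."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)

    def inside(value, mean, sd):
        # A constant marker (sd == 0) has no outliers by this criterion.
        return sd == 0 or abs(value - mean) <= n_sd * sd

    return [
        (x, y)
        for x, y in pairs
        if inside(x, mean_x, sd_x) and inside(y, mean_y, sd_y)
    ]
```

Filtering before the calculation keeps a handful of extreme cells from stretching the range over which the density is resampled.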
173
+
174
+
5.**Heatmap:** To visualise EMD/DREMI scores in heatmaps, copy the EMD/DREMI
175
+
calculation outputs to the 'Analysis/Vis_Heatmap' folder.
176
+
Run `5v1-htmp.py` and follow the instructions in the GUI. The script accepts
177
+
only one EMD data file and one DREMI data file (with 'EMD' and 'DREMI' in their
178
+
file names respectively) to be visualised.
179
+
* `python 5v1-htmp.py`
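
Because the heatmap script accepts only one EMD file and one DREMI file, it can help to check the folder contents first. The sketch below is a hypothetical pre-flight check, not part of CyGNAL; treating the file name match as case-sensitive is an assumption.

```python
from pathlib import Path


def find_heatmap_inputs(folder):
    """Locate the single EMD file and single DREMI file in `folder`.

    Returns {'EMD': Path or None, 'DREMI': Path or None} and raises
    ValueError if more than one file matches either keyword.
    """
    files = [p for p in sorted(Path(folder).iterdir()) if p.is_file()]
    result = {}
    for key in ("EMD", "DREMI"):
        matches = [p for p in files if key in p.name]
        if len(matches) > 1:
            names = ", ".join(p.name for p in matches)
            raise ValueError(f"Expected at most one {key} file, found: {names}")
        result[key] = matches[0] if matches else None
    return result
```

For example, `find_heatmap_inputs("Analysis/Vis_Heatmap")` would report which of the two score files are present before the GUI is launched.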
180
+
181
+
6.**Principal component analysis (PCA):** To perform PCA and visualise the
182
+
results, copy the EMD/DREMI calculation outputs to the 'Analysis/Vis_PCA' folder.
183
+
Run `5v2-pca.py` and follow the instructions in the GUI.
184
+
* `python 5v2-pca.py`
185
+
186
+
187
+
## 3. About us
188
+
189
+
### Authors
190
+
191
+
The work here is actively being developed by
192
+
Ferran Cardoso ([@FerranC96](https://github.com/FerranC96)) and
193
+
Dr. Xiao Qin ([@qinxiao1990](https://github.com/qinxiao1990)).
96
194
Based also on original work by Pelagia Kyriakidou.
97
195
98
-
We acknowledge the work of all third parties whose packages are used in CyGNAL.
196
+
### Support
99
197
100
-
## About the group
198
+
For any queries or issues regarding CyGNAL please check the
199
+
[Issues](https://github.com/TAPE-Lab/CyGNAL/issues) section in this repository.
101
200
102
-
Repository of the [Cell Communication Lab](http://tape-lab.com/) at UCL's Cancer Institute. The Cell Communication Lab studies how oncogenic mutations communicate with stromal and immune cells in the colorectal cancer (CRC) tumour microenvironment (TME). By understanding how mutations regulate all cell types within a tumour, we aim to uncover novel approaches to treat cancer.
201
+
### The group
202
+
203
+
Repository of the [Cell Communication Lab](http://tape-lab.com/) at UCL's Cancer Institute.
204
+
The Cell Communication Lab studies how oncogenic mutations communicate with
205
+
stromal and immune cells in the colorectal cancer (CRC) tumour microenvironment (TME).
206
+
By understanding how mutations regulate all cell types within a tumour,
207
+
we aim to uncover novel approaches to treat cancer.
208
+
209
+
We acknowledge the work of all third parties whose packages are used in CyGNAL.