MSKCC/CTI/CTAinn

Introduction

MSKCC-CTI/CTAinn is a Comprehensive TAPS Analysis pipeline built with Nextflow following nf-core conventions. It is designed to be highly flexible and can run on a wide range of computing environments, from a single laptop to a computing cluster or a cloud environment.

CTAinn processes TAPS (TET-assisted pyridine borane sequencing) data to analyze DNA methylation patterns. The pipeline takes raw FASTQ files from TAPS sequencing experiments and performs quality control, alignment, methylation calling, and downstream analysis. Its outputs include quality metrics, methylation reports, and visualization files that help researchers characterize DNA methylation in their samples.
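In TAPS, TET oxidation followed by pyridine borane reduction converts modified cytosines (5mC and 5hmC) into a base that is read as T after PCR, while unmodified cytosines are still read as C. This is the reverse of bisulfite sequencing, where the unmodified cytosines are the ones read as T. The sketch below is a hypothetical helper, not part of CTAinn and not necessarily how rasTair or asTair compute their output. It shows the basic per-site quantity: the fraction of informative reads showing T at a reference cytosine. It assumes the C and T counts were already collected on the cytosine's own strand; on the opposite strand the same change appears as G-to-A.

```python
from typing import Optional


def taps_methylation_level(c_count: int, t_count: int) -> Optional[float]:
    """Fraction of informative reads reporting a modified cytosine at one site.

    Hypothetical helper. In TAPS, modified cytosines (5mC/5hmC) are read as T,
    unmodified cytosines as C. Reads with any other base at the site are
    assumed to have been excluded before counting.
    """
    if c_count < 0 or t_count < 0:
        raise ValueError("read counts must be non-negative")
    informative = c_count + t_count
    if informative == 0:
        return None  # no informative coverage: the level is undefined
    return t_count / informative


# Example: 9 of 12 informative reads show T, so the site is 75% modified.
print(taps_methylation_level(c_count=3, t_count=9))  # prints 0.75
```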

Pipeline Overview

The pipeline includes the following main steps:

1. Quality Control (FastQC)
   - Comprehensive quality assessment of raw sequencing reads
2. Concatenate FASTQs (cat)
   - Combines multiple FASTQ files for the same sample
3. Mapping, using either:
   3.1. Mapping with BWA-Meth
   - Alignment of C-to-T converted reads to the reference genome
   3.2. Mapping with BWA-MEM2
   - The next version of BWA-MEM
4. Mark Duplicates (GATK4 MarkDuplicates)
   - Identification and marking of PCR duplicates
5. Methylation Calling
   5.1. Methylation Calling with rasTair
   - Extraction of methylation calls from aligned reads
   5.2. Methylation Calling with asTair
   - Extraction of methylation calls from aligned reads
6. MultiQC (MultiQC)
   - Aggregation of all QC reports into a single dashboard
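To illustrate the idea behind the Mark Duplicates step, the simplified sketch below groups read pairs that share a chromosome, mate start positions and mate orientations. It keeps the pair with the highest summed base quality, which is similar in spirit to the tool's default SUM_OF_BASE_QUALITIES scoring strategy, and flags the others as duplicates. This is not the GATK4 MarkDuplicates implementation. The ReadPair type and mark_duplicates function are hypothetical, mates are assumed to be in a canonical order already, and the real tool also takes into account the library, unclipped 5' positions and optical duplicates.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ReadPair:
    """Hypothetical, simplified view of one aligned read pair."""
    name: str
    chrom: str
    r1_start: int       # 5' position of mate 1 (assumed already unclipped)
    r1_reverse: bool    # True if mate 1 aligned to the reverse strand
    r2_start: int       # 5' position of mate 2
    r2_reverse: bool
    base_qualities: List[int]  # base qualities of both mates combined


def mark_duplicates(pairs: List[ReadPair]) -> Dict[str, bool]:
    """Return {pair name: is_duplicate} using a simplified position key."""
    groups: Dict[Tuple[str, int, bool, int, bool], List[ReadPair]] = {}
    for pair in pairs:
        key = (pair.chrom, pair.r1_start, pair.r1_reverse,
               pair.r2_start, pair.r2_reverse)
        groups.setdefault(key, []).append(pair)

    flags: Dict[str, bool] = {}
    for members in groups.values():
        # Keep the highest-scoring pair; every other member is a duplicate.
        best = max(members, key=lambda p: sum(p.base_qualities))
        for pair in members:
            flags[pair.name] = pair is not best
    return flags
```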

Usage

Note

If you are new to Nextflow, please refer to the Nextflow installation documentation for how to set it up.

Input Preparation

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
TREATMENT_REP1,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz

The samplesheet requires the following columns:

sample: Unique sample identifier
fastq_1: Path to forward reads (R1)
fastq_2: Path to reverse reads (R2)
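You can check the samplesheet locally before launching a run. The sketch below is a hypothetical helper, not shipped with CTAinn. It checks for the three required columns, a non-empty sample name without spaces, and distinct R1/R2 entries. The gzipped-FASTQ suffix rule is an assumption borrowed from common nf-core practice, and the pipeline's own input schema remains the authority on what is accepted.

```python
import csv
import io
from typing import List

REQUIRED_COLUMNS = ("sample", "fastq_1", "fastq_2")
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz")  # assumption: gzipped FASTQ input


def check_samplesheet(text: str) -> List[str]:
    """Return a list of problems found in samplesheet text (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]

    problems: List[str] = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        # Short rows yield None for absent fields, so fall back to "".
        sample = (row["sample"] or "").strip()
        r1 = (row["fastq_1"] or "").strip()
        r2 = (row["fastq_2"] or "").strip()
        if not sample or " " in sample:
            problems.append(f"line {line_no}: sample name is empty or contains spaces")
        for col, path in (("fastq_1", r1), ("fastq_2", r2)):
            if not path.endswith(FASTQ_SUFFIXES):
                problems.append(f"line {line_no}: {col} is not a gzipped FASTQ: {path!r}")
        if r1 and r1 == r2:
            problems.append(f"line {line_no}: fastq_1 and fastq_2 are the same file")
    return problems
```

For example, check_samplesheet(open("samplesheet.csv").read()) returns an empty list for the samplesheet shown above.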

Running the Pipeline

You can run the pipeline using:

nextflow run </path/to/>/ctainn \
   -profile <docker/singularity> \
   --input samplesheet.csv \
   --genome GRCh38 \
   --outdir results

Key Parameters

--input: Path to samplesheet CSV file
--outdir: Output directory path
--email: Email address for completion notification
--max_memory: Maximum memory to use (default: '128.GB')
--max_cpus: Maximum CPUs to use (default: 12)

Warning

Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided with the Nextflow -c option, can be used to provide any configuration except parameters; see the nf-core documentation on custom configuration files.
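Nextflow accepts a JSON or YAML file for -params-file, so parameters can be kept in a file rather than on the command line. The helper below is a hypothetical convenience function, not part of the pipeline. It serializes parameters to JSON and rejects names that still carry the leading "--" used on the command line, because parameter files use the bare names.

```python
import json
from typing import Any, Dict


def params_json(params: Dict[str, Any]) -> str:
    """Serialize pipeline parameters for use with `nextflow run -params-file`."""
    for name in params:
        if name.startswith("-"):
            # Parameter files use bare names: "input", not "--input".
            raise ValueError(f"use {name.lstrip('-')!r} instead of {name!r}")
    return json.dumps(params, indent=2) + "\n"
```

For instance, writing params_json({"input": "samplesheet.csv", "genome": "GRCh38", "outdir": "results"}) to params.json and running nextflow run </path/to/>/ctainn -profile <docker/singularity> -params-file params.json should be equivalent to the command shown under Running the Pipeline.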

Credits

mskcc-cti/ctainn was originally written by [email protected].

We thank the following people for their extensive assistance in the development of this pipeline:

The nf-core community - Framework and best practices

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For support, please:

Read the pipeline documentation
Check existing issues
Create a new issue with a detailed description of your problem

Citations

If you use mskcc-cti/ctainn for your analysis, please cite it using the following DOI: 10.5281/zenodo.XXXXXX

Key tools used in this pipeline:

BWA-Meth

Pedersen BS, et al. Fast and accurate alignment of long bisulfite-seq reads. arXiv:1401.1129, 2014.

TO-DO: Complete the citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
