CTAinn is a flexible nf-core-based pipeline for analyzing cfDNA TAPS-methylated samples against the GRCh38 reference. Built with nf-core tools, it processes methylation data across computing environments ranging from laptops to HPC clusters and cloud platforms such as AWS.

jblancoheredia/CTAinn

MSKCC/CTI/CTAinn

CTAinn

Introduction

MSKCC-CTI/CTAinn is a Comprehensive TAPS Analysis pipeline built with Nextflow and nf-core. It is designed to be highly flexible and can run on a wide range of computing environments, from a single laptop to a computing cluster or the cloud.

CTAinn processes TAPS (TET-assisted pyridine borane sequencing) data to analyze DNA methylation patterns. The pipeline takes raw FASTQ files from TAPS sequencing experiments and performs quality control, alignment, methylation calling, and comprehensive downstream analysis. It generates various outputs including quality metrics, methylation reports, and visualization files that enable researchers to understand DNA methylation patterns in their samples.

TAPS stands for TET-assisted pyridine borane sequencing.

Pipeline Overview

Pipeline Steps

The pipeline includes the following main steps:

  1. Quality Control (FastQC)
    • Comprehensive quality assessment of raw sequencing reads
  2. Concatenate FASTQs (cat)
    • Combines multiple FASTQ files for the same sample
  3. Mapping
    3.1. Mapping with BWA-Meth
      • Alignment of bisulfite-converted reads to the reference genome
    OR
    3.2. Mapping with BWA-MEM2
      • The successor to BWA-MEM
  4. Mark Duplicates (GATK4-MarkDuplicates)
    • Identification and marking of PCR duplicates
  5. Methylation Calling
    5.1. Methylation Calling with rasTair
      • Extraction of methylation calls from aligned reads
    5.2. Methylation Calling with asTair
      • Extraction of methylation calls from aligned reads
  6. MultiQC (MultiQC)
    • Aggregation of all QC reports into a single dashboard

Usage

Note

If you are new to Nextflow, please refer to this page on how to set up Nextflow.

Input Preparation

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
TREATMENT_REP1,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz
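
If the FASTQ files follow the Illumina-style naming used in the example above (`*_R1_001.fastq.gz` paired with `*_R2_001.fastq.gz`), a samplesheet could be drafted with a short shell function. This is a minimal sketch and not part of CTAinn: the function name `make_samplesheet` is hypothetical, and the sample identifier is simply taken from the file name prefix, so edit it to something meaningful afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with CTAinn): draft a samplesheet from a
# directory of paired FASTQ files named *_R1_001.fastq.gz / *_R2_001.fastq.gz.
make_samplesheet() {
    local fastq_dir="$1"
    local r1 r2 sample

    echo "sample,fastq_1,fastq_2"
    for r1 in "$fastq_dir"/*_R1_001.fastq.gz; do
        # If nothing matches, the glob stays literal and the file does not exist.
        [ -e "$r1" ] || continue

        # Derive the expected R2 path by swapping the file name suffix.
        r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "WARNING: no R2 mate found for $r1, skipping" >&2
            continue
        fi

        # Assumption: use the shared file name prefix as the sample identifier.
        sample="$(basename "$r1" _R1_001.fastq.gz)"
        echo "${sample},${r1},${r2}"
    done
}

# Example: make_samplesheet /path/to/fastqs > samplesheet.csv
```

Unpaired R1 files are reported on standard error and left out, so the resulting file only contains complete pairs.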

The samplesheet requires the following columns:

  • sample: Unique sample identifier
  • fastq_1: Path to forward reads (R1)
  • fastq_2: Path to reverse reads (R2)
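
Before launching a run, it can help to confirm that the samplesheet matches the column layout described above. The following is a minimal sketch under stated assumptions: `check_samplesheet` is a hypothetical helper, not part of CTAinn, and it only checks the header and that every row has three non-empty fields. The pipeline's own validation may be stricter.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with CTAinn): basic samplesheet sanity check.
check_samplesheet() {
    local sheet="$1"
    local expected="sample,fastq_1,fastq_2"
    local header

    # The first line must be exactly the documented header
    # (a trailing carriage return from Windows editors is tolerated).
    header="$(head -n 1 "$sheet" | tr -d '\r')"
    if [ "$header" != "$expected" ]; then
        echo "ERROR: header is '$header', expected '$expected'" >&2
        return 1
    fi

    # Every non-blank data row needs three non-empty comma-separated fields.
    # awk's exit status becomes the function's return status.
    awk -F',' '
        { sub(/\r$/, "") }
        NR == 1 { next }
        /^[ \t]*$/ { next }
        NF != 3 || $1 == "" || $2 == "" || $3 == "" {
            printf "ERROR: line %d needs three non-empty fields\n", NR
            bad = 1
        }
        END { exit bad + 0 }
    ' "$sheet" >&2
}

# Example: check_samplesheet samplesheet.csv && echo "samplesheet looks OK"
```

The check deliberately does not test whether the FASTQ paths exist, since they may be relative to the launch directory or point to remote storage.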

Running the Pipeline

You can run the pipeline using:

nextflow run </path/to/>/ctainn \
   -profile <docker/singularity> \
   --input samplesheet.csv \
   --genome GRCh38 \
   --outdir results

Key Parameters

  • --input: Path to samplesheet CSV file
  • --outdir: Output directory path
  • --email: Email address for completion notification
  • --max_memory: Maximum memory to use (default: '128.GB')
  • --max_cpus: Maximum CPUs to use (default: 12)

Warning

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
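
As an illustration of the `-params-file` route, the parameters from the run command above can be written to a YAML file and passed to Nextflow in one go. This is a sketch: the file name `params.yaml` is an arbitrary choice, and the keys simply mirror the `--input`, `--genome`, and `--outdir` flags shown earlier.

```shell
#!/usr/bin/env bash
# Write the pipeline parameters to a YAML file.
# Keys correspond to the double-dash CLI flags, without the leading "--".
cat > params.yaml <<'EOF'
input: "samplesheet.csv"
genome: "GRCh38"
outdir: "results"
EOF

# Then launch with the params file instead of the individual flags, e.g.:
#   nextflow run </path/to/>/ctainn -profile docker -params-file params.yaml
```

Keeping parameters in a file makes a run easier to repeat and to share alongside the samplesheet.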

Credits

mskcc-cti/ctainn was originally written by [email protected].

We thank the following people for their extensive assistance in the development of this pipeline:

  • The nf-core community - Framework and best practices

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For support, please:

  1. Read the pipeline documentation
  2. Check existing issues
  3. Create a new issue with a detailed description of your problem

Citations

If you use mskcc-cti/ctainn for your analysis, please cite it using the following doi: 10.5281/zenodo.XXXXXX

Key tools used in this pipeline:

  • BWA-Meth

    Pedersen BS, et al. Fast and accurate alignment of long bisulfite-seq reads. arXiv:1401.1129, 2014.

TO-DO: Complete the citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
