bulk-rnaseq-nf is a bioinformatics pipeline for analysing RNA sequencing data. It takes a samplesheet and FASTQ files as input; performs lane concatenation, quality control (QC), trimming, alignment, assembly, and quantification; and prepares the results for input into differential expression packages (e.g. DESeq2).
- Lane concatenation for samples sequenced on multiple lanes
- Adapter trimming and read QC (Trim Galore!)
- HISAT2 index generation if an index is not already available
- HISAT2 alignment
- Sort and index alignments (SAMtools)
- Transcript assembly and quantification (StringTie)
- Present QC for raw reads, alignment, gene biotype, sample similarity, and strand-specificity checks (MultiQC, R)
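As an illustration of the first step, lane concatenation of gzipped FASTQ files can be done with a plain `cat`, because concatenated gzip streams form a valid multi-member gzip file. The following is a minimal sketch, not necessarily how the pipeline's own module implements it; the file names (`sampleA_L001_R1.fastq.gz` and so on) are hypothetical, and the two tiny FASTQ files are generated on the spot so that the example is self-contained.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so nothing in the repository is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Hypothetical inputs: one read per lane for the same sample (lanes L001 and L002).
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > sampleA_L001_R1.fastq.gz
printf '@read2\nTTGA\n+\nIIII\n' | gzip -c > sampleA_L002_R1.fastq.gz

# Join the lanes byte-for-byte; no decompression or recompression is needed.
cat sampleA_L001_R1.fastq.gz sampleA_L002_R1.fastq.gz > sampleA_R1.fastq.gz

# A FASTQ record is four lines, so the read count is the line count divided by 4.
merged_reads=$(( $(gzip -dc sampleA_R1.fastq.gz | wc -l) / 4 ))
echo "merged reads: ${merged_reads}"
```

Run as-is, this prints `merged reads: 2`. For paired-end data the same concatenation would be repeated for the R2 files, keeping the lane order identical so that mates stay in sync.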
Each directory and file is structured to facilitate the processing pipeline.
- conf/: Configuration files related to the project.
- modules/: Contains sub-modules, each serving specific roles like preprocessing, alignment, and transcript assembly.
  - preprocess/: Preprocessing scripts, such as concatenating and trimming fastqs.
  - align/: Contains scripts for indexing and alignment using HISAT2.
  - transcript_assembly/: Scripts for transcript assembly and quantification using StringTie.
  - qc: Quality control
- workflows/: Main pipeline scripts.
- bin/: Directory for helper scripts.
- params.yaml: Configuration file specifying input parameters.
- nextflow.config: Pipeline-wide configuration settings.
nextflow run workflows/main.nf -params-file params.yaml
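Before launching, it can help to confirm that the files the command depends on are where the layout above says they should be. The helper below is a hypothetical sketch (the function name `preflight` is not part of the repository); it only checks for `workflows/main.nf`, `params.yaml`, and `nextflow.config` under a given root and does not validate their contents.

```shell
#!/usr/bin/env bash

# preflight (hypothetical helper): report pipeline files missing under a root directory.
# Returns 0 when all listed files are present, 1 otherwise.
preflight() {
  local root="${1:-.}" missing=0 path
  for path in workflows/main.nf params.yaml nextflow.config; do
    if [ ! -f "${root}/${path}" ]; then
      echo "missing: ${path}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Usage from the repository root (assumes nextflow is on PATH):
# preflight . && nextflow run workflows/main.nf -params-file params.yaml
```

Chaining with `&&` means Nextflow only starts when the check succeeds, so a missing file is reported immediately.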