Bash script that uses GNU Parallel and STAR to map paired-end fastq.gz files to a reference transcriptome. The default parameters are set for the Drosophila melanogaster dm6-based r6.62 transcriptome and can be changed for other species.
First, install GNU Parallel and STAR using mamba:
mamba create -n star_env -c bioconda -c conda-forge star parallel
conda activate star_env
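Before running the script, it can help to confirm that the activated environment exposes both tools. The following is a minimal sketch; the function name check_tools is hypothetical and is not part of parastar.

```shell
# Hypothetical helper (not part of parastar): report any tool missing from PATH.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# With star_env active, both executables should be found.
if check_tools STAR parallel; then
    echo "STAR and GNU Parallel are available"
fi
```

If either tool is reported as missing, check that star_env is the active environment.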
Then, clone this repository:
git clone https://github.com/ccarloscr/parastar.git
The script maps fastq files against a reference genome, so it needs both the fasta file and the gtf annotation file of that genome.
Use the code below to download the fasta file of the dm6 genome from UCSC:
mkdir -p parastar/Genomes/dm6
cd parastar/Genomes/dm6
wget http://hgdownload.soe.ucsc.edu/goldenPath/dm6/bigZips/dm6.fa.gz
gunzip dm6.fa.gz
Use the code below to download the dm6 gtf file (r6.62, the latest version as of 20/02/2025):
wget http://ftp.flybase.org/genomes/dmel/current/gtf/dmel-all-r6.62.gtf.gz
gunzip dmel-all-r6.62.gtf.gz
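Because the fasta file comes from UCSC and the gtf file from FlyBase, the two may not use the same sequence names (UCSC names typically carry a chr prefix, for example chr2L). The sketch below, run from parastar/Genomes/dm6, lists gtf sequence names that have no matching fasta record. The function names are hypothetical, and whether parastar.sh already handles such differences is not stated here.

```shell
# Hypothetical helpers (not part of parastar).

# List sequence names from fasta headers (text after ">" up to the first space).
fasta_seq_names() {
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort -u
}

# List sequence names from column 1 of a gtf file, skipping comment lines.
gtf_seq_names() {
    grep -v '^#' "$1" | cut -f1 | sort -u
}

# Print gtf sequence names that do not appear in the fasta file.
gtf_names_missing_from_fasta() {
    fa_names=$(mktemp)
    gtf_names=$(mktemp)
    fasta_seq_names "$1" > "$fa_names"
    gtf_seq_names "$2" > "$gtf_names"
    comm -13 "$fa_names" "$gtf_names"   # keep lines unique to the gtf list
    rm -f "$fa_names" "$gtf_names"
}

# Only run the comparison when both downloaded files are present.
if [ -f dm6.fa ] && [ -f dmel-all-r6.62.gtf ]; then
    gtf_names_missing_from_fasta dm6.fa dmel-all-r6.62.gtf
fi
```

An empty result means every sequence name in the gtf file has a counterpart in the fasta file.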
Compressed fastq.gz input files should be placed in a directory named fastq_files inside the parastar repository. Run the command below from the directory that contains the cloned repository:
mkdir -p parastar/fastq_files
The name of each fastq.gz file should contain the sample name followed by _R1_001.fastq.gz. If your files are named differently, either rename them or change line 29 of parastar.sh accordingly.
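To illustrate the naming convention, the sketch below derives a sample name from an R1 file name and builds the path of its mate. It is not the code in parastar.sh; the function names are hypothetical, and the _R2_001.fastq.gz mate suffix is an assumption that this README does not state.

```shell
# Hypothetical sketch (not the actual code in parastar.sh).
R1_SUFFIX="_R1_001.fastq.gz"   # suffix named in this README
R2_SUFFIX="_R2_001.fastq.gz"   # assumed mate suffix; not stated in this README

# Strip the directory and the R1 suffix to get the sample name.
sample_name() {
    base=$(basename "$1")
    printf '%s\n' "${base%"$R1_SUFFIX"}"
}

# Build the assumed R2 path for a given R1 path.
mate_path() {
    printf '%s\n' "${1%"$R1_SUFFIX"}$R2_SUFFIX"
}

# List each sample with its R1 file and assumed R2 file.
for r1 in fastq_files/*"$R1_SUFFIX"; do
    [ -e "$r1" ] || continue   # skip when the glob matches nothing
    echo "$(sample_name "$r1"): $r1 + $(mate_path "$r1")"
done
```

Files that do not end in the R1 suffix are not matched by the loop, so a sample missing from its output points to a naming mismatch.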
Read length is set to 50 bp. If your reads have a different length, change line 30 of parastar.sh accordingly.
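To find out which read length applies to your data, a quick tabulation of the lengths in each input file can be sketched as follows. The function name fastq_read_lengths is hypothetical and not part of parastar; it inspects only the first reads of each file (1000 by default).

```shell
# Hypothetical helper (not part of parastar): tabulate read lengths in a fastq.gz.
# Prints "length count" pairs for the first N reads (N defaults to 1000).
fastq_read_lengths() {
    gzip -dc "$1" | head -n "$(( ${2:-1000} * 4 ))" |
        awk 'NR % 4 == 2 { n[length($0)]++ } END { for (l in n) print l, n[l] }' |
        sort -n
}

# Report the read lengths found in each R1 file.
for fq in fastq_files/*_R1_001.fastq.gz; do
    [ -e "$fq" ] || continue   # skip when the glob matches nothing
    echo "$fq"
    fastq_read_lengths "$fq"
done
```

A single line such as "50 1000" would indicate uniform 50 bp reads; several lines suggest the reads were trimmed to variable lengths.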