GitHub - arpin-g/mt-DNA-evo: Mitochondrial DNA analysis

The project analyzes human mitochondrial DNA to explore genetic variation in the Armenian population and differences between Armenians and neighbouring populations.

Team:

Arpine Griogoryan (arpigrigoryan1@gmail.com)
Syuzi Matevosyan (syuzi.matevosyan1802@gmail.com)

Workflow:

Preprocessing

We downloaded FASTA files for each sample with the script scr/preprocessing/download_fasta.py. The data is stored in mt_fastas/samples.
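The download step can be sketched as follows. This is a minimal illustration, not the contents of scr/preprocessing/download_fasta.py: it assumes the samples are fetched from GenBank through the NCBI E-utilities efetch endpoint, and the function names (accession_range, efetch_fasta_url, download_fasta) are hypothetical. The accession range in the usage note is the one listed under Data #1.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities endpoint for fetching records (assumed download route).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def accession_range(first: str, last: str) -> list[str]:
    """Expand an inclusive accession range such as MK491355..MK491495."""
    prefix = first.rstrip("0123456789")
    if last.rstrip("0123456789") != prefix:
        raise ValueError("accessions must share the same letter prefix")
    width = len(first) - len(prefix)
    start, end = int(first[len(prefix):]), int(last[len(prefix):])
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


def efetch_fasta_url(accession: str) -> str:
    """Build the efetch URL that returns one nucleotide record as FASTA text."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def download_fasta(accession: str, out_dir: str = "mt_fastas/samples") -> Path:
    """Download one record and save it as <out_dir>/<accession>.fasta."""
    out_path = Path(out_dir) / f"{accession}.fasta"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(efetch_fasta_url(accession)) as response:
        out_path.write_bytes(response.read())
    return out_path
```

To fetch the Data #1 samples one would loop over accession_range("MK491355", "MK491495") and call download_fasta on each accession, pausing between requests to stay within NCBI's rate limits.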

Alignment and Variant Calling

For alignment we used BWA (Version: 0.7.17-r1188) via the scr/preprocessing/alignment.sh script, with rCRS (NC_012920) as the reference genome. The script aligns each sample, stores the alignments in /sam, and then performs variant calling for each sample. For variant calling we used BCFtools call (Version: 1.16); the resulting VCFs are stored in the vcfs/ directory.
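The per-sample steps might look roughly like the sketch below. It only builds the shell commands and is not a copy of alignment.sh. Several details are assumptions: the reference file name rCRS.fasta, the use of bwa mem, the intermediate samtools sort step, and the haploid setting --ploidy 1 are not stated in this README. The reference is also assumed to have been indexed once with bwa index.

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(sample_fasta, reference="rCRS.fasta", sam_dir="sam", vcf_dir="vcfs"):
    """Return the shell commands for aligning one sample and calling variants.

    File names and directory layout here are illustrative assumptions.
    """
    sample = Path(sample_fasta).stem
    sam = f"{sam_dir}/{sample}.sam"
    bam = f"{sam_dir}/{sample}.sorted.bam"
    vcf = f"{vcf_dir}/{sample}.vcf"
    q = shlex.quote  # protect paths that contain spaces or shell metacharacters
    return [
        # Align the sample mitogenome against the reference.
        f"bwa mem {q(reference)} {q(str(sample_fasta))} > {q(sam)}",
        # Coordinate-sort and convert to BAM (assumed intermediate step).
        f"samtools sort -o {q(bam)} {q(sam)}",
        # Pile up and call variants; --ploidy 1 is an assumption for haploid mtDNA.
        f"bcftools mpileup -f {q(reference)} {q(bam)} | "
        f"bcftools call -mv --ploidy 1 -Ov -o {q(vcf)}",
    ]


def run_sample(sample_fasta, **kwargs):
    """Run the commands in order, stopping at the first failure."""
    for command in build_commands(sample_fasta, **kwargs):
        subprocess.run(command, shell=True, check=True)
```

Keeping command construction separate from execution makes it possible to print and review the commands for every sample before running anything.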

Variant Annotation

Variant annotation was done using snpEff (Version: 5.1d). First we built a snpEff database for the human mitochondrial genome from a gff3 file obtained from GenBank. We then ran the script scr/annotation.sh, which stores the annotated VCF files in the /anno directory.
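snpEff writes its annotations into the ANN entry of the VCF INFO column, as comma-separated annotations with pipe-separated subfields. The sketch below shows one way to read that entry in Python. The helper name parse_ann and the example INFO string are made up for illustration and do not come from the project's data.

```python
# Subfields of one snpEff ANN annotation, in order.
ANN_FIELDS = [
    "allele", "annotation", "impact", "gene_name", "gene_id",
    "feature_type", "feature_id", "transcript_biotype", "rank",
    "hgvs_c", "hgvs_p", "cdna_pos", "cds_pos", "aa_pos",
    "distance", "messages",
]


def parse_ann(info):
    """Return one dict per annotation found in a VCF INFO string."""
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            return [
                dict(zip(ANN_FIELDS, annotation.split("|")))
                for annotation in entry[len("ANN="):].split(",")
            ]
    return []  # the record carries no ANN entry


# Synthetic INFO value, for illustration only.
EXAMPLE_INFO = (
    "DP=1;ANN="
    "G|synonymous_variant|LOW|GENE1|GENE1|transcript|TX1|protein_coding|1/1|"
    "c.3A>G|p.Leu1Leu|3/9|3/9|1/3||,"
    "G|upstream_gene_variant|MODIFIER|GENE2|GENE2|transcript|TX2|protein_coding||"
    "c.-10A>G|||||10|"
)

for annotation in parse_ann(EXAMPLE_INFO):
    print(annotation["gene_name"], annotation["annotation"], annotation["impact"])
```

A parser like this is enough to tabulate effects per gene across the files in /anno; for anything beyond quick summaries a dedicated VCF library would be a safer choice.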

Defining Haplogroups

Haplogroups were defined using the HaploGrep (Version 3.0) web server (https://haplogrep.i-med.ac.at/). The results were downloaded as a zip archive (haplogroups.zip).

Data #1: Paper: Derenko et al. (2019) Insights into matrilineal genetic structure, differentiation and ancestry of Armenians based on complete mitogenome data. https://doi.org/10.1007/s00438-019-01596-2

Samples: Complete human mitogenomes, GenBank accession numbers: MK491355–MK491495

Data #2: Paper: Derenko, M. et al. (2013) Complete mitochondrial DNA diversity in Iranians (355 Iranian, Kurd, Russian, Qashqai mtDNAs). https://doi.org/10.1371/journal.pone.0080673

Data #3: Paper: Schoenberg, A., Theunert, C., Li, M., Stoneking, M. and Nasidze, I. (2011) High-throughput sequencing of complete human mtDNA genomes from the Caucasus and West Asia: high diversity and demographic inferences. https://doi.org/10.1038/ejhg.2011.62

Data #4: Paper: Ashot Margaryan, Miroslava Derenko (2017) Eight Millennia of Matrilineal Genetic Continuity in the South Caucasus. http://dx.doi.org/10.1016/j.cub.2017.05.087

Building a phylogenetic tree

To build the phylogenetic tree we decided to use coding sequences. To do so, we converted the gff3 genome annotation file from NCBI (https://www.ncbi.nlm.nih.gov/nuccore/NC_012920.1/) to a BED file using BEDOPS gff2bed (https://github.com/bedops/bedops), keeping only protein-coding regions. We then used Bedtools to extract the protein-coding sequences from each sample. Finally, we combined the sequences of each protein from all samples, performed multiple sequence alignment, and built a phylogenetic tree from the alignment.
Script for extracting coding regions: scr/coding_regions_test.sh
Script for building a phylogenetic tree based on each protein inside the populations: scr/building_phylo_tree.sh
Script for building a phylogenetic tree based on proteins across the populations: scr/prot_across_samples.sh
Script for building a phylogenetic tree based on whole genome data across the populations: scr/phylo_tree_whole_genome.sh
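The coordinate conversion behind the gff3-to-BED step is easy to get wrong, so here is a small Python sketch of the idea. It is not the project's script and does not call gff2bed or Bedtools: it reimplements the two operations on toy data, and the function names and the example record are hypothetical. GFF3 intervals are 1-based and inclusive, while BED intervals are 0-based and half-open, so only the start coordinate shifts by one.

```python
def gff3_cds_to_bed(gff3_lines):
    """Convert GFF3 CDS features to BED rows (chrom, start, end, name, score, strand)."""
    bed_rows = []
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # keep protein-coding features only
        attributes = dict(
            pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair
        )
        name = attributes.get("gene", attributes.get("ID", "."))
        # GFF3: 1-based inclusive; BED: 0-based half-open.
        bed_rows.append((cols[0], int(cols[3]) - 1, int(cols[4]), name, ".", cols[6]))
    return bed_rows


def extract_regions(sequence, bed_rows, stranded=False):
    """Slice each BED interval out of a sequence, keyed by feature name.

    With stranded=True, minus-strand features are reverse-complemented.
    """
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    regions = {}
    for _chrom, start, end, name, _score, strand in bed_rows:
        region = sequence[start:end]
        if stranded and strand == "-":
            region = region.translate(complement)[::-1]
        regions[name] = region
    return regions


# Toy example (not real rCRS coordinates).
toy_gff3 = ["##gff-version 3", "chrM\ttest\tCDS\t3\t8\t.\t+\t0\tID=cds-1;gene=GENE1"]
toy_bed = gff3_cds_to_bed(toy_gff3)
print(toy_bed)
print(extract_regions("AACCGGTTAA", toy_bed))
```

Features that share a name overwrite each other in this sketch, so a real pipeline would need unique names per interval before combining sequences across samples.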
