NeLLi-team/sgtree

Simple Genome Tree (SGTree)

Simple Genome Tree (SGTree) is a computational pipeline for fast and easy construction of phylogenetic trees from a set of user-provided genomes and a set of phylogenetic markers, in a taxonomic framework of de-replicated reference genomes. SGTree identifies conserved phylogenetic marker proteins and evaluates additional copies of markers that derive from duplication, horizontal gene transfer, or contamination. It then builds a phylogenetic tree from the concatenated alignment of the selected marker proteins.
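
To make the "concatenated alignment" idea concrete, here is a minimal Python sketch of how per-marker alignments can be joined into one sequence per genome. It is an illustration only, not SGTree's own code: the function name `concatenate_alignments`, the input layout (one dict per marker, mapping genome ID to aligned sequence), and the choice to pad missing markers with gap characters are all assumptions.

```python
def concatenate_alignments(marker_alignments, gap_char="-"):
    """Join per-marker alignments into one concatenated sequence per genome.

    marker_alignments: list of dicts, one per marker, mapping a genome ID
    to its aligned sequence (hypothetical layout, chosen for illustration).
    Genomes missing a marker are padded with gaps of that marker's width.
    """
    genomes = sorted({g for aln in marker_alignments for g in aln})
    parts = {g: [] for g in genomes}
    for aln in marker_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("all sequences in one marker alignment must have equal length")
        width = lengths.pop()
        for g in genomes:
            # Use the aligned sequence if present, otherwise an all-gap block.
            parts[g].append(aln.get(g, gap_char * width))
    return {g: "".join(chunks) for g, chunks in parts.items()}


if __name__ == "__main__":
    example = [{"A": "MK-L", "B": "MKAL"}, {"A": "GG"}]
    for genome, seq in concatenate_alignments(example).items():
        print(genome, seq)
```

Every genome ends up with a sequence of the same total length, which is what tree-building tools generally expect from a concatenated alignment.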

⚙️ Setup

Create a conda environment and run sgtree

  1. Clone the git repository:
git clone https://github.com/NeLLi-team/sgtree.git

  2. Make sure you have Anaconda installed.

  3. Depending on your OS, choose osx_env.txt for macOS or linux_env.txt for Linux.

  4. Create and activate the environment (where <spec-file> is either linux_env.txt or osx_env.txt):
cd sgtree/
conda create --name sgtree --file <spec-file>
conda activate sgtree

  5. Make sgtree executable:
chmod u+x sgtree.py

🚀 Run SGTree

  1. Run sgtree on the provided query genomes and models to test the installation. You can set the number of CPUs to use, the minimum percentage of models with hits required for a genome to be included in the dataset, and a directory of reference genomes. In the resulting tree, genomes from the query directory are colored red and genomes from the reference directory are colored grey.
# test example
./sgtree.py testgenomes/Chloroflexi hmms/UNI56 --num_cpus 8

# general example
./sgtree.py \
	<genomes_dir> \
	<models_dir> \
	--num_cpus 8 \
	--percent_models 50 \
	--marker_selection yes \
	--aln mafft \
	--ref <reference_genomes_dir> \
	--save_dir <output_dir>
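
The "minimum percentage of models with hits" rule can be pictured with a short Python sketch. This is a hedged illustration and not taken from sgtree.py: the function `genomes_passing_threshold`, its input layout, and the inclusive comparison (a genome at exactly the threshold is kept) are assumptions.

```python
def genomes_passing_threshold(hits_per_genome, total_models, percent_models):
    """Return genome IDs whose share of models with hits meets the threshold.

    hits_per_genome: dict mapping genome ID to the collection of model names
    with at least one hit in that genome (hypothetical layout).
    total_models: number of models in the models directory.
    percent_models: minimum percentage, e.g. 50.
    """
    if total_models <= 0:
        raise ValueError("total_models must be positive")
    kept = []
    for genome, models in hits_per_genome.items():
        # Count each model once, even if it has several hits.
        percent = 100.0 * len(set(models)) / total_models
        if percent >= percent_models:  # inclusive threshold is an assumption
            kept.append(genome)
    return sorted(kept)


if __name__ == "__main__":
    hits = {"g1": {"m1", "m2"}, "g2": {"m1"}, "g3": set()}
    print(genomes_passing_threshold(hits, total_models=4, percent_models=50))
```

With four models and a threshold of 50, only genomes with hits to at least two distinct models would remain in the dataset under these assumptions.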

🚨 Important note

Sequence headers in each genome file must use the following format:

>IMG2684622718|2685462912
MLCAFAEEEAKIAETVGKVATELKVKKLLSDFATKEGEEHISTYNKIAMTAKAEGYADIEAMLCAFAEEEAKLQKL

The first field, before the pipe, is the genome identifier and must match the base name of the file. The second field is a unique protein identifier.
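
Before running the pipeline, it can help to check input files against this header rule. The Python sketch below is not part of SGTree; the helpers `check_headers` and `check_fasta_file` are hypothetical. It assumes that each header consists of exactly two non-empty fields separated by a single pipe, and that the "filename base" is the file name without its final extension.

```python
from pathlib import Path


def check_headers(lines, genome_id):
    """Return a list of (line_number, message) for headers that break the rule.

    lines: iterable of FASTA lines; genome_id: expected first header field.
    """
    problems = []
    seen_proteins = set()
    for line_number, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence lines are not checked
        fields = line[1:].strip().split("|")
        if len(fields) != 2 or not fields[0] or not fields[1]:
            problems.append((line_number, "expected '>genome_id|protein_id'"))
            continue
        genome_field, protein_field = fields
        if genome_field != genome_id:
            problems.append(
                (line_number, f"genome field '{genome_field}' does not match '{genome_id}'")
            )
        if protein_field in seen_proteins:
            problems.append(
                (line_number, f"duplicate protein identifier '{protein_field}'")
            )
        seen_proteins.add(protein_field)
    return problems


def check_fasta_file(fasta_path):
    """Check one file, taking the genome ID from the file name (assumption)."""
    path = Path(fasta_path)
    with path.open() as handle:
        return check_headers(handle, path.stem)
```

An empty list means every header in the file matched the expected pattern. Calling `check_fasta_file` on each file in the genomes directory gives a quick pre-flight report.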

📝 Authors and contributors:

Authors                   Email               Date
Ewan Whittaker-Walker     [email protected]   05/19/2019
Frederik Schulz           [email protected]   Since 2019
Juan C. Villada           [email protected]   Since 2021
Marianne Buscaglia        [email protected]   Since 2022
