
Earth Biogenome Project (pilot project repo)


HPCBio/EBP


Earth Biogenome Project (EBP) Insect Genome Assembly

The Illinois Innovation Network and the Discovery Partners Institute funded a pilot project to assemble the genomes of agriculturally relevant insects in Illinois for which little or no genomic data are available. High-quality DNA was isolated using protocols optimized for small, difficult samples. The pilot project implemented the novel use of Tell-Seq linked-read libraries for the dual purpose of genome size estimation and linked-read scaffolding. Genomes were assembled using PacBio HiFi reads, Tell-Seq reads, and Dovetail Omni-C reads for chromosome-scale scaffolding.

Eight high-quality genomes were assembled from non-model organisms, with contig N50 > 1 Mb and scaffold N50 > 5 Mb, including only the second genome for the order Neuroptera, which is soon to be public. Species were confidently identified using the mitochondrial genomes assembled from the HiFi reads. Potential endosymbionts and pathogens were identified, as well as novel prey information from predator species. Genomes were annotated using the BRAKER2 pipeline, generating a rich set of novel data to mine.
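The contig and scaffold N50 thresholds quoted above can be checked for any assembly from its sequence lengths alone. The following is a minimal sketch, not code from this repository: the function name `n50` and the example lengths are hypothetical, and it uses the common definition of N50 as the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bases)."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in bases, for illustration only.
example_contigs = [4_000_000, 2_500_000, 1_200_000, 800_000, 500_000]
print(n50(example_contigs))
```

In practice the lengths would be read from the assembly FASTA; a contig set passes the threshold stated above when `n50(lengths) > 1_000_000`.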

The Workflow








De novo genome assembly using HiFi reads

These are the steps:

  1. Generate raw assembly with hifiasm

  2. Purge duplicate contigs

  3. Scaffold using TellSeq reads

  4. Scaffold using Omni-C reads

  5. Fill gaps

  6. Mask repeats and low-complexity regions

  7. Predict gene and protein function

  8. Identify and annotate mitochondrial DNA

  9. Assess genome completeness with Merqury

  10. Identify contaminants and artifacts in genome
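hifiasm (step 1) writes its assemblies as GFA graphs, while duplicate purging and scaffolding tools generally take FASTA input, so a conversion usually sits between steps 1 and 2. The following is a minimal sketch of that conversion, assuming standard GFA segment (`S`) records; the function name `gfa_to_fasta` and the example records are hypothetical, and the scripts in this repository may handle this step differently.

```python
def gfa_to_fasta(gfa_lines):
    """Yield FASTA lines for every segment (S) record in a GFA file."""
    for line in gfa_lines:
        # Only segment records carry contig sequences; skip headers and links.
        if not line.startswith("S\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, sequence = fields[1], fields[2]
        if sequence == "*":  # sequence not stored in the graph
            continue
        yield f">{name}"
        yield sequence


# Hypothetical GFA records, for illustration only.
example_gfa = [
    "H\tVN:Z:1.0\n",
    "S\tptg000001l\tACGTACGT\tLN:i:8\n",
    "L\tptg000001l\t+\tptg000002l\t+\t0M\n",
    "S\tptg000002l\tGGCC\tLN:i:4\n",
]
fasta_lines = list(gfa_to_fasta(example_gfa))
print("\n".join(fasta_lines))
```

For a real assembly, the open GFA file can be passed in place of `example_gfa`, and the yielded lines written to a `.fasta` file.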

De novo genome assembly using CLR reads

  1. Generate raw assembly with Redbean

  2. Base-correct assembly with Arrow

  3. Purge duplicate contigs

  4. Pilon polishing

  5. Scaffold using TellSeq reads

  6. Scaffold using Omni-C reads

  7. Fill gaps

  8. Mask repeats and low-complexity regions

  9. Predict gene and protein function

  10. FreeBayes polishing

  11. Identify and annotate mitochondrial DNA

  12. Assess genome completeness with Merqury

  13. Identify contaminants and artifacts in genome
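The two workflows share most of their steps. The sketch below encodes both step lists as given in this README and reports the steps unique to each, which makes the difference explicit: the CLR workflow uses a different assembler and adds three correction and polishing steps (Arrow, Pilon, FreeBayes), consistent with CLR reads having a higher per-read error rate than HiFi reads. The variable names are hypothetical and the code is illustrative only; it does not run any of the tools.

```python
# Step lists transcribed from the two workflows described in this README.
HIFI_STEPS = [
    "Generate raw assembly with hifiasm",
    "Purge duplicate contigs",
    "Scaffold using TellSeq reads",
    "Scaffold using Omni-C reads",
    "Fill gaps",
    "Mask repeats and low-complexity regions",
    "Predict gene and protein function",
    "Identify and annotate mitochondrial DNA",
    "Assess genome completeness with Merqury",
    "Identify contaminants and artifacts in genome",
]

CLR_STEPS = [
    "Generate raw assembly with Redbean",
    "Base-correct assembly with Arrow",
    "Purge duplicate contigs",
    "Pilon polishing",
    "Scaffold using TellSeq reads",
    "Scaffold using Omni-C reads",
    "Fill gaps",
    "Mask repeats and low-complexity regions",
    "Predict gene and protein function",
    "FreeBayes polishing",
    "Identify and annotate mitochondrial DNA",
    "Assess genome completeness with Merqury",
    "Identify contaminants and artifacts in genome",
]

# Steps that appear in only one workflow, kept in workflow order.
clr_only = [step for step in CLR_STEPS if step not in HIFI_STEPS]
hifi_only = [step for step in HIFI_STEPS if step not in CLR_STEPS]
shared = [step for step in HIFI_STEPS if step in CLR_STEPS]

print("CLR only: ", clr_only)
print("HiFi only:", hifi_only)
print("Shared steps:", len(shared))
```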

Our sponsors:

To learn more about the Earth Biogenome Project, please use this link: https://www.earthbiogenome.org/
