Emily Delorean edited this page Sep 5, 2024 · 16 revisions

Welcome to the Pepper Trio-binning wiki

Here is the start-to-finish workflow we used to assemble the Capsicum annuum HDA149 and HDA330 genomes in our 2023 study. The method uses trio-binning to fully resolve the haplotype of each cultivar. A trio consists of long-read sequencing of an individual and short-read sequencing of its two parents. In our experiment, the doubled haploid cultivar HDA149 is the maternal parent, the doubled haploid cultivar HDA330 is the paternal parent, and the heterozygous individual is a single F1 plant from the HDA149 x HDA330 cross. The end result was two fully phased genome assemblies, one corresponding to HDA149 and one to HDA330.
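To make the idea of trio-binning concrete, here is a minimal toy sketch in Python. It is not the TrioCanu or Hifiasm implementation: the k-mer size, the function names, and the tiny sequences are all assumptions chosen for illustration, and real tools also handle reverse complements, sequencing errors, and k-mer count thresholds. The principle is the same, though: find k-mers seen in only one parent's short reads, then assign each F1 long read to the parent whose specific k-mers it carries.

```python
K = 5  # toy k-mer size (assumption); real tools use much longer k-mers


def kmers(seq, k=K):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_reads, paternal_reads, k=K):
    """Return (maternal-only, paternal-only) k-mer sets from parental short reads."""
    mat = set().union(*(kmers(r, k) for r in maternal_reads))
    pat = set().union(*(kmers(r, k) for r in paternal_reads))
    return mat - pat, pat - mat


def bin_read(read, mat_only, pat_only, k=K):
    """Assign one F1 long read to a parent by counting parent-specific k-mers."""
    read_kmers = kmers(read, k)
    m = len(read_kmers & mat_only)
    p = len(read_kmers & pat_only)
    if m > p:
        return "maternal"
    if p > m:
        return "paternal"
    return "unassigned"  # no informative k-mers, or a tie


# Made-up toy sequences standing in for HDA149 (maternal) and HDA330 (paternal)
mat_only, pat_only = parent_specific_kmers(["ACGTACGTAAGG"], ["ACGTACGTCCGG"])
for f1_read in ("TACGTAAGG", "TACGTCCGG", "ACGTACGT"):
    print(f1_read, bin_read(f1_read, mat_only, pat_only))
```

Because both parents here are doubled haploids, each parent contributes a single haplotype, so the two read bins correspond directly to the HDA149 and HDA330 genomes.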

Data

The first step is processing the raw sequencing data. We had PacBio HiFi reads of the F1 and Illumina paired-end short reads for HDA149 and HDA330. The PacBio HiFi reads were filtered for adapter contamination with HiFiAdapterFilt. The Illumina short reads were quality-checked and trimmed with fastp.
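As a rough illustration of the short-read step, the sketch below builds one fastp command per parent. The input and output file names, the thread count, and the helper name `fastp_command` are hypothetical; only the fastp options themselves (`-i`, `-I`, `-o`, `-O`, `--json`, `--html`, `--thread`) are standard. The commands are printed rather than executed, and the settings we actually used may differ.

```python
import shlex


def fastp_command(sample, r1, r2, threads=8):
    """Build a fastp command for one paired-end library (not executed here)."""
    return [
        "fastp",
        "-i", r1, "-I", r2,                        # paired-end inputs
        "-o", f"{sample}_R1.trimmed.fastq.gz",      # trimmed read 1
        "-O", f"{sample}_R2.trimmed.fastq.gz",      # trimmed read 2
        "--json", f"{sample}.fastp.json",           # machine-readable QC report
        "--html", f"{sample}.fastp.html",           # human-readable QC report
        "--thread", str(threads),
    ]


# Hypothetical file names for the two parental libraries
for sample in ("HDA149", "HDA330"):
    cmd = fastp_command(sample, f"{sample}_R1.fastq.gz", f"{sample}_R2.fastq.gz")
    print(shlex.join(cmd))
```

To run a command for real, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where fastp is installed.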

We also need to filter adapters from the HiFi reads. Some HiFi sequencing runs have no adapter contamination, but it is best practice to check and filter. This dataset does have some contamination. Filtering code here

Genome Assembly

Trio-binning genome assembly was performed with TrioCanu and Hifiasm.

Assemble binned reads with Hifiasm
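A minimal sketch of what assembling each read bin separately could look like is shown below. The bin file names, output prefixes, thread count, and the helper `hifiasm_command` are assumptions for illustration; `-o` (output prefix) and `-t` (threads) are standard Hifiasm options, and any additional options used in the real run are not shown. The commands are printed rather than executed.

```python
import shlex


def hifiasm_command(prefix, binned_reads, threads=32):
    """Build a Hifiasm command for one haplotype-binned HiFi read set."""
    return ["hifiasm", "-o", prefix, "-t", str(threads), binned_reads]


# Hypothetical names for the two haplotype bins produced by trio-binning
bins = {
    "HDA149.asm": "HDA149_binned_hifi.fastq.gz",  # maternal bin
    "HDA330.asm": "HDA330_binned_hifi.fastq.gz",  # paternal bin
}
for prefix, reads in bins.items():
    print(shlex.join(hifiasm_command(prefix, reads)))
```

Hifiasm writes its contigs as GFA files under the given prefix, so each run yields one contig-level assembly per parental haplotype.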


Scaffolding

Contig-level assemblies were scaffolded using this workflow.


Assembly quality metrics

Quality of the final assemblies was measured with [Assembly statistics](https://github.com/USDA-ARS-GBRU/Pepper_TrioBinning/wiki/Calculate-Assembly-Statistics-(and-convert-.gfa-to-.fasta)), the LTR Assembly Index, and BUSCO.
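For readers unfamiliar with these statistics, here is a small, self-contained Python sketch of two of the steps named above: converting GFA segment lines to FASTA and computing N50 from contig lengths. It is an illustration only, not the code on the linked page; the function names and the toy GFA text are assumptions.

```python
def gfa_to_fasta(gfa_text):
    """Convert GFA segment (S) lines to FASTA text; other line types are ignored."""
    records = []
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        # S lines are: S <name> <sequence> [optional tags]
        if fields[0] == "S" and len(fields) >= 3:
            records.append(f">{fields[1]}\n{fields[2]}")
    return ("\n".join(records) + "\n") if records else ""


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


# Toy GFA with two segments and one link (made-up data)
toy_gfa = "S\tctg1\tACGTACGT\tLN:i:8\nS\tctg2\tACG\nL\tctg1\t+\tctg2\t+\t0M\n"
fasta = gfa_to_fasta(toy_gfa)
contig_lengths = [len(s) for s in fasta.splitlines() if not s.startswith(">")]
print(fasta, end="")
print("N50:", n50(contig_lengths))
```

For full-size assemblies the sequences would be streamed from the file line by line rather than held in one string, but the logic is the same.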