
Assembly of Canu-binned HiFi reads with hifiasm

Emily Delorean edited this page Sep 5, 2024 · 1 revision

HDA149 TrioHifi code

hifiasm -o hifiasm_assembly/TrioHifi_HDA149 -t 32 /bins/haplotype/haplotype-HDA149.fasta.gz /bins/haplotype/haplotype-unknown.fasta.gz
  • We set the number of threads to 32 with -t 32
  • We wrote the output to the 'hifiasm_assembly/' directory and gave the files the 'TrioHifi_HDA149' prefix with -o hifiasm_assembly/TrioHifi_HDA149
  • 2 input files, which we generated with TrioCanu and which hifiasm assembles together as one set of HiFi reads:
    /bins/haplotype/haplotype-HDA149.fasta.gz
    /bins/haplotype/haplotype-unknown.fasta.gz
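Because hifiasm is a long multi-threaded job, it can help to check the inputs before launching it. The sketch below is a hypothetical helper (`check_inputs` is not part of the original workflow): it stops if either TrioCanu bin is missing or empty, then creates the output directory that the `-o` prefix points into. Whether hifiasm creates a missing directory on its own is not covered on this page, so `mkdir -p` is included as a harmless precaution.

```bash
# Hypothetical pre-flight helper (not part of the original workflow):
# fail early if any TrioCanu bin is missing or empty, then create the
# directory that the -o prefix points into.
check_inputs() {
    local outdir=$1
    shift
    local f
    for f in "$@"; do
        # -s is true only if the file exists and is larger than zero bytes
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            return 1
        fi
    done
    mkdir -p "$outdir"
}

# Example call with the paths used above:
# check_inputs hifiasm_assembly \
#     /bins/haplotype/haplotype-HDA149.fasta.gz \
#     /bins/haplotype/haplotype-unknown.fasta.gz
```

The check only looks at file presence and size; it does not validate that the files are well-formed FASTA.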

Results

By default, hifiasm writes 5 assembly graphs (GFA files) to hifiasm_assembly/:

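  1. TrioHifi_HDA149.bp.hap1.p_ctg.gfa (partially phased contigs, haplotype 1)
  2. TrioHifi_HDA149.bp.hap2.p_ctg.gfa (partially phased contigs, haplotype 2)
  3. TrioHifi_HDA149.bp.p_ctg.gfa (primary contigs) <- this is the one we carry forward
  4. TrioHifi_HDA149.bp.p_utg.gfa (processed unitig graph)
  5. TrioHifi_HDA149.bp.r_utg.gfa (raw unitig graph)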
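All five outputs follow the same naming pattern (the `-o` prefix, then `.bp.`, then the graph type), so a quick way to confirm that a run finished is to check that every expected file exists and is non-empty. This is a minimal sketch; `check_hifiasm_outputs` is a hypothetical helper name, and a non-empty file does not by itself say anything about assembly quality.

```bash
# Hypothetical helper: report which of the five default hifiasm GFA
# outputs are missing or empty for a given -o prefix.
# Prints nothing and returns 0 when all five are present.
check_hifiasm_outputs() {
    local prefix=$1
    local missing=0
    local suffix
    for suffix in bp.hap1.p_ctg.gfa bp.hap2.p_ctg.gfa bp.p_ctg.gfa \
                  bp.p_utg.gfa bp.r_utg.gfa; do
        if [ ! -s "${prefix}.${suffix}" ]; then
            echo "missing: ${prefix}.${suffix}"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_hifiasm_outputs hifiasm_assembly/TrioHifi_HDA149
```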

HDA330 TrioHifi code

hifiasm -o hifiasm_assembly/TrioHifi_HDA330 -t 32 /bins/haplotype/haplotype-HDA330.fasta.gz /bins/haplotype/haplotype-unknown.fasta.gz

It is the same command as for HDA149, but with the HDA330 binned reads as input and with the output files named with the 'TrioHifi_HDA330' prefix.
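Since the two runs differ only in the sample name, one way to avoid copy-paste mistakes in the `-o` prefix is to derive every sample-specific path from a single variable. The sketch below assumes the paths shown on this page; `run_trio_hifi` and the `HIFIASM`, `THREADS` and `BIN_DIR` overrides are hypothetical names introduced for illustration only.

```bash
# Sketch: one function for both samples, so the -o prefix is always derived
# from the sample name. HIFIASM, THREADS and BIN_DIR are hypothetical
# overrides read at call time; defaults match the commands on this page.
run_trio_hifi() {
    local sample=$1
    local hifiasm=${HIFIASM:-hifiasm}
    local threads=${THREADS:-32}
    local bin_dir=${BIN_DIR:-/bins/haplotype}
    mkdir -p hifiasm_assembly
    "$hifiasm" -o "hifiasm_assembly/TrioHifi_${sample}" -t "$threads" \
        "${bin_dir}/haplotype-${sample}.fasta.gz" \
        "${bin_dir}/haplotype-unknown.fasta.gz"
}

# Run both samples one after the other:
# for sample in HDA149 HDA330; do run_trio_hifi "$sample"; done
#
# Dry run that only prints the arguments (echo stands in for hifiasm):
# HIFIASM=echo run_trio_hifi HDA330
```

The dry run is a cheap way to confirm that each sample gets its own prefix before committing 32 threads to a real run.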

Convert the assembly graphs (.gfa) to FASTA

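awk '/^S/{print ">"$2;print $3}' TrioHifi_HDA149.bp.p_ctg.gfa > TrioHifi_HDA149.bp.p_ctg.fa

This keeps only the segment (S) lines of the GFA file and writes each contig name (column 2) as a FASTA header, followed by its sequence (column 3) on the next line. Run it from inside hifiasm_assembly/; the same command applies to the TrioHifi_HDA330 files.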
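The same awk logic can be wrapped in small functions so that both samples are converted in one loop, with a quick summary (contig count, total length, longest contig) as a sanity check. This is a sketch: `gfa_to_fasta` and `contig_stats` are hypothetical helper names, and like the one-liner above they assume the sequence is stored in column 3 of each S line, as it is in the hifiasm `p_ctg.gfa` files.

```bash
# Same conversion as the one-liner above: each S (segment) line becomes a
# FASTA record with the contig name (column 2) and sequence (column 3).
gfa_to_fasta() {
    awk '/^S/{print ">"$2; print $3}' "$1"
}

# Quick sanity summary from the S lines of a GFA file:
# number of contigs, total length and longest contig, in bp.
contig_stats() {
    awk '/^S/{n++; len=length($3); total+=len; if (len>max) max=len}
         END{printf "contigs=%d total_bp=%d max_bp=%d\n", n, total, max}' "$1"
}

# Usage, from inside hifiasm_assembly/:
# for gfa in TrioHifi_HDA149.bp.p_ctg.gfa TrioHifi_HDA330.bp.p_ctg.gfa; do
#     gfa_to_fasta "$gfa" > "${gfa%.gfa}.fa"
#     contig_stats "$gfa"
# done
```

`${gfa%.gfa}.fa` strips the `.gfa` suffix, so the FASTA files get the same names as in the command above (for example `TrioHifi_HDA149.bp.p_ctg.fa`).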