README

Pantohap generates a haplotype graph by finding shared haplotypes in multiple genomes.

The pipeline is implemented in pantohap.sh. Its steps are:

Compare the genomes to the reference and identify the variants shared between them.
1. Done using get_haplotype_vcf.sh, which involves:
  1. Aligning the genomes to the reference
  2. Calling variants
  3. Extracting short variants in syntenic regions
  4. Merging and normalising variant calls based on structural properties (e.g. large deletions, genomic rearrangements)
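
The "short variants in syntenic regions" sub-step can be pictured with the sketch below. It is only an illustration and not the code in get_haplotype_vcf.sh: the function name, the tuple layout of the variants, and the `max_len=50` cutoff are all assumptions made for this example.

```python
from bisect import bisect_right


def short_syntenic_variants(variants, syntenic, max_len=50):
    """Keep short variants that lie entirely inside a syntenic interval.

    variants: list of (pos, ref, alt) with 1-based positions (assumed layout).
    syntenic: sorted, non-overlapping (start, end) intervals, inclusive.
    max_len:  hypothetical length cutoff for a "short" variant.
    """
    starts = [start for start, _ in syntenic]
    kept = []
    for pos, ref, alt in variants:
        # Skip variants whose longer allele exceeds the cutoff.
        if max(len(ref), len(alt)) > max_len:
            continue
        # Find the last interval that starts at or before the variant.
        i = bisect_right(starts, pos) - 1
        end = pos + len(ref) - 1  # last reference base covered by the variant
        if i >= 0 and end <= syntenic[i][1]:
            kept.append((pos, ref, alt))
    return kept


syntenic = [(100, 200), (500, 900)]
variants = [(150, "A", "T"), (199, "ACGT", "A"), (300, "G", "C")]
kept = short_syntenic_variants(variants, syntenic)
```

Here only the variant at position 150 is kept: the deletion at 199 runs past the end of the first interval, and position 300 is outside both intervals.
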
Use the variants identified above to create the haplotype graph. The reference genome is divided into 100 kbp windows, and genomes that share similar variants within a window are collapsed into a single node. Done using the util.py -> hapnodesfromvcffile() function.
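
A minimal sketch of the windowing and collapsing idea follows. It is not the hapnodesfromvcffile() implementation: the input layout and function name are hypothetical, and "similar variants" is simplified here to "exactly the same set of variants in the window".

```python
from collections import defaultdict

WINDOW = 100_000  # window size in bp, as stated in the text


def collapse_into_nodes(variants, genomes, window=WINDOW):
    """Group genomes by the set of variants they carry in each window.

    variants: iterable of (pos, variant_id, carriers), where carriers is the
              set of genome names carrying the variant (hypothetical layout).
    Returns {window_start: [sorted genome names of each node, ...]}.
    Windows without any variant are not reported in this sketch.
    """
    # Collect, per window and per genome, the variants that genome carries.
    per_window = defaultdict(lambda: defaultdict(set))
    for pos, variant_id, carriers in variants:
        start = (pos // window) * window
        for genome in carriers:
            per_window[start][genome].add(variant_id)

    nodes = {}
    for start, carried in per_window.items():
        groups = defaultdict(list)
        for genome in genomes:
            # A genome with no variants in the window matches the reference.
            groups[frozenset(carried.get(genome, ()))].append(genome)
        nodes[start] = sorted(sorted(group) for group in groups.values())
    return nodes


variants = [
    (1_500, "v1", {"A", "B"}),
    (42_000, "v2", {"A", "B"}),
    (150_000, "v3", {"B", "C"}),
]
nodes = collapse_into_nodes(variants, ["A", "B", "C"])
```

In the first window genomes A and B carry the same variants and form one node while C forms another; in the second window B and C are grouped together and A stays on its own.
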
Get candidate marker kmers from the genome assemblies. Kmers are identified in each assembly, and any kmer that is present more than once in any genome is filtered out.
1. Uses the sbatch_files/SBATCH_get_unique_kmers_from_assemblies_part1.sh and sbatch_files/SBATCH_get_unique_kmers_from_assemblies_part2.sh scripts.
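
The filtering rule can be sketched in plain Python as below. This is an illustration of the rule only, not what the SBATCH scripts run: the function names and input layout are hypothetical, and reverse-complement (canonical) kmers are ignored for simplicity.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count every kmer over all sequences of one assembly."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def candidate_marker_kmers(assemblies, k):
    """Return kmers that never occur more than once within any genome.

    assemblies: {genome_name: [contig sequences]} (hypothetical layout).
    """
    seen = set()      # every kmer observed in at least one genome
    repeated = set()  # kmers seen more than once within some genome
    for contigs in assemblies.values():
        counts = kmer_counts(contigs, k)
        seen.update(counts)
        repeated.update(kmer for kmer, n in counts.items() if n > 1)
    return seen - repeated


assemblies = {"g1": ["ACGTACG"], "g2": ["ACGTTT"]}
candidates = candidate_marker_kmers(assemblies, k=3)
```

With these toy assemblies, `ACG` occurs twice in `g1` and is therefore dropped, even though it occurs only once in `g2`.
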
For each window, fetch the FASTA sequence of the corresponding syntenic region from each of the genomes. Done using util.py -> get_node_query_sequence().
From these FASTA files, select the kmers that occur exactly once in the file and that were also selected as candidate kmers from the assemblies. Done using sbatch_files/SBATCH_get_unique_kmer_per_window.sh.
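
The per-window selection combines two conditions, which the following sketch makes explicit. The function name and arguments are hypothetical and the example does not reflect how SBATCH_get_unique_kmer_per_window.sh is implemented.

```python
from collections import Counter


def window_marker_kmers(window_seq, candidates, k):
    """Kmers occurring exactly once in the window sequence that are also
    in the assembly-wide candidate set."""
    seq = window_seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer for kmer, n in counts.items() if n == 1 and kmer in candidates}


selected = window_marker_kmers("ACGTACGA", {"CGT", "ACG", "TTT", "CGA"}, k=3)
```

In this toy case `ACG` is a candidate but occurs twice in the window, and `TTT` is a candidate that is absent from the window, so neither is selected.
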
Merge the kmers that are present in the sequences of a node and are specific to that node. Done using util.py -> get_unique_kmers_per_node(). This creates the nodekmers*txt files, which list all identified kmer markers for each node. These kmers are then used to run the EM and threading algorithms.
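
One possible reading of "merge and keep node-specific kmers" is sketched below. It is not the get_unique_kmers_per_node() code: the names are hypothetical, "merge" is assumed to mean the union over the sequences of a node, and "specific" is assumed to mean absent from every other node being compared.

```python
def node_specific_kmers(node_kmers):
    """Merge kmers per node and keep those found in no other node.

    node_kmers: {node_id: [set of kmers for each member sequence, ...]}
                (hypothetical layout).
    """
    # Union of the kmers from all sequences that belong to a node.
    merged = {
        node: set().union(*kmer_sets) for node, kmer_sets in node_kmers.items()
    }
    specific = {}
    for node, kmers in merged.items():
        # Everything seen in any other node disqualifies a kmer.
        others = set().union(*(v for n, v in merged.items() if n != node))
        specific[node] = kmers - others
    return specific


markers = node_specific_kmers({
    "n1": [{"AAA", "CCC"}, {"AAA", "GGG"}],
    "n2": [{"CCC", "TTT"}],
})
```

Here `CCC` is present in both nodes and is discarded, leaving `AAA` and `GGG` as markers for `n1` and `TTT` for `n2`. Using the intersection instead of the union in the merge step would be a stricter alternative.
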

