This is the project repository accompanying the phase 3 main ("flagship") publication of the Human Genome Structural Variation Consortium (HGSVC).
The HGSVC deposits all of its data on the public FTP site of the International Genome Sample Resource (IGSR).
Though not accessioned, the IGSR storage location is nevertheless permanent. The working/ folder captures all data entities, even those that have never been formally published. The release/ folder only captures essential production-quality datasets such as variant callsets.
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC3/working/
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC3/release/
Please read the data reuse statement carefully if you want to use any of the HGSVC resources deposited on the IGSR FTP site:
HGSVC data reuse statement (markdown text)
For primary data, such as sequencing data generated by the HGSVC project partners, please refer to the Data availability statement in the main manuscript and to the respective sections in the Supplemental Material to find the BioProject and data file accessions.
Any software that had not been published in a (companion) paper of its own at the time of manuscript writing, or that does not rise to the level of independence required for publication (e.g., collections of scripts or workflows that chain together otherwise published tools), is listed in the following table:
HGSVC3 software resources (TSV table)
For all other tools used in the HGSVC phase 3 paper, please refer to the respective methods publications cited in the text.
This repository contains two versions of the source code:
- a copy (tarball) of each of the above software resources, for the sole purpose of documenting a frozen state
- a static copy, i.e., the unpacked tarball of the source code, in the respective subfolder named /plain
Please note: the current (potentially updated) version of the source code is only available in the respective GitHub repository (see the tabular summary or the more verbose listing below).
Please note: for all inquiries about a specific software resource, or for feedback such as issue reports, please get in touch with the listed contact person or go directly to the external repository (see the following listing). Software-specific support requests issued in this repository will NOT be answered.
For your convenience, the following listing provides direct links to the main README of the external repositories (tabular summary):
- Scripts used for annotating the MHC locus
- contact: Alexander Dilthey
- Jupyter notebooks for various HGSVC analyses, e.g., related to MEIs
- contact: Mark Loftus
- Code for building compacted de Bruijn graphs (DBGs), used in sample selection
- contact: Tobias Rausch
- Code implementing the analyses related to PanGenie genotyping
- contact: Jana Ebler
- R script implementing the analyses required for sample selection
- contact: Arvis Sulovari
- Scripts/workflow for evaluating genome assemblies
- contact: Youngjun Kwon
- Scripts for customized analyses related to, e.g., data management, assembly evaluation and plotting
- contact: Peter Ebert
- Repository of the L1ME-AID tool for MEI analysis
- contact: Mark Loftus
- Repository of the MELT-LRA tool for MEI analysis
- contact: Scott Devine
- Repository of the PALMER tool for MEI analysis
- contact: Weichen Zhou
- Repository of the PAV tool for assembly-based variant calling
- contact: Peter Audano
- Workflow executing various tools for assembly evaluation
- contact: Peter Ebert
- Workflow for annotating segmental duplications
- contact: Mark Chaisson
- Workflow for producing phased Verkko assemblies
- contact: Peter Ebert
- Workflow for identifying centromere dip regions
- contact: Glennis Logsdon
Logsdon, Ebert, Audano, Loftus et al., "Complex genetic variation in nearly complete human genomes", bioRxiv 2024 (under revision)