hgsvc/phase3-main-pub

HGSVC phase 3 repository

This is the project repository accompanying the phase 3 main ("flagship") publication of the Human Genome Structural Variation Consortium (HGSVC).

Resources

Data / IGSR FTP

The HGSVC deposits all its data on the public FTP site of the International Genome Sample Resource (IGSR). Although the data are not accessioned, the IGSR storage location is permanent. The working/ folder captures all data entities, even if they have never been formally published. The release/ folder captures only essential production-quality datasets such as variant callsets.

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC3/working/

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC3/release/
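
The two folders above can be browsed with any FTP client. As a minimal sketch, the following Python snippet splits one of the URLs into host and path and lists the folder contents using only the standard library. It assumes that the server accepts anonymous logins, which is typical for public FTP sites but is not stated in this README; the function names are illustrative and not part of any HGSVC tooling.

```python
from ftplib import FTP
from urllib.parse import urlparse

# URLs copied from the listing above.
WORKING_URL = "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC3/working/"
RELEASE_URL = "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC3/release/"


def split_ftp_url(url):
    """Split an ftp:// URL into (host, remote path)."""
    parsed = urlparse(url)
    if parsed.scheme != "ftp":
        raise ValueError(f"not an FTP URL: {url}")
    return parsed.hostname, parsed.path


def list_remote_folder(url, timeout=30):
    """Return the sorted entry names of a remote FTP folder.

    Assumption: the server permits anonymous login.
    """
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()    # no arguments means anonymous login
        ftp.cwd(path)  # change into the requested folder
        return sorted(ftp.nlst())
```

Calling `list_remote_folder(RELEASE_URL)` requires network access and returns whatever entries the server reports at that time; no particular folder layout is assumed here.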

Please carefully read the data reuse statement if you want to use any of the HGSVC resources deposited on the IGSR FTP:

HGSVC data reuse statement (markdown text)

For primary data such as sequencing data generated by the HGSVC project partners, please refer to the Data availability statement in the main manuscript and to the respective sections in the Supplemental Material to find the bioproject and data file accessions.

Software

Any software that had not been published in its own (companion) paper at the time of manuscript writing, or that does not rise to the level of independence required for publication (e.g., a collection of scripts or workflows that chain together otherwise published tools), is listed in the following table:

HGSVC3 software resources (TSV table)

For all other tools used in the HGSVC phase 3 paper, please refer to the respective method publication as referenced in the text.
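
A tab-separated table such as the software resource listing can be read with Python's standard `csv` module. The sketch below is a generic example only: the column names `name` and `contact` and the two example rows are invented placeholders, not the actual header or content of the HGSVC3 TSV file, so adapt them after inspecting the real table.

```python
import csv
import io

# Invented placeholder content; the real table may use different columns.
EXAMPLE_TSV = "name\tcontact\ntool-a\tPerson A\ntool-b\tPerson B\n"


def read_tsv(handle):
    """Read a tab-separated table into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle, delimiter="\t"))


def select_rows(rows, column, value):
    """Return all rows whose entry in `column` equals `value`."""
    return [row for row in rows if row.get(column) == value]


rows = read_tsv(io.StringIO(EXAMPLE_TSV))
matches = select_rows(rows, "contact", "Person B")
```

To read a file on disk instead of the in-memory example, open it with `open(path, newline="")` and pass the handle to `read_tsv`.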

This repository contains two versions of the source code:

  1. a copy (tarball) of each of the above software resources, kept for the sole purpose of documenting a frozen state
  2. a static copy, i.e. the unpacked tarball of the source code in the respective subfolder named /plain
    • please note: the current (potentially updated) version of the source code is only available in the respective GitHub repository (see the tabular summary or the more verbose listing below)
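
Because the static copy is described as the unpacked tarball, the two versions can be cross-checked against each other. The following is a minimal sketch of such a check using Python's `tarfile` and `hashlib` modules. The function name, the `strip_components` parameter, and any file paths you pass in are illustrative assumptions; the actual tarball names and folder layout in this repository are not specified here.

```python
import hashlib
import tarfile
from pathlib import Path


def find_mismatches(tarball, plain_dir, strip_components=0):
    """Compare regular files in a tarball against an unpacked directory.

    Returns the sorted relative paths of files that are missing from
    `plain_dir` or whose content differs from the tarball member.
    `strip_components` drops leading path components of each member name
    (an assumption for archives that wrap everything in a top-level folder).
    """
    plain_dir = Path(plain_dir)
    mismatches = []
    with tarfile.open(tarball) as archive:  # compression is auto-detected
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and special files
            parts = Path(member.name).parts[strip_components:]
            if not parts:
                continue
            relative = Path(*parts)
            expected = hashlib.sha256(archive.extractfile(member).read()).digest()
            target = plain_dir / relative
            if not target.is_file():
                mismatches.append(relative.as_posix())
                continue
            actual = hashlib.sha256(target.read_bytes()).digest()
            if actual != expected:
                mismatches.append(relative.as_posix())
    return sorted(mismatches)
```

An empty result means every regular file in the tarball has an identical counterpart in the directory. Files that exist only in the directory are not reported by this sketch.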

Please note: For all inquiries about the respective software, or for feedback such as issue reports, please contact the person listed as contact or go directly to the external repository (see the following listing). Software-specific support requests issued in this repository will NOT be answered.

Software: direct links to external repositories

For your convenience, the following listing provides direct links to the main README of the external repositories (tabular summary):

  1. Scripts used for annotating the MHC locus
    • contact: Alexander Dilthey
  2. Jupyter notebooks for various HGSVC analyses, e.g., related to MEIs
    • contact: Mark Loftus
  3. Code for building compacted de Bruijn graphs (DBGs), used in sample selection
    • contact: Tobias Rausch
  4. Code implementing the analyses related to the PanGenie genotyping
    • contact: Jana Ebler
  5. R script implementing the analyses required for sample selection
    • contact: Arvis Sulovari
  6. Scripts/workflow for evaluating genome assemblies
    • contact: Youngjun Kwon
  7. Scripts for customized analyses related to, e.g., data management, assembly evaluation and plotting
    • contact: Peter Ebert
  8. Repository of the L1ME-AID tool for MEI analysis
    • contact: Mark Loftus
  9. Repository of the MELT-LRA tool for MEI analysis
    • contact: Scott Devine
  10. Repository of the PALMER tool for MEI analysis
    • contact: Weichen Zhou
  11. Repository of the PAV tool for assembly-based variant calling
    • contact: Peter Audano
  12. Workflow executing various tools for assembly evaluation
    • contact: Peter Ebert
  13. Workflow for annotating segmental duplications
    • contact: Mark Chaisson
  14. Workflow for producing phased Verkko assemblies
    • contact: Peter Ebert
  15. Workflow for identifying centromere dip regions
    • contact: Glennis Logsdon

Citation sources

Published article

(under revision)

Preprint

Logsdon, Ebert, Audano, Loftus et al., "Complex genetic variation in nearly complete human genomes", bioRxiv 2024

Link: bioRxiv preprint
