From 504df640eba8a6067b3bbfdea59cec5486283627 Mon Sep 17 00:00:00 2001 From: shafin Date: Thu, 8 May 2025 15:20:46 -0700 Subject: [PATCH 1/9] Remove ONT duplex from case-study PiperOrigin-RevId: 756480864 --- docs/README.md | 1 - .../deepvariant-ont-r104-duplex-case-study.md | 169 ------------------ 2 files changed, 170 deletions(-) delete mode 100644 docs/deepvariant-ont-r104-duplex-case-study.md diff --git a/docs/README.md b/docs/README.md index d176134e5..59e09fef2 100644 --- a/docs/README.md +++ b/docs/README.md @@ -9,7 +9,6 @@ * [DeepVariant exome case study](deepvariant-exome-case-study.md) * [DeepVariant PacBio case study](deepvariant-pacbio-model-case-study.md) * [DeepVariant ONT R10.4 simplex case study](deepvariant-ont-r104-simplex-case-study.md) - [DeepVariant ONT R10.4 duplex case study](deepvariant-ont-r104-duplex-case-study.md) * [DeepVariant hybrid (PacBio and Illumina) case study](deepvariant-hybrid-case-study.md) * [DeepVariant Complete Genomics T7 case study](deepvariant-complete-t7-case-study.md) * [DeepVariant Complete Genomics G400 case study](deepvariant-complete-g400-case-study.md) diff --git a/docs/deepvariant-ont-r104-duplex-case-study.md b/docs/deepvariant-ont-r104-duplex-case-study.md deleted file mode 100644 index 372178fb9..000000000 --- a/docs/deepvariant-ont-r104-duplex-case-study.md +++ /dev/null @@ -1,169 +0,0 @@ -# DeepVariant with Oxford Nanopore R10.4.1 Duplex reads - -In this case study, we describe applying DeepVariant to Oxford Nanopore R10.4.1 -duplex reads. Then we assess the quality of the DeepVariant variant calls with -`hap.py`. - -To make it faster to go over this case study, we run only on chromosome 20. - -The dataset used in this case-study has following attributes: - -```bash -Sample: HG002 -Region: Chr20 -Chemistry: ONT R10.4.1 Duplex -Basecaller: Dorado v0.1.1 -Coverage: 80x -``` - -**Model note:** - -* The model is trained with Guppy 6+ "SUP" Simplex and Dorado v0.1.1 Duplex - reads. 
- -* The model is trained on both Ultra-long and sheared reads with varying read - N50 and coverage. - -## Prepare environment - -In this case-study, we will use [Docker](https://docs.docker.com/get-docker/) to -run DeepVariant for variant calling and -[hap.py](https://github.com/illumina/hap.py) for benchmarking. - -If you want to run on GPU machines, or use `Singularity` instead of `Docker`, -please follow [Quick Start](deepvariant-quick-start.md) documentation. - -### Create input and output directory structures and download inputs - -```bash -BASE="${HOME}/ont-case-study-duplex" - -# Set up input and output directory data -INPUT_DIR="${BASE}/input/data" -OUTPUT_DIR="${BASE}/output" - -## Create local directory structure -mkdir -p "${INPUT_DIR}" -mkdir -p "${OUTPUT_DIR}" - -# Download reference to input directory -FTPDIR=ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids -curl ${FTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz | gunzip > ${INPUT_DIR}/GRCh38_no_alt_analysis_set.fasta -curl ${FTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.fai > ${INPUT_DIR}/GRCh38_no_alt_analysis_set.fasta.fai - -# Download HG002 Duplex chr20 bam file to input directory -HTTPDIR=https://storage.googleapis.com/deepvariant/ont-case-study-testdata -curl ${HTTPDIR}/HG002_R1041_Duplex_all_Dorado_v0.1.1_400bps_pass_2_GRCh38.chr20.bam > ${INPUT_DIR}/HG002_R1041_Duplex_all_Dorado_v0.1.1_400bps_pass_2_GRCh38.chr20.bam -curl ${HTTPDIR}/HG002_R1041_Duplex_all_Dorado_v0.1.1_400bps_pass_2_GRCh38.chr20.bam.bai > ${INPUT_DIR}/HG002_R1041_Duplex_all_Dorado_v0.1.1_400bps_pass_2_GRCh38.chr20.bam.bai - -# Set up input variables -REF="GRCh38_no_alt_analysis_set.fasta" -BAM="HG002_R1041_Duplex_all_Dorado_v0.1.1_400bps_pass_2_GRCh38.chr20.bam" -THREADS=$(nproc) -REGION="chr20" - -# Set up output variable -OUTPUT_VCF="HG002_R1041_Duplex_Dorado_v0.1.1_GRCh38.chr20.output.vcf.gz" 
-OUTPUT_GVCF="HG002_R1041_Duplex_Dorado_v0.1.1_GRCh38.output.g.vcf.gz" -INTERMEDIATE_DIRECTORY="intermediate_results_dir" - -mkdir -p "${OUTPUT_DIR}/${INTERMEDIATE_DIRECTORY}" -``` - -## Run DeepVariant - -We will run DeepVariant from docker using the `run_deepvariant` script. - -```bash -BIN_VERSION="1.8.0" - -sudo docker run \ - -v "${INPUT_DIR}":"${INPUT_DIR}" \ - -v "${OUTPUT_DIR}":"${OUTPUT_DIR}" \ - google/deepvariant:"${BIN_VERSION}" \ - /opt/deepvariant/bin/run_deepvariant \ - --model_type ONT_R104 \ - --ref "${INPUT_DIR}/${REF}" \ - --reads "${INPUT_DIR}/${BAM}" \ - --output_vcf "${OUTPUT_DIR}/${OUTPUT_VCF}" \ - --output_gvcf "${OUTPUT_DIR}/${OUTPUT_GVCF}" \ - --num_shards "${THREADS}" \ - --regions "${REGION}" \ - --intermediate_results_dir "${OUTPUT_DIR}/${INTERMEDIATE_DIRECTORY}" -``` - -By specifying `--model_type ONT_R104`, you'll be using a model that is best -suited for Oxford Nanopore R10.4.1 chemistry Simplex and Duplex reads. - -NOTE: If you want to run each of the steps separately, add `--dry_run=true` to -the command above to figure out what flags you need in each step. Based on the -different model types, different flags are needed in the `make_examples` step. - -`--intermediate_results_dir` flag is optional. By specifying it, the -intermediate outputs of `make_examples` and `call_variants` stages can be found -in the directory. After the command, you can find these files in the directory: - -``` -call_variants_output.tfrecord.gz -gvcf.tfrecord-?????-of-?????.gz -make_examples.tfrecord-?????-of-?????.gz -``` - -## Benchmark HG002 chr20 output from DeepVariant - -We will use Genome-in-a-Bottle (GIAB) dataset to evaluate the performance of -DeepVariant. - -### Download Genome in a Bottle Benchmarks - -We will benchmark our variant calls against v4.2.1 of the Genome in a Bottle -small variant benchmarks for HG002. 
- -```bash -FTPDIR=ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/AshkenazimTrio/HG002_NA24385_son/NISTv4.2.1/GRCh38 - -curl ${FTPDIR}/HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed > ${INPUT_DIR}/HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed -curl ${FTPDIR}/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz > ${INPUT_DIR}/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz -curl ${FTPDIR}/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz.tbi > ${INPUT_DIR}/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz.tbi - -TRUTH_VCF="HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz" -TRUTH_BED="HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed" -``` - -```bash -sudo docker pull jmcdani20/hap.py:v0.3.12 - -sudo docker run \ - -v "${INPUT_DIR}":"${INPUT_DIR}" \ - -v "${OUTPUT_DIR}":"${OUTPUT_DIR}" \ - -v "${PWD}/happy:/happy" \ - jmcdani20/hap.py:v0.3.12 /opt/hap.py/bin/hap.py \ - "${INPUT_DIR}/${TRUTH_VCF}" \ - "${OUTPUT_DIR}/${OUTPUT_VCF}" \ - -f "${INPUT_DIR}/${TRUTH_BED}" \ - -r "${INPUT_DIR}/${REF}" \ - -o "${OUTPUT_DIR}/hg002.duplex.r104.ont.chr20.happy.output" \ - --engine=vcfeval \ - --pass-only \ - -l "${REGION}" -``` - -Output: - -``` -Benchmarking Summary: -Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio -INDEL ALL 11256 10372 884 21138 697 9801 364 214 0.921464 0.938520 0.463667 0.929914 NaN NaN 1.561710 2.008049 -INDEL PASS 11256 10372 884 21138 697 9801 364 214 0.921464 0.938520 0.463667 0.929914 NaN NaN 1.561710 2.008049 - SNP ALL 71333 71304 29 110055 63 38637 19 22 0.999593 0.999118 0.351070 0.999356 2.314904 1.752724 1.715978 1.562169 - SNP PASS 71333 71304 29 110055 63 38637 19 22 0.999593 0.999118 0.351070 0.999356 2.314904 1.752724 1.715978 1.562169 -``` - -## Acknowledgement - -**For providing analysis results and expertise, we are thankful to:** - -* Karen Miga, Brandy McNulty, 
Jean Monlong, Benedict Paten from UC Santa Cruz - Genomics Institute, University of California, Santa Cruz, CA. -* Miten Jain from Department of Bioengineering, Department of Physics, - Northeastern University, Boston, MA. From d70da8e48754d851be7eaec2ce8e888518e37424 Mon Sep 17 00:00:00 2001 From: shafin Date: Fri, 9 May 2025 13:23:43 -0700 Subject: [PATCH 2/9] Update metrics page of DV, DS, Pang-DV, DT. PiperOrigin-RevId: 756884446 --- docs/metrics-deeptrio.md | 54 ++++++++++++++++----------------- docs/metrics.md | 50 +++++++++++++++--------------- docs/pangenome-aware-metrics.md | 24 +++++++-------- 3 files changed, 64 insertions(+), 64 deletions(-) diff --git a/docs/metrics-deeptrio.md b/docs/metrics-deeptrio.md index 97e03d451..4abfae6cd 100644 --- a/docs/metrics-deeptrio.md +++ b/docs/metrics-deeptrio.md @@ -21,15 +21,15 @@ Reported runtime is an average of 5 runs. Stage | Wall time (minutes) -------------------------------- | ----------------- -make_examples | 165m54.08s -call_variants: HG002 | 20m18.91s -call_variants: HG003 | 22m15.00s -call_variants: HG004 | 22m5.79s -postprocess_variants (parallel) | 8m10.50s; 8m43.17s; 8m57.01s -vcf_stats_report(optional):HG002 | 6m21.21s -vcf_stats_report(optional):HG003 | 6m26.73s -vcf_stats_report(optional):HG003 | 6m44.50s -total | 251m16.71s (4h11m16.71s) +make_examples | 165m53.25s +call_variants: HG002 | 20m35.09s +call_variants: HG003 | 22m9.17s +call_variants: HG004 | 21m49.53s +postprocess_variants (parallel) | 8m12.84s; 8m48.05s; 8m50.40s +vcf_stats_report(optional):HG002 | 6m23.53s +vcf_stats_report(optional):HG003 | 6m26.04s +vcf_stats_report(optional):HG003 | 6m43.78s +total | 251m13.52s (4h11m13.52s) ### Accuracy @@ -74,15 +74,15 @@ Reported runtime is an average of 5 runs. 
Stage | Wall time (minutes) -------------------------------- | ------------------- -make_examples | 17m30.42s+193m24.96s -call_variants: HG002 | 30m38.57s -call_variants: HG003 | 37m37.90s -call_variants: HG004 | 37m19.25s -postprocess_variants (parallel) | 7m59.43s; 8m29.13s; 8m34.95s -vcf_stats_report(optional):HG002 | 6m56.97s -vcf_stats_report(optional):HG003 | 7m7.86s -vcf_stats_report(optional):HG003 | 7m25.53s -total | 337m10.65s (5h37m10.65s) +make_examples | 17m8.34s+189m10.68s +call_variants: HG002 | 30m23.02s +call_variants: HG003 | 37m13.83s +call_variants: HG004 | 36m59.66s +postprocess_variants (parallel) | 7m59.91s; 8m23.51s; 8m30.73s +vcf_stats_report(optional):HG002 | 6m52.84s +vcf_stats_report(optional):HG003 | 6m58.93s +vcf_stats_report(optional):HG003 | 7m6.09s +total | 331m0.71s (5h31m0.71s) * See VCF stats report (for all chromosomes) - [HG002](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.9.0/PACBIO/HG002.output.visual_report.html) @@ -124,15 +124,15 @@ Reported runtime is an average of 5 runs. 
Stage | Wall time (minutes) -------------------------------- | -------------- -make_examples | 7m14.46s -call_variants: HG002 | 2m15.37s -call_variants: HG003 | 2m16.46s -call_variants: HG004 | 2m16.73s -postprocess_variants (parallel) | 0m9.33s; 0m9.38s; 0m9.59s -vcf_stats_report(optional):HG002 | 0m5.66s -vcf_stats_report(optional):HG003 | 0m5.81s -vcf_stats_report(optional):HG003 | 0m5.85s -total | 14m23.67s +make_examples | 7m14.00s +call_variants: HG002 | 2m15.50s +call_variants: HG003 | 2m16.04s +call_variants: HG004 | 2m17.15s +postprocess_variants (parallel) | 0m9.25s; 0m9.44s; 0m9.57s +vcf_stats_report(optional):HG002 | 0m5.60s +vcf_stats_report(optional):HG003 | 0m5.69s +vcf_stats_report(optional):HG003 | 0m5.84s +total | 14m23.30s ### Accuracy diff --git a/docs/metrics.md b/docs/metrics.md index af8129456..1ab5c15b8 100644 --- a/docs/metrics.md +++ b/docs/metrics.md @@ -21,11 +21,11 @@ Reported runtime is an average of 5 runs. Stage | Time (minutes) -------------------------------- | ------------------ -make_examples | 45m31.18s -call_variants | 16m29.80s -postprocess_variants (with gVCF) | 6m53.10s -vcf_stats_report (optional) | 5m19.74s (optional) -total | 79m10.34s (1h19m10.34s) +make_examples | 45m13.77s +call_variants | 16m25.61s +postprocess_variants (with gVCF) | 6m51.14s +vcf_stats_report (optional) | 5m16.42s (optional) +total | 78m57.99s (1h18m57.99s) ### Accuracy @@ -48,11 +48,11 @@ Reported runtime is an average of 5 runs. Stage | Time (minutes) -------------------------------- | ----------------- -make_examples | 2m59.55s -call_variants | 0m33.42s -postprocess_variants (with gVCF) | 0m39.81s -vcf_stats_report (optional) | 0m4.99s (optional) -total | 4m54.42s +make_examples | 3m0.54s +call_variants | 0m33.30s +postprocess_variants (with gVCF) | 0m38.91s +vcf_stats_report (optional) | 0m4.97s (optional) +total | 4m45.64s ### Accuracy @@ -88,11 +88,11 @@ Reported runtime is an average of 5 runs. 
Stage | Time (minutes) -------------------------------- | ------------------- -make_examples | 36m51.62s -call_variants | 11m26.07s -postprocess_variants (with gVCF) | 4m46.91s -vcf_stats_report (optional) | 5m34.06s (optional) -total | 65m47.31s (1h05m47.31s) +make_examples | 36m48.09s +call_variants | 11m33.13s +postprocess_variants (with gVCF) | 4m47.06s +vcf_stats_report (optional) | 5m26.10s (optional) +total | 66m14.44s (1h06m14.44s) ### Accuracy @@ -119,11 +119,11 @@ Reported runtime is an average of 5 runs. Stage | Time (minutes) -------------------------------- | -------------------- -make_examples | 52m59.11s -call_variants | 16m43.75s -postprocess_variants (with gVCF) | 5m58.09s -vcf_stats_report (optional) | 6m28.17s (optional) -total | 87m26.64s (1h27m26.64s) +make_examples | 55m56.13s +call_variants | 17m29.76s +postprocess_variants (with gVCF) | 5m58.82s +vcf_stats_report (optional) | 6m23.70s (optional) +total | 91m6.31s (1h31m6.31s) ### Accuracy @@ -147,11 +147,11 @@ Reported runtime is an average of 5 runs. Stage | Time (minutes) -------------------------------- | ------------------ -make_examples | 61m42.28s -call_variants | 65m45.17s -postprocess_variants (with gVCF) | 3m42.80s -vcf_stats_report (optional) | 5m11.9s (optional) -total | 154m56.26s (2h34m56.26s) +make_examples | 62m2.28s +call_variants | 65m3.32s +postprocess_variants (with gVCF) | 3m43.18s +vcf_stats_report (optional) | 5m6.89s (optional) +total | 154m30.64s (2h34m30.64s) ### Accuracy diff --git a/docs/pangenome-aware-metrics.md b/docs/pangenome-aware-metrics.md index 7b36a9b6c..438e24baa 100644 --- a/docs/pangenome-aware-metrics.md +++ b/docs/pangenome-aware-metrics.md @@ -24,12 +24,12 @@ Reported runtime is an average of 5 runs. 
Stage | Time (minutes) -------------------------------- | ------------------ -load_gbz_into_shared_memory | 1m7.93s -make_examples | 87m52.14s -call_variants | 162m1.43s -postprocess_variants (with gVCF) | 7m21.50s -vcf_stats_report (optional) | 5m47.01s -total | 272m26.51s (4h32m26.51s) +load_gbz_into_shared_memory | 1m8.16s +make_examples | 88m40.61s +call_variants | 164m8.08s +postprocess_variants (with gVCF) | 7m19.23s +vcf_stats_report (optional) | 1m13.14s +total | 275m15.26s (4h35m15.26s) ### Accuracy @@ -50,12 +50,12 @@ Reported runtime is an average of 5 runs. Stage | Time (minutes) -------------------------------- | ----------------- -load_gbz_into_shared_memory | 1m7.89s -make_examples | 4m50.06s -call_variants | 1m3.99s -postprocess_variants (with gVCF) | 0m40.71s -vcf_stats_report (optional) | 0m5.01s -total | 9m16.39s +load_gbz_into_shared_memory | 1m8.12s +make_examples | 4m57.62s +call_variants | 1m4.49s +postprocess_variants (with gVCF) | 0m38.81s +vcf_stats_report (optional) | 0m5.07s +total | 9m22.64s ### Accuracy From 7c9c2647dd46bd836dc10cf25be66117a37ad27c Mon Sep 17 00:00:00 2001 From: shafin Date: Fri, 9 May 2025 16:58:04 -0700 Subject: [PATCH 3/9] Update all case-studies for 1.9 release PiperOrigin-RevId: 756954891 --- docs/deepvariant-case-study.md | 10 +++++----- docs/deepvariant-complete-g400-case-study.md | 14 +++++++------- docs/deepvariant-complete-t7-case-study.md | 14 +++++++------- docs/deepvariant-exome-case-study.md | 10 +++++----- docs/deepvariant-haploid-support.md | 2 +- docs/deepvariant-hybrid-case-study.md | 10 +++++----- docs/deepvariant-masseq-case-study.md | 2 +- docs/deepvariant-ont-r104-simplex-case-study.md | 10 +++++----- docs/deepvariant-pacbio-model-case-study.md | 14 +++++++------- docs/deepvariant-quick-start.md | 12 ++++++------ docs/deepvariant-xy-calling-case-study.md | 10 +++++----- docs/pangenome-aware-wes-bwa-case-study.md | 10 +++++----- docs/pangenome-aware-wgs-bwa-case-study.md | 10 +++++----- 
docs/pangenome-aware-wgs-vg-case-study.md | 10 +++++----- 14 files changed, 69 insertions(+), 69 deletions(-) diff --git a/docs/deepvariant-case-study.md b/docs/deepvariant-case-study.md index 4d210a569..dee369e65 100644 --- a/docs/deepvariant-case-study.md +++ b/docs/deepvariant-case-study.md @@ -68,7 +68,7 @@ DeepVariant pipeline consists of 3 steps: `make_examples`, `call_variants`, and mkdir -p output mkdir -p output/intermediate_results_dir -BIN_VERSION="1.8.0" +BIN_VERSION="1.9.0" sudo docker run \ -v "${PWD}/input":"/input" \ @@ -136,8 +136,8 @@ Output: ``` Benchmarking Summary: Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio -INDEL ALL 10628 10585 43 21064 22 10001 18 3 0.995954 0.998011 0.474791 0.996982 NaN NaN 1.748961 2.319825 -INDEL PASS 10628 10585 43 21064 22 10001 18 3 0.995954 0.998011 0.474791 0.996982 NaN NaN 1.748961 2.319825 - SNP ALL 70166 69918 248 84834 56 14822 13 3 0.996466 0.999200 0.174718 0.997831 2.296566 2.083842 1.883951 1.913523 - SNP PASS 70166 69918 248 84834 56 14822 13 3 0.996466 0.999200 0.174718 0.997831 2.296566 2.083842 1.883951 1.913523 +INDEL ALL 10628 10574 54 20960 19 9913 15 4 0.994919 0.998280 0.472948 0.996597 NaN NaN 1.748961 2.292778 +INDEL PASS 10628 10574 54 20960 19 9913 15 4 0.994919 0.998280 0.472948 0.996597 NaN NaN 1.748961 2.292778 + SNP ALL 70166 69903 263 85639 52 15648 9 1 0.996252 0.999257 0.182720 0.997752 2.296566 2.066404 1.883951 1.841826 + SNP PASS 70166 69903 263 85639 52 15648 9 1 0.996252 0.999257 0.182720 0.997752 2.296566 2.066404 1.883951 1.841826 ``` diff --git a/docs/deepvariant-complete-g400-case-study.md b/docs/deepvariant-complete-g400-case-study.md index 83ed2e40b..0c2300b53 100644 --- a/docs/deepvariant-complete-g400-case-study.md +++ b/docs/deepvariant-complete-g400-case-study.md @@ -54,7 
+54,7 @@ On a CPU-only machine: mkdir -p output mkdir -p output/intermediate_results_dir -BIN_VERSION="1.8.0" +BIN_VERSION="1.9.0" sudo docker run \ -v "${PWD}/input":"/input" \ @@ -105,15 +105,15 @@ Output: ``` Benchmarking Summary: Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio -INDEL ALL 11256 11131 125 20893 32 9306 26 4 0.988895 0.997238 0.445412 0.993049 NaN NaN 1.561710 2.036244 -INDEL PASS 11256 11131 125 20893 32 9306 26 4 0.988895 0.997238 0.445412 0.993049 NaN NaN 1.561710 2.036244 - SNP ALL 71333 70954 379 85828 50 14776 28 6 0.994687 0.999296 0.172158 0.996986 2.314904 2.095278 1.715978 1.741515 - SNP PASS 71333 70954 379 85828 50 14776 28 6 0.994687 0.999296 0.172158 0.996986 2.314904 2.095278 1.715978 1.741515 +INDEL ALL 11256 11129 127 20905 30 9322 25 4 0.988717 0.997410 0.445922 0.993045 NaN NaN 1.561710 2.053139 +INDEL PASS 11256 11129 127 20905 30 9322 25 4 0.988717 0.997410 0.445922 0.993045 NaN NaN 1.561710 2.053139 + SNP ALL 71333 70954 379 85776 52 14722 28 8 0.994687 0.999268 0.171633 0.996972 2.314904 2.098765 1.715978 1.753260 + SNP PASS 71333 70954 379 85776 52 14722 28 8 0.994687 0.999268 0.171633 0.996972 2.314904 2.098765 1.715978 1.753260 ``` To summarize: | Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score | | ----- | -------- | -------- | -------- | ------------- | ---------------- | --------------- | -| INDEL | 11131 | 125 | 32 | 0.988895 | 0.997238 | 0.993049 | -| SNP | 70954 | 379 | 50 | 0.994687 | 0.999296 | 0.996986 | +| INDEL | 11129 | 127 | 30 | 0.988717 | 0.997410 | 0.993045 | +| SNP | 70954 | 379 | 52 | 0.994687 | 0.999268 | 0.996972 | diff --git a/docs/deepvariant-complete-t7-case-study.md b/docs/deepvariant-complete-t7-case-study.md index 87bd5568d..3856b3220 100644 --- 
a/docs/deepvariant-complete-t7-case-study.md +++ b/docs/deepvariant-complete-t7-case-study.md @@ -54,7 +54,7 @@ On a CPU-only machine: mkdir -p output mkdir -p output/intermediate_results_dir -BIN_VERSION="1.8.0" +BIN_VERSION="1.9.0" sudo docker run \ -v "${PWD}/input":"/input" \ @@ -105,15 +105,15 @@ Output: ``` Benchmarking Summary: Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio -INDEL ALL 9974 9946 28 20994 10 10692 4 5 0.997193 0.999029 0.509288 0.998110 NaN NaN 1.630447 2.128048 -INDEL PASS 9974 9946 28 20994 10 10692 4 5 0.997193 0.999029 0.509288 0.998110 NaN NaN 1.630447 2.128048 - SNP ALL 69175 68877 298 85130 46 16163 8 2 0.995692 0.999333 0.189863 0.997509 2.288757 2.079858 1.730097 1.766565 - SNP PASS 69175 68877 298 85130 46 16163 8 2 0.995692 0.999333 0.189863 0.997509 2.288757 2.079858 1.730097 1.766565 +INDEL ALL 9974 9945 29 21029 9 10728 3 5 0.997092 0.999126 0.510153 0.998108 NaN NaN 1.630447 2.161535 +INDEL PASS 9974 9945 29 21029 9 10728 3 5 0.997092 0.999126 0.510153 0.998108 NaN NaN 1.630447 2.161535 + SNP ALL 69175 68875 300 85017 45 16054 8 2 0.995663 0.999347 0.188833 0.997502 2.288757 2.082385 1.730097 1.779155 + SNP PASS 69175 68875 300 85017 45 16054 8 2 0.995663 0.999347 0.188833 0.997502 2.288757 2.082385 1.730097 1.779155 ``` To summarize: | Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score | | ----- | -------- | -------- | -------- | ------------- | ---------------- | --------------- | -| INDEL | 9946 | 28 | 10 | 0.997193 | 0.999029 | 0.998110 | -| SNP | 68877 | 298 | 46 | 0.995692 | 0.999333 | 0.997509 | +| INDEL | 9945 | 29 | 9 | 0.997092 | 0.999126 | 0.998108 | +| SNP | 68875 | 300 | 45 | 0.995663 | 0.999347 | 0.997502 | diff --git a/docs/deepvariant-exome-case-study.md 
b/docs/deepvariant-exome-case-study.md index 7d266ea3d..d6557e814 100644 --- a/docs/deepvariant-exome-case-study.md +++ b/docs/deepvariant-exome-case-study.md @@ -70,7 +70,7 @@ curl ${HTTPDIR}/idt_capture_novogene.grch38.bed > input/idt_capture_novogene.grc mkdir -p output mkdir -p output/intermediate_results_dir -BIN_VERSION="1.8.0" +BIN_VERSION="1.9.0" sudo docker run \ -v "${PWD}/input":"/input" \ @@ -138,10 +138,10 @@ Output: ``` Benchmarking Summary: Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio -INDEL ALL 1051 1020 31 1466 7 417 6 0 0.970504 0.993327 0.284447 0.981783 NaN NaN 1.747283 1.878486 -INDEL PASS 1051 1020 31 1466 7 417 6 0 0.970504 0.993327 0.284447 0.981783 NaN NaN 1.747283 1.878486 - SNP ALL 25279 24984 295 27711 60 2665 36 4 0.988330 0.997604 0.096171 0.992946 2.854703 2.761569 1.623027 1.627764 - SNP PASS 25279 24984 295 27711 60 2665 36 4 0.988330 0.997604 0.096171 0.992946 2.854703 2.761569 1.623027 1.627764 +INDEL ALL 1051 1024 27 1485 8 430 5 2 0.974310 0.992417 0.289562 0.983280 NaN NaN 1.747283 1.796935 +INDEL PASS 1051 1024 27 1485 8 430 5 2 0.974310 0.992417 0.289562 0.983280 NaN NaN 1.747283 1.796935 + SNP ALL 25279 24983 296 27709 60 2665 36 4 0.988291 0.997604 0.096178 0.992926 2.854703 2.761297 1.623027 1.628821 + SNP PASS 25279 24983 296 27709 60 2665 36 4 0.988291 0.997604 0.096178 0.992926 2.854703 2.761297 1.623027 1.628821 ``` [case study on whole genome sequencing data]: deepvariant-case-study.md diff --git a/docs/deepvariant-haploid-support.md b/docs/deepvariant-haploid-support.md index 1e1e8d3fd..69497b3fa 100644 --- a/docs/deepvariant-haploid-support.md +++ b/docs/deepvariant-haploid-support.md @@ -54,6 +54,6 @@ excluded in haploid regions. The likelihood vector becomes: `L={L[(REF, REF), 0, L(ALT1, ALT1)]}`. 
Then we normalize the likelihood vector and assign the genotype based on the adjusted values from the vector. -In DeepVariant r1.8, we added extra logic in the `make_examples` stage to adjust +We have added extra logic in the `make_examples` stage to adjust for reference blocks as well. See the discussion in https://github.com/google/deepvariant/issues/811. diff --git a/docs/deepvariant-hybrid-case-study.md b/docs/deepvariant-hybrid-case-study.md index 856369979..81de26dfb 100644 --- a/docs/deepvariant-hybrid-case-study.md +++ b/docs/deepvariant-hybrid-case-study.md @@ -109,7 +109,7 @@ you can run this case study within about half an hour (tested on 64 CPUs). mkdir -p output mkdir -p output/intermediate_results_dir -BIN_VERSION="1.8.0" +BIN_VERSION="1.9.0" sudo docker run \ -v "${PWD}/input":"/input" \ @@ -182,10 +182,10 @@ Output: ``` Benchmarking Summary: Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio -INDEL ALL 10628 10601 27 22907 49 11747 9 39 0.997460 0.995609 0.512813 0.996534 NaN NaN 1.748961 2.619528 -INDEL PASS 10628 10601 27 22907 49 11747 9 39 0.997460 0.995609 0.512813 0.996534 NaN NaN 1.748961 2.619528 - SNP ALL 70166 70145 21 101591 42 31378 12 22 0.999701 0.999402 0.308866 0.999551 2.296566 1.861933 1.883951 2.123073 - SNP PASS 70166 70145 21 101591 42 31378 12 22 0.999701 0.999402 0.308866 0.999551 2.296566 1.861933 1.883951 2.123073 +INDEL ALL 10628 10603 25 23095 46 11928 7 38 0.997648 0.995881 0.516475 0.996763 NaN NaN 1.748961 2.657483 +INDEL PASS 10628 10603 25 23095 46 11928 7 38 0.997648 0.995881 0.516475 0.996763 NaN NaN 1.748961 2.657483 + SNP ALL 70166 70141 25 101878 42 31671 16 18 0.999644 0.999402 0.310872 0.999523 2.296566 1.861096 1.883951 2.228213 + SNP PASS 70166 70141 25 101878 42 31671 16 18 0.999644 0.999402 0.310872 0.999523 2.296566 
1.861096 1.883951 2.228213 ``` Notice that F1 scores are above 0.999 for SNPs and above 0.995 for indels! diff --git a/docs/deepvariant-masseq-case-study.md b/docs/deepvariant-masseq-case-study.md index 64ed6e71c..175cd026f 100644 --- a/docs/deepvariant-masseq-case-study.md +++ b/docs/deepvariant-masseq-case-study.md @@ -78,7 +78,7 @@ The command below will run the DeepVariant MAS-Seq model and produce an output VCF. ```bash -BIN_VERSION="1.8.0" +BIN_VERSION="1.9.0" sudo docker run \ -v "${PWD}/input":"/input" \ diff --git a/docs/deepvariant-ont-r104-simplex-case-study.md b/docs/deepvariant-ont-r104-simplex-case-study.md index 7f0a2b09d..63596583e 100644 --- a/docs/deepvariant-ont-r104-simplex-case-study.md +++ b/docs/deepvariant-ont-r104-simplex-case-study.md @@ -74,7 +74,7 @@ mkdir -p "${OUTPUT_DIR}/${INTERMEDIATE_DIRECTORY}" We will run DeepVariant from docker using the `run_deepvariant` script. ```bash -BIN_VERSION="1.8.0" +BIN_VERSION="1.9.0" sudo docker run \ -v "${INPUT_DIR}":"${INPUT_DIR}" \ @@ -153,10 +153,10 @@ Output: ``` Benchmarking Summary: Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio -INDEL ALL 10628 9408 1220 19574 912 8954 391 305 0.885209 0.914124 0.457444 0.899434 NaN NaN 1.748961 2.269171 -INDEL PASS 10628 9408 1220 19574 912 8954 391 305 0.885209 0.914124 0.457444 0.899434 NaN NaN 1.748961 2.269171 - SNP ALL 70166 70100 66 103527 50 33353 20 24 0.999059 0.999287 0.322167 0.999173 2.296566 1.773367 1.883951 1.794260 - SNP PASS 70166 70100 66 103527 50 33353 20 24 0.999059 0.999287 0.322167 0.999173 2.296566 1.773367 1.883951 1.794260 +INDEL ALL 10628 9385 1243 18491 923 7882 401 278 0.883045 0.912998 0.426261 0.897772 NaN NaN 1.748961 2.094405 +INDEL PASS 10628 9385 1243 18491 923 7882 401 278 0.883045 0.912998 0.426261 0.897772 NaN NaN 1.748961 
2.094405 + SNP ALL 70166 70069 97 97413 39 27287 12 22 0.998618 0.999444 0.280117 0.999031 2.296566 1.809049 1.883951 1.409583 + SNP PASS 70166 70069 97 97413 39 27287 12 22 0.998618 0.999444 0.280117 0.999031 2.296566 1.809049 1.883951 1.409583 ``` ## Acknowledgement diff --git a/docs/deepvariant-pacbio-model-case-study.md b/docs/deepvariant-pacbio-model-case-study.md index 68d725539..9961434d8 100644 --- a/docs/deepvariant-pacbio-model-case-study.md +++ b/docs/deepvariant-pacbio-model-case-study.md @@ -4,9 +4,9 @@ In this case study we describe applying DeepVariant to PacBio HiFi reads to call variants. We will call small variants from a publicly available whole genome HiFi dataset from PacBio. -### Updated dataset in release 1.8.0 +### Updated dataset -In release 1.8.0, we have updated the PacBio test data from HG003 Sequel-II to +We have updated the PacBio test data from HG003 Sequel-II to latest Revio with SPRQ chemistry data to showcase performance on the updated platform and chemistry. The full bam data is available [here](https://downloads.pacbcloud.com/public/revio/2024Q4/WGS/GIAB_trio/HG003/analysis/GRCh38.m84039_241002_000337_s3.hifi_reads.bc2020.bam). @@ -69,7 +69,7 @@ mkdir -p "${OUTPUT_DIR}/${INTERMEDIATE_DIRECTORY}" We will run DeepVariant from docker using the `run_deepvariant` script. 
```bash -BIN_VERSION="1.8.0" +BIN_VERSION="1.9.0" sudo docker run \ -v "${INPUT_DIR}":"${INPUT_DIR}" \ @@ -147,8 +147,8 @@ Output: ``` Benchmarking Summary: Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio -INDEL ALL 10628 10543 85 22403 74 11375 40 29 0.992002 0.993290 0.507744 0.992646 NaN NaN 1.748961 2.138647 -INDEL PASS 10628 10543 85 22403 74 11375 40 29 0.992002 0.993290 0.507744 0.992646 NaN NaN 1.748961 2.138647 - SNP ALL 70166 70101 65 105602 71 35342 12 12 0.999074 0.998989 0.334672 0.999032 2.296566 1.713281 1.883951 1.503192 - SNP PASS 70166 70101 65 105602 71 35342 12 12 0.999074 0.998989 0.334672 0.999032 2.296566 1.713281 1.883951 1.503192 +INDEL ALL 10628 10555 73 22529 70 11492 37 27 0.993131 0.993658 0.510098 0.993394 NaN NaN 1.748961 2.177492 +INDEL PASS 10628 10555 73 22529 70 11492 37 27 0.993131 0.993658 0.510098 0.993394 NaN NaN 1.748961 2.177492 + SNP ALL 70166 70107 59 102385 69 32116 8 10 0.999159 0.999018 0.313679 0.999089 2.296566 1.729639 1.883951 1.438846 + SNP PASS 70166 70107 59 102385 69 32116 8 10 0.999159 0.999018 0.313679 0.999089 2.296566 1.729639 1.883951 1.438846 ``` diff --git a/docs/deepvariant-quick-start.md b/docs/deepvariant-quick-start.md index c90054527..78d91543e 100644 --- a/docs/deepvariant-quick-start.md +++ b/docs/deepvariant-quick-start.md @@ -33,7 +33,7 @@ If you want to compile the DeepVariant binaries for yourself, we also have a ### Get Docker image ```bash -BIN_VERSION="1.8.0" +BIN_VERSION="1.9.0" sudo apt -y update sudo apt-get -y install docker.io @@ -267,16 +267,16 @@ You should see output similar to the following. 
``` Benchmarking Summary: Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio -INDEL ALL 4 4 0 13 0 9 0 0 1.000000 1.0 0.692308 1.000000 NaN NaN 0.333333 1.000000 -INDEL PASS 4 4 0 13 0 9 0 0 1.000000 1.0 0.692308 1.000000 NaN NaN 0.333333 1.000000 - SNP ALL 44 43 1 59 0 16 0 0 0.977273 1.0 0.271186 0.988506 1.2 1.36 0.333333 0.340909 - SNP PASS 44 43 1 59 0 16 0 0 0.977273 1.0 0.271186 0.988506 1.2 1.36 0.333333 0.340909 +INDEL ALL 4 4 0 13 0 9 0 0 1.0 1.0 0.692308 1.0 NaN NaN 0.333333 1.000000 +INDEL PASS 4 4 0 13 0 9 0 0 1.0 1.0 0.692308 1.0 NaN NaN 0.333333 1.000000 + SNP ALL 44 44 0 60 0 16 0 0 1.0 1.0 0.266667 1.0 1.2 1.307692 0.333333 0.395349 + SNP PASS 44 44 0 60 0 16 0 0 1.0 1.0 0.266667 1.0 1.2 1.307692 0.333333 0.395349 ``` [BAM]: http://genome.sph.umich.edu/wiki/BAM [BWA]: https://academic.oup.com/bioinformatics/article/25/14/1754/225615/Fast-and-accurate-short-read-alignment-with [docker build]: https://docs.docker.com/engine/reference/commandline/build/ -[Dockerfile]: https://github.com/google/deepvariant/blob/r1.8/Dockerfile +[Dockerfile]: https://github.com/google/deepvariant/blob/r1.9/Dockerfile [FASTA]: https://en.wikipedia.org/wiki/FASTA_format [Quick Start in r0.7]: https://github.com/google/deepvariant/blob/r0.7/docs/deepvariant-quick-start.md [VCF]: https://samtools.github.io/hts-specs/VCFv4.3.pdf diff --git a/docs/deepvariant-xy-calling-case-study.md b/docs/deepvariant-xy-calling-case-study.md index 6940f42fe..03fdc2552 100644 --- a/docs/deepvariant-xy-calling-case-study.md +++ b/docs/deepvariant-xy-calling-case-study.md @@ -72,7 +72,7 @@ mkdir -p "${OUTPUT_DIR}/${INTERMEDIATE_DIRECTORY}" We will run DeepVariant from docker using the `run_deepvariant` script. 
```bash -BIN_VERSION="1.8.0" +BIN_VERSION="1.9.0" sudo docker pull google/deepvariant:"${BIN_VERSION}" @@ -138,8 +138,8 @@ Output: ``` Benchmarking Summary: Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio -INDEL ALL 24273 23289 984 31014 644 6687 49 465 0.959461 0.973527 0.215612 0.966443 NaN NaN 1.559454 0.068240 -INDEL PASS 24273 23289 984 31014 644 6687 49 465 0.959461 0.973527 0.215612 0.966443 NaN NaN 1.559454 0.068240 - SNP ALL 87443 86918 525 132164 1449 43958 11 234 0.993996 0.983573 0.332602 0.988757 1.937122 1.541799 1.825434 0.046716 - SNP PASS 87443 86918 525 132164 1449 43958 11 234 0.993996 0.983573 0.332602 0.988757 1.937122 1.541799 1.825434 0.046716 +INDEL ALL 24273 23635 638 31059 344 6690 33 222 0.973716 0.985884 0.215397 0.979762 NaN NaN 1.559454 0.066724 +INDEL PASS 24273 23635 638 31059 344 6690 33 222 0.973716 0.985884 0.215397 0.979762 NaN NaN 1.559454 0.066724 + SNP ALL 87443 86954 489 129803 1374 41688 16 82 0.994408 0.984407 0.321164 0.989382 1.937122 1.560296 1.825434 0.047647 + SNP PASS 87443 86954 489 129803 1374 41688 16 82 0.994408 0.984407 0.321164 0.989382 1.937122 1.560296 1.825434 0.047647 ``` diff --git a/docs/pangenome-aware-wes-bwa-case-study.md b/docs/pangenome-aware-wes-bwa-case-study.md index 4f924dc7e..b3ae51e89 100644 --- a/docs/pangenome-aware-wes-bwa-case-study.md +++ b/docs/pangenome-aware-wes-bwa-case-study.md @@ -86,7 +86,7 @@ machine. 
mkdir -p output mkdir -p output/intermediate_results_dir -BIN_VERSION="pangenome_aware_deepvariant-1.8.0" +BIN_VERSION="pangenome_aware_deepvariant-1.9.0" sudo docker pull google/deepvariant:"${BIN_VERSION}" @@ -159,8 +159,8 @@ Output: ``` Benchmarking Summary: Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio -INDEL ALL 29 29 0 41 0 11 0 0 1.00000 1.0 0.268293 1.000000 NaN NaN 3.000000 2.727273 -INDEL PASS 29 29 0 41 0 11 0 0 1.00000 1.0 0.268293 1.000000 NaN NaN 3.000000 2.727273 - SNP ALL 685 683 2 704 0 21 0 0 0.99708 1.0 0.029830 0.998538 3.28125 3.266667 1.795918 1.838710 - SNP PASS 685 683 2 704 0 21 0 0 0.99708 1.0 0.029830 0.998538 3.28125 3.266667 1.795918 1.838710 +INDEL ALL 29 29 0 44 0 14 0 0 1.00000 1.0 0.318182 1.000000 NaN NaN 3.000000 3.000000 +INDEL PASS 29 29 0 44 0 14 0 0 1.00000 1.0 0.318182 1.000000 NaN NaN 3.000000 3.000000 + SNP ALL 685 683 2 703 0 20 0 0 0.99708 1.0 0.028450 0.998538 3.28125 3.260606 1.795918 1.834677 + SNP PASS 685 683 2 703 0 20 0 0 0.99708 1.0 0.028450 0.998538 3.28125 3.260606 1.795918 1.834677 ``` diff --git a/docs/pangenome-aware-wgs-bwa-case-study.md b/docs/pangenome-aware-wgs-bwa-case-study.md index e2a4f9f5b..066f5eadf 100644 --- a/docs/pangenome-aware-wgs-bwa-case-study.md +++ b/docs/pangenome-aware-wgs-bwa-case-study.md @@ -75,7 +75,7 @@ machine. 
mkdir -p output mkdir -p output/intermediate_results_dir -BIN_VERSION="pangenome_aware_deepvariant-1.8.0" +BIN_VERSION="pangenome_aware_deepvariant-1.9.0" sudo docker pull google/deepvariant:"${BIN_VERSION}" @@ -147,8 +147,8 @@ Output: ``` Benchmarking Summary: Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio -INDEL ALL 10628 10584 44 20850 19 9790 14 5 0.995860 0.998282 0.469544 0.99707 NaN NaN 1.748961 2.291024 -INDEL PASS 10628 10584 44 20850 19 9790 14 5 0.995860 0.998282 0.469544 0.99707 NaN NaN 1.748961 2.291024 - SNP ALL 70166 69932 234 86798 66 16764 45 3 0.996665 0.999058 0.193138 0.99786 2.296566 2.016604 1.883951 1.739749 - SNP PASS 70166 69932 234 86798 66 16764 45 3 0.996665 0.999058 0.193138 0.99786 2.296566 2.016604 1.883951 1.739749 +INDEL ALL 10628 10579 49 20891 20 9836 15 4 0.99539 0.998191 0.470825 0.996788 NaN NaN 1.748961 2.296933 +INDEL PASS 10628 10579 49 20891 20 9836 15 4 0.99539 0.998191 0.470825 0.996788 NaN NaN 1.748961 2.296933 + SNP ALL 70166 69926 240 86703 76 16665 45 3 0.99658 0.998915 0.192208 0.997746 2.296566 2.018093 1.883951 1.728859 + SNP PASS 70166 69926 240 86703 76 16665 45 3 0.99658 0.998915 0.192208 0.997746 2.296566 2.018093 1.883951 1.728859 ``` diff --git a/docs/pangenome-aware-wgs-vg-case-study.md b/docs/pangenome-aware-wgs-vg-case-study.md index e7d808696..4810f0fb4 100644 --- a/docs/pangenome-aware-wgs-vg-case-study.md +++ b/docs/pangenome-aware-wgs-vg-case-study.md @@ -80,7 +80,7 @@ machine. 
mkdir -p output mkdir -p output/intermediate_results_dir -BIN_VERSION="pangenome_aware_deepvariant-1.8.0" +BIN_VERSION="pangenome_aware_deepvariant-1.9.0" sudo docker pull google/deepvariant:"${BIN_VERSION}" @@ -152,8 +152,8 @@ Output: ``` Benchmarking Summary: Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio -INDEL ALL 10628 10594 34 21276 32 10189 21 8 0.996801 0.997114 0.478896 0.996957 NaN NaN 1.748961 2.231995 -INDEL PASS 10628 10594 34 21276 32 10189 21 8 0.996801 0.997114 0.478896 0.996957 NaN NaN 1.748961 2.231995 - SNP ALL 70166 70090 76 90303 94 20078 21 5 0.998917 0.998661 0.222340 0.998789 2.296566 1.942569 1.883951 1.599631 - SNP PASS 70166 70090 76 90303 94 20078 21 5 0.998917 0.998661 0.222340 0.998789 2.296566 1.942569 1.883951 1.599631 +INDEL ALL 10628 10594 34 21303 24 10222 18 4 0.996801 0.997834 0.479839 0.997317 NaN NaN 1.748961 2.237105 +INDEL PASS 10628 10594 34 21303 24 10222 18 4 0.996801 0.997834 0.479839 0.997317 NaN NaN 1.748961 2.237105 + SNP ALL 70166 70094 72 90172 106 19930 19 4 0.998974 0.998491 0.221022 0.998732 2.296566 1.943471 1.883951 1.592173 + SNP PASS 70166 70094 72 90172 106 19930 19 4 0.998974 0.998491 0.221022 0.998732 2.296566 1.943471 1.883951 1.592173 ``` From 5177f6eb19b70f95c86c84d9152973fe6ece8a14 Mon Sep 17 00:00:00 2001 From: lucasbrambrink Date: Mon, 12 May 2025 15:42:14 -0700 Subject: [PATCH 4/9] Update all DeepTrio case-studies for 1.9 release (including metrics) PiperOrigin-RevId: 757941033 --- docs/deeptrio-pacbio-case-study.md | 38 +++++------ docs/deeptrio-quick-start.md | 4 +- docs/deeptrio-wgs-case-study.md | 39 +++++------ docs/trio-merge-case-study.md | 104 ++++++++++++++--------------- 4 files changed, 93 insertions(+), 92 deletions(-) diff --git a/docs/deeptrio-pacbio-case-study.md 
b/docs/deeptrio-pacbio-case-study.md index 428bb8fa4..b77d712c2 100644 --- a/docs/deeptrio-pacbio-case-study.md +++ b/docs/deeptrio-pacbio-case-study.md @@ -85,7 +85,7 @@ is run as a separate command. mkdir -p output mkdir -p output/intermediate_results_dir -BIN_VERSION="1.8.0" +BIN_VERSION="1.9.0" sudo apt -y update sudo apt-get -y install docker.io @@ -221,13 +221,13 @@ As a result we should get the following output: ```bash Checking: /output/HG002_trio_merged.vcf.gz Family: [HG003 + HG004] -> [HG002] -188 non-pass records were skipped -Concordance HG002: F:166225/169750 (97.92%) M:166415/168977 (98.48%) F+M:159575/164659 (96.91%) +126 non-pass records were skipped +Concordance HG002: F:173615/180788 (96.03%) M:174179/180845 (96.31%) F+M:165688/177112 (93.55%) Sample HG002 has less than 99.0 concordance with both parents. Check for incorrect pedigree or sample mislabelling. -0/188437 (0.00%) records did not conform to expected call ploidy -176829/188437 (93.84%) records were variant in at least 1 family member and checked for Mendelian constraints -10143/176829 (5.74%) records had indeterminate consistency status due to incomplete calls -6722/176829 (3.80%) records contained a violation of Mendelian constraints +0/191837 (0.00%) records did not conform to expected call ploidy +185941/191837 (96.93%) records were variant in at least 1 family member and checked for Mendelian constraints +7334/185941 (3.94%) records had indeterminate consistency status due to incomplete calls +12647/185941 (6.80%) records contained a violation of Mendelian constraints ``` ### Benchmark variant calls against 4.2.1 truth set with hap.py @@ -289,22 +289,22 @@ sudo docker run \ ``` Benchmarking Summary for HG002: Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio -INDEL ALL 11256 11213 43 23405 
84 11635 32 45 0.996180 0.992863 0.497116 0.994519 NaN NaN 1.561710 2.151675 -INDEL PASS 11256 11213 43 23405 84 11635 32 45 0.996180 0.992863 0.497116 0.994519 NaN NaN 1.561710 2.151675 - SNP ALL 71333 71305 28 108561 21 37160 14 7 0.999607 0.999706 0.342296 0.999657 2.314904 1.742256 1.715978 1.772847 - SNP PASS 71333 71305 28 108561 21 37160 14 7 0.999607 0.999706 0.342296 0.999657 2.314904 1.742256 1.715978 1.772847 +INDEL ALL 11256 11214 42 23119 78 11356 32 40 0.996269 0.993369 0.491198 0.994817 NaN NaN 1.561710 2.075045 +INDEL PASS 11256 11214 42 23119 78 11356 32 40 0.996269 0.993369 0.491198 0.994817 NaN NaN 1.561710 2.075045 + SNP ALL 71333 71310 23 109529 18 38126 9 9 0.999678 0.999748 0.348090 0.999713 2.314904 1.724732 1.715978 1.611481 + SNP PASS 71333 71310 23 109529 18 38126 9 9 0.999678 0.999748 0.348090 0.999713 2.314904 1.724732 1.715978 1.611481 Benchmarking Summary for HG003: Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio -INDEL ALL 10628 10577 51 23776 77 12634 33 43 0.995201 0.993089 0.531376 0.994144 NaN NaN 1.748961 2.332224 -INDEL PASS 10628 10577 51 23776 77 12634 33 43 0.995201 0.993089 0.531376 0.994144 NaN NaN 1.748961 2.332224 - SNP ALL 70166 70143 23 117125 35 46898 13 9 0.999672 0.999502 0.400410 0.999587 2.296566 1.57963 1.883951 1.685873 - SNP PASS 70166 70143 23 117125 35 46898 13 9 0.999672 0.999502 0.400410 0.999587 2.296566 1.57963 1.883951 1.685873 +INDEL ALL 10628 10575 53 23560 71 12427 36 34 0.995013 0.993623 0.527462 0.994317 NaN NaN 1.748961 2.321257 +INDEL PASS 10628 10575 53 23560 71 12427 36 34 0.995013 0.993623 0.527462 0.994317 NaN NaN 1.748961 2.321257 + SNP ALL 70166 70149 17 118416 32 48186 8 11 0.999758 0.999544 0.406921 0.999651 2.296566 1.576658 1.883951 1.697684 + SNP PASS 70166 70149 17 118416 32 48186 8 11 
0.999758 0.999544 0.406921 0.999651 2.296566 1.576658 1.883951 1.697684 Benchmarking Summary for HG004: Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio -INDEL ALL 11000 10954 46 24235 70 12701 29 36 0.995818 0.993931 0.524077 0.994874 NaN NaN 1.792709 2.351344 -INDEL PASS 11000 10954 46 24235 70 12701 29 36 0.995818 0.993931 0.524077 0.994874 NaN NaN 1.792709 2.351344 - SNP ALL 71659 71617 42 116988 22 45260 11 7 0.999414 0.999693 0.386877 0.999554 2.310073 1.633809 1.878340 1.626369 - SNP PASS 71659 71617 42 116988 22 45260 11 7 0.999414 0.999693 0.386877 0.999554 2.310073 1.633809 1.878340 1.626369 +INDEL ALL 11000 10952 48 23981 61 12453 30 27 0.995636 0.994709 0.519286 0.995172 NaN NaN 1.792709 2.328987 +INDEL PASS 11000 10952 48 23981 61 12453 30 27 0.995636 0.994709 0.519286 0.995172 NaN NaN 1.792709 2.328987 + SNP ALL 71659 71622 37 118880 18 47151 6 8 0.999484 0.999749 0.396627 0.999616 2.310073 1.620779 1.878340 1.616919 + SNP PASS 71659 71622 37 118880 18 47151 6 8 0.999484 0.999749 0.396627 0.999616 2.310073 1.620779 1.878340 1.616919 ``` diff --git a/docs/deeptrio-quick-start.md b/docs/deeptrio-quick-start.md index 6463e234b..97b790bdf 100644 --- a/docs/deeptrio-quick-start.md +++ b/docs/deeptrio-quick-start.md @@ -32,7 +32,7 @@ documentation on how to build. 
### Get Docker image ```bash -BIN_VERSION="1.8.0" +BIN_VERSION="1.9.0" sudo apt -y update sudo apt-get -y install docker.io @@ -338,7 +338,7 @@ INDEL PASS 2 2 0 2 0 0 [BAM]: http://genome.sph.umich.edu/wiki/BAM [BWA]: https://academic.oup.com/bioinformatics/article/25/14/1754/225615/Fast-and-accurate-short-read-alignment-with [docker build]: https://docs.docker.com/engine/reference/commandline/build/ -[Dockerfile]: https://github.com/google/deepvariant/blob/r1.8/Dockerfile.deeptrio +[Dockerfile]: https://github.com/google/deepvariant/blob/r1.9/Dockerfile.deeptrio [FASTA]: https://en.wikipedia.org/wiki/FASTA_format [VCF]: https://samtools.github.io/hts-specs/VCFv4.3.pdf [run_deeptrio.py]: ../scripts/run_deeptrio.py diff --git a/docs/deeptrio-wgs-case-study.md b/docs/deeptrio-wgs-case-study.md index 41d076115..52de513ca 100644 --- a/docs/deeptrio-wgs-case-study.md +++ b/docs/deeptrio-wgs-case-study.md @@ -82,7 +82,7 @@ command. mkdir -p output mkdir -p output/intermediate_results_dir -BIN_VERSION="1.8.0" +BIN_VERSION="1.9.0" sudo docker pull google/deepvariant:deeptrio-"${BIN_VERSION}" @@ -211,13 +211,13 @@ As a result we should get the following output: ```bash Checking: /output/HG002_trio_merged.vcf.gz Family: [HG003 + HG004] -> [HG002] -86 non-pass records were skipped -Concordance HG002: F:138004/139790 (98.72%) M:138049/139959 (98.64%) F+M:134711/138044 (97.59%) +87 non-pass records were skipped +Concordance HG002: F:138190/140157 (98.60%) M:138255/140341 (98.51%) F+M:135069/138517 (97.51%) Sample HG002 has less than 99.0 concordance with both parents. Check for incorrect pedigree or sample mislabelling. 
-0/146134 (0.00%) records did not conform to expected call ploidy -143783/146134 (98.39%) records were variant in at least 1 family member and checked for Mendelian constraints -5082/143783 (3.53%) records had indeterminate consistency status due to incomplete calls -3842/143783 (2.67%) records contained a violation of Mendelian constraints +0/145514 (0.00%) records did not conform to expected call ploidy +143324/145514 (98.49%) records were variant in at least 1 family member and checked for Mendelian constraints +4116/143324 (2.87%) records had indeterminate consistency status due to incomplete calls +3993/143324 (2.79%) records contained a violation of Mendelian constraints ``` ### Perform analysis with hap.py against 4.2.1 truth set @@ -279,22 +279,23 @@ sudo docker run \ ``` Benchmarking Summary for HG002: Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio -INDEL ALL 11256 11208 48 21232 13 9579 7 4 0.995736 0.998884 0.451159 0.997308 NaN NaN 1.561710 2.044750 -INDEL PASS 11256 11208 48 21232 13 9579 7 4 0.995736 0.998884 0.451159 0.997308 NaN NaN 1.561710 2.044750 - SNP ALL 71333 71088 245 89034 41 17853 4 3 0.996565 0.999424 0.200519 0.997993 2.314904 2.026055 1.715978 1.717178 - SNP PASS 71333 71088 245 89034 41 17853 4 3 0.996565 0.999424 0.200519 0.997993 2.314904 2.026055 1.715978 1.717178 +INDEL ALL 11256 11208 48 21116 13 9462 7 4 0.995736 0.998885 0.448096 0.997308 NaN NaN 1.561710 2.029690 +INDEL PASS 11256 11208 48 21116 13 9462 7 4 0.995736 0.998885 0.448096 0.997308 NaN NaN 1.561710 2.029690 + SNP ALL 71333 71087 246 88097 37 16924 4 6 0.996551 0.999480 0.192106 0.998014 2.314904 2.038019 1.715978 1.700209 + SNP PASS 71333 71087 246 88097 37 16924 4 6 0.996551 0.999480 0.192106 0.998014 2.314904 2.038019 1.715978 1.700209 Benchmarking Summary for HG003: 
Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio -INDEL ALL 10628 10578 50 21055 24 9997 17 6 0.995295 0.997830 0.474804 0.996561 NaN NaN 1.748961 2.209131 -INDEL PASS 10628 10578 50 21055 24 9997 17 6 0.995295 0.997830 0.474804 0.996561 NaN NaN 1.748961 2.209131 - SNP ALL 70166 69977 189 85399 64 15325 17 8 0.997306 0.999087 0.179452 0.998196 2.296566 2.061752 1.883951 1.846595 - SNP PASS 70166 69977 189 85399 64 15325 17 8 0.997306 0.999087 0.179452 0.998196 2.296566 2.061752 1.883951 1.846595 +INDEL ALL 10628 10576 52 20960 23 9905 19 3 0.995107 0.997919 0.472567 0.996511 NaN NaN 1.748961 2.198304 +INDEL PASS 10628 10576 52 20960 23 9905 19 3 0.995107 0.997919 0.472567 0.996511 NaN NaN 1.748961 2.198304 + SNP ALL 70166 69976 190 85469 55 15402 17 4 0.997292 0.999215 0.180206 0.998253 2.296566 2.058455 1.883951 1.850204 + SNP PASS 70166 69976 190 85469 55 15402 17 4 0.997292 0.999215 0.180206 0.998253 2.296566 2.058455 1.883951 1.850204 Benchmarking Summary for HG004: Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio -INDEL ALL 11000 10949 51 21433 23 9975 16 5 0.995364 0.997993 0.465404 0.996676 NaN NaN 1.792709 2.280107 -INDEL PASS 11000 10949 51 21433 23 9975 16 5 0.995364 0.997993 0.465404 0.996676 NaN NaN 1.792709 2.280107 - SNP ALL 71659 71445 214 86523 48 14980 8 3 0.997014 0.999329 0.173133 0.998170 2.310073 2.064759 1.878340 1.737322 - SNP PASS 71659 71445 214 86523 48 14980 8 3 0.997014 0.999329 0.173133 0.998170 2.310073 2.064759 1.878340 1.737322 +INDEL ALL 11000 10950 50 21369 24 9909 17 5 0.995455 0.997906 0.463709 0.996679 NaN NaN 1.792709 2.271194 +INDEL PASS 
11000 10950 50 21369 24 9909 17 5 0.995455 0.997906 0.463709 0.996679 NaN NaN 1.792709 2.271194 + SNP ALL 71659 71445 214 86684 49 15139 6 6 0.997014 0.999315 0.174646 0.998163 2.310073 2.058081 1.878340 1.740313 + SNP PASS 71659 71445 214 86684 49 15139 6 6 0.997014 0.999315 0.174646 0.998163 2.310073 2.058081 1.878340 1.740313 + ``` diff --git a/docs/trio-merge-case-study.md b/docs/trio-merge-case-study.md index e56bf58f7..31cb5e9c5 100644 --- a/docs/trio-merge-case-study.md +++ b/docs/trio-merge-case-study.md @@ -115,7 +115,7 @@ serially is not the most effective approach. ``` N_SHARDS=$(nproc) # Or change to the number of cores you want to use CAPTURE_BED=agilent_sureselect_human_all_exon_v5_b37_targets.bed -VERSION=1.8.0 +VERSION=1.9.0 declare -a trio=(HG002 HG003 HG004) for SAMPLE in "${trio[@]}" @@ -226,16 +226,16 @@ The output is: ``` Checking: /data/deepvariant.cohort.vcf.gz Family: [Sample_Diag-excap51-HG003-EEogPU + Sample_Diag-excap51-HG004-EEogPU] -> [Sample_Diag-excap51-HG002-EEogPU] -Concordance Sample_Diag-excap51-HG002-EEogPU: F:46502/46866 (99.22%) M:46737/46863 (99.73%) F+M:46291/46785 (98.94%) +Concordance Sample_Diag-excap51-HG002-EEogPU: F:46509/46873 (99.22%) M:46743/46870 (99.73%) F+M:46292/46790 (98.94%) Sample Sample_Diag-excap51-HG002-EEogPU has less than 99.0 concordance with both parents. Check for incorrect pedigree or sample mislabelling. 
-584/47001 (1.24%) records did not conform to expected call ploidy -46959/47001 (99.91%) records were variant in at least 1 family member and checked for Mendelian constraints -129/46959 (0.27%) records had indeterminate consistency status due to incomplete calls -494/46959 (1.05%) records contained a violation of Mendelian constraints +584/47006 (1.24%) records did not conform to expected call ploidy +46968/47006 (99.92%) records were variant in at least 1 family member and checked for Mendelian constraints +133/46968 (0.28%) records had indeterminate consistency status due to incomplete calls +498/46968 (1.06%) records contained a violation of Mendelian constraints ``` -From this report, we know that there is a 1.10% Mendelian violation rate, and -0.32% of the records had incomplete calls (with `.`) so RTG couldn't determine +From this report, we know that there is a 1.06% Mendelian violation rate, and +0.28% of the records had incomplete calls (with `.`) so RTG couldn't determine whether there is violation or not. 
## Single sample quality metrics @@ -264,9 +264,9 @@ done | Sample | [3]ts | [4]tv | [5]ts/tv | [6]ts (1st ALT) | [7]tv (1st ALT) | [8]ts/tv (1st ALT) | | ------ | ----- | ----- | -------- | --------------- | --------------- | ------------------ | -| HG002 | 29955 | 11693 | 2.56 | 29942 | 11673 | 2.57 | -| HG003 | 29852 | 11769 | 2.54 | 29842 | 1174 | 2.54 | -| HG004 | 30048 | 11838 | 2.54 | 3003 | 11821 | 2.54 | +| HG002 | 29957 | 11695 | 2.56 | 29944 | 11675 | 2.56 | +| HG003 | 29851 | 11772 | 2.54 | 29841 | 11749 | 2.54 | +| HG004 | 30046 | 11836 | 2.54 | 30035 | 11818 | 2.54 | If you want to restrict to the truth BED files, use this command: @@ -289,8 +289,8 @@ Which resulted in this table: | Sample | [3]ts | [4]tv | [5]ts/tv | [6]ts (1st ALT) | [7]tv (1st ALT) | [8]ts/tv (1st ALT) | | ------ | ----- | ----- | -------- | --------------- | --------------- | ------------------ | | HG002 | 27716 | 10549 | 2.63 | 27708 | 10536 | 2.63 | -| HG003 | 27382 | 10527 | 2.60 | 27378 | 10515 | 2.60 | -| HG004 | 27503 | 10607 | 2.59 | 27496 | 10596 | 2.59 | +| HG003 | 27380 | 10527 | 2.60 | 27376 | 10515 | 2.60 | +| HG004 | 27502 | 10606 | 2.59 | 27495 | 10595 | 2.60 | ### Rtg vcfstats @@ -312,69 +312,69 @@ HG002: ``` Location : /data/HG002.vcf.gz -Failed Filters : 14566 +Failed Filters : 14155 Passed Filters : 45290 -SNPs : 41615 +SNPs : 41619 MNPs : 0 Insertions : 1874 -Deletions : 1779 +Deletions : 1776 Indels : 21 -Same as reference : 1 -SNP Transitions/Transversions: 2.56 (41843/16345) -Total Het/Hom ratio : 1.49 (27130/18159) -SNP Het/Hom ratio : 1.51 (25066/16549) +Same as reference : 0 +SNP Transitions/Transversions: 2.56 (41846/16348) +Total Het/Hom ratio : 1.49 (27129/18161) +SNP Het/Hom ratio : 1.51 (25068/16551) MNP Het/Hom ratio : - (0/0) -Insertion Het/Hom ratio : 1.07 (967/907) -Deletion Het/Hom ratio : 1.53 (1076/703) +Insertion Het/Hom ratio : 1.06 (966/908) +Deletion Het/Hom ratio : 1.53 (1074/702) Indel Het/Hom ratio : - (21/0) -Insertion/Deletion ratio : 
1.05 (1874/1779) -Indel/SNP+MNP ratio : 0.09 (3674/41615) +Insertion/Deletion ratio : 1.06 (1874/1776) +Indel/SNP+MNP ratio : 0.09 (3671/41619) ``` HG003: ``` Location : /data/HG003.vcf.gz -Failed Filters : 15383 -Passed Filters : 45190 -SNPs : 41585 +Failed Filters : 14773 +Passed Filters : 45192 +SNPs : 41587 MNPs : 0 -Insertions : 1843 -Deletions : 1743 -Indels : 18 +Insertions : 1844 +Deletions : 1741 +Indels : 19 Same as reference : 1 -SNP Transitions/Transversions: 2.52 (41678/16558) -Total Het/Hom ratio : 1.48 (26984/18205) -SNP Het/Hom ratio : 1.50 (24960/16625) +SNP Transitions/Transversions: 2.52 (41679/16563) +Total Het/Hom ratio : 1.48 (26982/18209) +SNP Het/Hom ratio : 1.50 (24958/16629) MNP Het/Hom ratio : - (0/0) -Insertion Het/Hom ratio : 1.09 (962/881) -Deletion Het/Hom ratio : 1.49 (1044/699) -Indel Het/Hom ratio : - (18/0) -Insertion/Deletion ratio : 1.06 (1843/1743) -Indel/SNP+MNP ratio : 0.09 (3604/41585) +Insertion Het/Hom ratio : 1.09 (962/882) +Deletion Het/Hom ratio : 1.49 (1043/698) +Indel Het/Hom ratio : - (19/0) +Insertion/Deletion ratio : 1.06 (1844/1741) +Indel/SNP+MNP ratio : 0.09 (3604/41587) ``` HG004: ``` Location : /data/HG004.vcf.gz -Failed Filters : 15176 -Passed Filters : 45505 -SNPs : 41856 +Failed Filters : 14577 +Passed Filters : 45496 +SNPs : 41850 MNPs : 0 -Insertions : 1860 +Insertions : 1857 Deletions : 1766 Indels : 22 Same as reference : 1 -SNP Transitions/Transversions: 2.55 (41681/16348) -Total Het/Hom ratio : 1.57 (27795/17709) -SNP Het/Hom ratio : 1.59 (25703/16153) +SNP Transitions/Transversions: 2.55 (41677/16343) +Total Het/Hom ratio : 1.57 (27789/17706) +SNP Het/Hom ratio : 1.59 (25700/16150) MNP Het/Hom ratio : - (0/0) -Insertion Het/Hom ratio : 1.11 (980/880) -Deletion Het/Hom ratio : 1.61 (1090/676) +Insertion Het/Hom ratio : 1.11 (978/879) +Deletion Het/Hom ratio : 1.61 (1089/677) Indel Het/Hom ratio : - (22/0) -Insertion/Deletion ratio : 1.05 (1860/1766) -Indel/SNP+MNP ratio : 0.09 (3648/41856) 
+Insertion/Deletion ratio : 1.05 (1857/1766) +Indel/SNP+MNP ratio : 0.09 (3645/41850) ``` ### Run hap.py to calculate the accuracy of DeepVariant generated call sets @@ -403,6 +403,6 @@ Accuracy F1 scores: Sample | Indel | SNP ------ | -------- | -------- -HG002 | 0.974037 | 0.994146 -HG003 | 0.968448 | 0.993913 -HG004 | 0.972569 | 0.994189 +HG002 | 0.974189 | 0.994172 +HG003 | 0.968096 | 0.993913 +HG004 | 0.973091 | 0.994162 From 9f589b9b8f2e089d2d0f4095c91b3dec258df4a1 Mon Sep 17 00:00:00 2001 From: shafin Date: Mon, 12 May 2025 16:35:51 -0700 Subject: [PATCH 5/9] Update case studies for 1.9 PiperOrigin-RevId: 757960591 --- docs/deepvariant-fast-pipeline-case-study.md | 20 ++++++++--------- docs/deepvariant-training-case-study.md | 2 +- docs/deepvariant-vg-case-study.md | 23 ++++++++++---------- 3 files changed, 22 insertions(+), 23 deletions(-) diff --git a/docs/deepvariant-fast-pipeline-case-study.md b/docs/deepvariant-fast-pipeline-case-study.md index db341d4c8..0cdf7141e 100644 --- a/docs/deepvariant-fast-pipeline-case-study.md +++ b/docs/deepvariant-fast-pipeline-case-study.md @@ -48,13 +48,13 @@ Please refer to the following documentation for more details. [Installing the NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) For this case study we used the -[script](https://github.com/google/deepvariant/blob/r1.8.0/scripts/install_nvidia_docker.sh) +[script](https://github.com/google/deepvariant/blob/r1.9/scripts/install_nvidia_docker.sh) that automates the CUDA and container tools kit installation. Please note that the script takes about 30 minutes to run. 
```bash -wget https://raw.githubusercontent.com/google/deepvariant/refs/heads/r1.8.0/scripts/install_nvidia_docker.sh +wget https://raw.githubusercontent.com/google/deepvariant/refs/heads/r1.9/scripts/install_nvidia_docker.sh chmod +x install_nvidia_docker.sh ./install_nvidia_docker.sh ``` @@ -64,7 +64,7 @@ chmod +x install_nvidia_docker.sh ### Get DeepVariant Docker image ```bash -BIN_VERSION="1.8.0" +BIN_VERSION="1.9.0" sudo docker pull google/deepvariant:"${BIN_VERSION}-gpu" ``` @@ -217,9 +217,9 @@ variants.gvcf.chr20.vcf With the same settings the pipeline takes approximately 10 minutes. ``` -real 8m15.252s -user 0m0.007s -sys 0m0.035s +real 12m45.795s +user 0m0.018s +sys 0m0.038s ``` ## Benchmark output @@ -256,8 +256,8 @@ time sudo docker run \ ``` Benchmarking Summary: Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio -INDEL ALL 10628 10543 85 22403 74 11375 40 29 0.992002 0.993290 0.507744 0.992646 NaN NaN 1.748961 2.138647 -INDEL PASS 10628 10543 85 22403 74 11375 40 29 0.992002 0.993290 0.507744 0.992646 NaN NaN 1.748961 2.138647 - SNP ALL 70166 70101 65 105602 71 35342 12 12 0.999074 0.998989 0.334672 0.999032 2.296566 1.713281 1.883951 1.503192 - SNP PASS 70166 70101 65 105602 71 35342 12 12 0.999074 0.998989 0.334672 0.999032 2.296566 1.713281 1.883951 1.503192 +INDEL ALL 10628 10553 75 22560 72 11522 37 28 0.992943 0.993477 0.510727 0.993210 NaN NaN 1.748961 2.180292 +INDEL PASS 10628 10553 75 22560 72 11522 37 28 0.992943 0.993477 0.510727 0.993210 NaN NaN 1.748961 2.180292 + SNP ALL 70166 70106 60 102415 69 32148 9 9 0.999145 0.999018 0.313899 0.999081 2.296566 1.72911 1.883951 1.442237 + SNP PASS 70166 70106 60 102415 69 32148 9 9 0.999145 0.999018 0.313899 0.999081 2.296566 1.72911 1.883951 1.442237 ``` diff --git a/docs/deepvariant-training-case-study.md 
b/docs/deepvariant-training-case-study.md index 418ae2f90..441316810 100644 --- a/docs/deepvariant-training-case-study.md +++ b/docs/deepvariant-training-case-study.md @@ -534,7 +534,7 @@ sudo docker run --gpus 1 \ --disable_small_model ``` -Starting in v1.8.0, by default we use a small model to classify some +We use a small model to classify some candidates. In this example, we set `--disable_small_model` so that small model is disabled. This allows us to run all examples through the model we just trained. diff --git a/docs/deepvariant-vg-case-study.md b/docs/deepvariant-vg-case-study.md index 24f0a551d..faf4d33c3 100644 --- a/docs/deepvariant-vg-case-study.md +++ b/docs/deepvariant-vg-case-study.md @@ -172,7 +172,6 @@ Get the same reference we used for ```bash FTPDIR=ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids - curl ${FTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz | gunzip > ${DATA_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna samtools faidx ${DATA_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna ``` @@ -184,7 +183,7 @@ And then, run DeepVariant. [DeepVariant Case Study](deepvariant-case-study.md).) 
```bash -BIN_VERSION="1.8.0" +BIN_VERSION="1.9.0" sudo docker pull google/deepvariant:"${BIN_VERSION}" @@ -204,9 +203,9 @@ time sudo docker run \ Stage | Time (minutes) -------------------------------- | ----------------- -make_examples | 59m19.845s -call_variants | 49m41.643s -postprocess_variants (with gVCF) | 7m46.195s +make_examples | 81m11.112s +call_variants | 38m27.228s +postprocess_variants (with gVCF) | 9m13.565s ### Run hap.py @@ -244,16 +243,16 @@ Output: ``` Benchmarking Summary: Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio -INDEL ALL 504501 502210 2291 954974 1522 429900 956 362 0.995459 0.997101 0.450169 0.996279 NaN NaN 1.489759 1.942299 -INDEL PASS 504501 502210 2291 954974 1522 429900 956 362 0.995459 0.997101 0.450169 0.996279 NaN NaN 1.489759 1.942299 - SNP ALL 3327496 3316336 11160 3823082 4229 500683 1696 356 0.996646 0.998727 0.130963 0.997686 2.102576 1.990152 1.535137 1.449299 - SNP PASS 3327496 3316336 11160 3823082 4229 500683 1696 356 0.996646 0.998727 0.130963 0.997686 2.102576 1.990152 1.535137 1.449299 +INDEL ALL 504501 502342 2159 956579 1444 431515 881 290 0.995721 0.99725 0.451102 0.996485 NaN NaN 1.489759 1.924206 +INDEL PASS 504501 502342 2159 956579 1444 431515 881 290 0.995721 0.99725 0.451102 0.996485 NaN NaN 1.489759 1.924206 + SNP ALL 3327496 3319188 8308 4031912 5621 705300 1705 469 0.997503 0.99831 0.174929 0.997907 2.102576 1.889869 1.535137 1.312185 + SNP PASS 3327496 3319188 8308 4031912 5621 705300 1705 469 0.997503 0.99831 0.174929 0.997907 2.102576 1.889869 1.535137 1.312185 ``` This can be compared with -https://github.com/google/deepvariant/blob/r1.8/docs/metrics.md#accuracy. +https://github.com/google/deepvariant/blob/r1.9/docs/metrics.md#accuracy. 
 Which shows that `vg giraffe` improves F1:

-- Indel F1: 0.995945 --> 0.996279
-- SNP F1: 0.996213 --> 0.997686
+- Indel F1: 0.995845 --> 0.996485
+- SNP F1: 0.996133 --> 0.997907

From 43283f74954dffb1bb76e67ad62e8fc0a13538fd Mon Sep 17 00:00:00 2001
From: shafin
Date: Tue, 13 May 2025 14:58:30 -0700
Subject: [PATCH 6/9] Update training docs for 1.9 release

PiperOrigin-RevId: 758388183
---
 docs/deeptrio-details-training-data.md    | 5 +++++
 docs/deepvariant-details-training-data.md | 7 ++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/docs/deeptrio-details-training-data.md b/docs/deeptrio-details-training-data.md
index 2fb6d3b8a..25114f7bb 100644
--- a/docs/deeptrio-details-training-data.md
+++ b/docs/deeptrio-details-training-data.md
@@ -18,6 +18,7 @@ Parent model |
 1.4.0 | 7 HG005/HG006/HG007 trios<br>3 HG002/HG003/HG004 trios | 457,374,516
 1.5.0 | [(6)](#vfootnote6)3 HG002, 3 HG003, 3 HG004, 7 HG005, 6 HG006, 6 HG007 | 457,374,464
 1.6.0 | [(6)](#vfootnote6)2 HG001, 3 HG002, 3 HG003, 3 HG004, 7 HG005, 6 HG006, 6 HG007, 2 NA12891, 2 NA12892 | 457,420,038
+1.9.0 | [(6)](#vfootnote6)2 HG001, 3 HG002, 3 HG003, 3 HG004, 7 HG005, 6 HG006, 6 HG007, 2 NA12891, 2 NA12892 | 457,420,038

 ### WES models

@@ -37,6 +38,7 @@ Parent model |
 | 1.4.0 | 6 HG005/HG006/HG007 trios<br>6 HG002/HG003/HG004 trios | 13,036,995 |
 | 1.5.0 | [(6)](#vfootnote6)6 HG002, 6 HG003, 6 HG004, 8 HG005, 8 HG006, 8 HG007 | 13,036,998 |
 | 1.6.0 | [(6)](#vfootnote6)6 HG002, 6 HG003, 6 HG004, 8 HG005, 8 HG006, 8 HG007 | 13,039,595 |
+| 1.9.0 | [(6)](#vfootnote6)6 HG002, 6 HG003, 6 HG004, 8 HG005, 8 HG006, 8 HG007 | 13,039,595 |

@@ -56,6 +58,7 @@ Parent model |
 | 1.3.0 | 2 HG005/HG006/HG007 trio<br>10 HG002/HG003/HG004 trios | 533,353,050[(5)](#vfootnote5) |
 | 1.4.0 | (Same model as 1.3.0) | |
 | 1.6.0 | 9 HG002, 5 HG003, 5 HG004, 1 HG005, 1 HG006, 1 HG007 | 838,515,085[(5)](#vfootnote5) |
+| 1.9.0 | 9 HG002, 5 HG003, 5 HG004, 1 HG005, 1 HG006, 1 HG007 | 607,118,560[(5)](#vfootnote5) |

 ### ONT models[(2)](#vfootnote2)[(3)](#vfootnote3)
 | version | Replicates | #examples |
@@ -63,6 +66,8 @@ Parent model |
 | 1.6.0 | 1 HG002, 1 HG002, 1 HG004 | 50,249,704[(5)](#vfootnote5) |
 | Parent model | | |
 | 1.6.0 | 1 HG002, 1 HG002, 1 HG004 | 99,675,190[(5)](#vfootnote5) |
+| 1.9.0 | 5 HG002, 5 HG004, 4 HG005, 4 HG006, 4 HG007 | 607,118,560[(5)](#vfootnote5) |
+
 (1): We include HG002/HG003/HG004 for training WGS model, but only using examples from the region of NIST truth confident region v4.2 subtracting v3.3.2.

diff --git a/docs/deepvariant-details-training-data.md b/docs/deepvariant-details-training-data.md
index e45023b94..41816941b 100644
--- a/docs/deepvariant-details-training-data.md
+++ b/docs/deepvariant-details-training-data.md
@@ -19,6 +19,7 @@ v1.4 | 12 HG001<br>6 HG002[(12)](#vfootnote12)<br>6 HG004[(12
 v1.5 | 13 HG001<br>14 HG002<br>8 HG004<br>9 HG005<br>4 HG006<br>4 HG007 | 815,200,320
 v1.6 | 21 HG001<br>17 HG002<br>8 HG004<br>9 HG005<br>4 HG006<br>4 HG007 | 929,199,066
 v1.8 | Same model as v1.6
+v1.9 | 11 HG001<br>17 HG002<br>7 HG004<br>8 HG005<br>3 HG006<br>3 HG007 | 942,514,071

 ### WES models

@@ -38,6 +39,7 @@ v1.4 | 41 HG001<br>9 HG002<br>9 HG004<br>12 HG005<br>9 HG006<br>9 HG007[
 v1.5 | 40 HG001<br>9 HG002<br>9 HG004<br>12 HG005<br>9 HG006<br>9 HG007 | 21,027,625
 v1.6 | 57 HG001<br>9 HG002<br>9 HG004<br>12 HG005<br>9 HG006<br>9 HG007 | 21,027,614
 v1.8 | 58 HG001<br>9 HG002<br>9 HG004<br>11 HG005<br>9 HG006<br>9 HG007 | 25,598,763
+v1.9 | 57 HG001<br>9 HG002<br>9 HG004<br>11 HG005<br>9 HG006<br>9 HG007 | 25,598,763

 ### PACBIO models

@@ -54,6 +56,7 @@ v1.4 | 1 HG001<br>19 HG002<br>3 HG004<br>1 HG005<br>1 HG006<br>1 HG007 | 1,17
 v1.5 | 3 HG001<br>29 HG002<br>7 HG004<br>2 HG005<br>3 HG006<br>2 HG007 | 1,729,659,396
 v1.6 | 6 HG001<br>60 HG002<br>16 HG004<br>4 HG005<br>6 HG006<br>4 HG007 | 3,195,507,862
 v1.8 | 3 HG001<br>10 HG002<br>4 HG004<br>5 HG005<br>0 HG006<br>0 HG007 | 416,516,418
+v1.9 | 3 HG001<br>10 HG002<br>4 HG004<br>5 HG005<br>0 HG006<br>0 HG007 | 416,516,418

 ### ONT models

@@ -61,6 +64,7 @@ version | Replicates | #examples
 ------- | --------------------------- | ------------------------------
 v1.6 | 3 HG002<br>1 HG004<br>1 HG005 | 534,302,654
 v1.8 | 7 HG002<br>1 HG004<br>1 HG005 | 1,591,950,794
+v1.9 | 7 HG002<br>1 HG004<br>1 HG005 | 1,591,950,794

 ### HYBRID models

@@ -73,7 +77,8 @@ v1.3 | Same model as v1.2 |
 v1.4 | 10 HG002<br>1 HG004<br>1 HG005<br>1 HG006<br>1 HG007 | 215,863,645
 v1.5 | 10 HG002<br>1 HG004<br>1 HG005<br>1 HG006<br>1 HG007 | 215,863,664
 v1.6 | 10 HG002<br>1 HG004<br>1 HG005<br>1 HG006<br>1 HG007 | 215,353,081
-v1.6 | Same model as v1.6 |
+v1.8 | Same model as v1.6 |
+v1.9 | Same model as v1.6

 (1): In v0.5, we experimented with adding whole exome sequencing data into training data. In v0.6, we took it out because it didn't

From d65a5037d360d8c8b34963ef63093afbb41e9d9a Mon Sep 17 00:00:00 2001
From: shafin
Date: Fri, 16 May 2025 12:40:01 -0700
Subject: [PATCH 7/9] Remove duplex case-study from readme.

PiperOrigin-RevId: 759717566
---
 README.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 1a00b81ac..751deaf71 100644
--- a/README.md
+++ b/README.md
@@ -19,8 +19,7 @@ DeepVariant supports germline variant-calling in diploid organisms.
 * PacBio HiFi data
   [PacBio case study](docs/deepvariant-pacbio-model-case-study.md).
 * Oxford Nanopore R10.4.1
-  [Simplex case study](docs/deepvariant-ont-r104-simplex-case-study.md),
-  [Duplex case study](docs/deepvariant-ont-r104-duplex-case-study.md).
+  [Simplex case study](docs/deepvariant-ont-r104-simplex-case-study.md).
 * Complete Genomics
   [T7 case study](docs/deepvariant-complete-t7-case-study.md);
   [G400 case study](docs/deepvariant-complete-g400-case-study.md).

From 6fc7e0fc7edda9f84fdcb2e6d2ab965602a729c8 Mon Sep 17 00:00:00 2001
From: shafin
Date: Fri, 16 May 2025 12:58:37 -0700
Subject: [PATCH 8/9] Added permissions to small_model in DeepTrio

PiperOrigin-RevId: 759724298
---
 Dockerfile.deeptrio | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Dockerfile.deeptrio b/Dockerfile.deeptrio
index e0992a393..d178c157a 100644
--- a/Dockerfile.deeptrio
+++ b/Dockerfile.deeptrio
@@ -191,6 +191,7 @@ ADD https://storage.googleapis.com/deepvariant/models/DeepTrio/${VERSION_DEEPTRI
 WORKDIR /opt/smallmodels/deeptrio/wgs/child/variables
 ADD https://storage.googleapis.com/deepvariant/models/DeepTrio/${VERSION_DEEPTRIO}/smallmodels/deeptrio.wgs_child.smallmodel/variables/variables.data-00000-of-00001 \
     https://storage.googleapis.com/deepvariant/models/DeepTrio/${VERSION_DEEPTRIO}/smallmodels/deeptrio.wgs_child.smallmodel/variables/variables.index ./
+RUN chmod -R +r /opt/smallmodels/deeptrio/wgs/child/*

 WORKDIR /opt/smallmodels/deeptrio/wgs/parent

From ea4de596c410dad3fe617828d8f89a4282d0e85f Mon Sep 17 00:00:00 2001
From: Parsa Eskandar <33617420+parsaeskandar@users.noreply.github.com>
Date: Fri, 29 Aug 2025 09:34:59 -0700
Subject: [PATCH 9/9] Update deeptrio-details.md

Just a small typo mistake
---
 docs/deeptrio-details.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/deeptrio-details.md b/docs/deeptrio-details.md
index d4786fe4f..fb5a5deea 100644
--- a/docs/deeptrio-details.md
+++ b/docs/deeptrio-details.md
@@ -66,7 +66,7 @@ chromosome (e.g. for chromosomeX, only the mother and son samples and for
 chromosomeY only the father and son samples).

 If needed, DeepTrio can be built from source. For more details please refer to
-[Building DeeepTrio](deeptrio-build-test.md).
+[Building DeepTrio](deeptrio-build-test.md).

 ## DeepTrio Input assumptions