-# SBX case study for SBX-D and SBX-Fast data
+# Roche SBX case study
 
 ## Prepare environment
 
@@ -20,18 +20,18 @@ curl ${FTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz | gunzip > ref
 curl ${FTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.fai > reference/GRCh38_no_alt_analysis_set.fasta.fai
 ```
 
-### Download T2T v1.1 truth for benchmarking
+### Download GIAB v4.2.1 truth for benchmarking
 
-We will benchmark our variant calls against T2T v1.1 truth for HG002.
+We will benchmark our variant calls against GIAB v4.2.1 truth for HG002.
 
 ```bash
 mkdir -p benchmark
 
 HTTPDIR=https://storage.googleapis.com/deepvariant/case-study-testdata
 
-curl ${HTTPDIR}/GRCh38_HG2-T2TQ100-V1.1_smvar.vcf.gz > benchmark/GRCh38_HG2-T2TQ100-V1.1_smvar.vcf.gz
-curl ${HTTPDIR}/GRCh38_HG2-T2TQ100-V1.1_smvar.vcf.gz.tbi > benchmark/GRCh38_HG2-T2TQ100-V1.1_smvar.vcf.gz.tbi
-curl ${HTTPDIR}/GRCh38_HG2-T2TQ100-V1.1_smvar.benchmark.bed > benchmark/GRCh38_HG2-T2TQ100-V1.1_smvar.benchmark.bed
+curl ${HTTPDIR}/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz > benchmark/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
+curl ${HTTPDIR}/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz.tbi > benchmark/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz.tbi
+curl ${HTTPDIR}/HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed > benchmark/HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed
 ```
 
 ### Download GBZ built for GRCh38
@@ -43,26 +43,26 @@ HTTPDIR=https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/f
 curl ${HTTPDIR}/hprc-v1.1-mc-grch38.gbz > input/hprc-v1.1-mc-grch38.gbz
 ```
 
-### Download HG002 BAM
+### Download HG002 chr20 BAM
 
-Please download a SBX-D (or SBX-Fast) HG002 BAM and put it in your input dir.
+We will use a Roche SBX HG002 chr20 BAM for this case study.
 
 ```bash
-# Download your HG002 BAM
-HG002_BAM=/input/your_HG002.bam # This will be used in the command later.
+mkdir -p input
+HTTPDIR=https://storage.googleapis.com/deepvariant/roche-sbx-case-study-testdata
+
+curl ${HTTPDIR}/HG002.roche_sbx.chr20.bam > input/HG002.roche_sbx.chr20.bam
+curl ${HTTPDIR}/HG002.roche_sbx.chr20.bam.bai > input/HG002.roche_sbx.chr20.bam.bai
 ```
 
 ### Download the model
 
-In this case study, we're calling variants on HG002 chr20, and we'll evaluate
-with T2T v1.1 truth. We'll use the "leave-out-HG001" model, which we also left
-out all chromosome 20 from training or tuning. Refer to the technical white
-paper for more details on all experiments.
+In this case study, we're calling variants on HG002 chr20.
 
 ```bash
 mkdir -p model
 
-HTTPDIR=https://storage.googleapis.com/brain-genomics-public/research/sbx/2025/models/leave-out-HG001
+HTTPDIR=https://storage.googleapis.com/brain-genomics-public/research/sbx/2025/model/leave-out-HG001
 
 curl ${HTTPDIR}/model.ckpt.data-00000-of-00001 > model/model.ckpt.data-00000-of-00001
 curl ${HTTPDIR}/model.ckpt.index > model/model.ckpt.index
@@ -75,7 +75,7 @@ curl ${HTTPDIR}/example_info.json > model/example_info.json
 mkdir -p output
 mkdir -p output/intermediate_results_dir
 
-BIN_VERSION="pangenome_aware_deepvariant-head784362481"
+BIN_VERSION="pangenome_aware_deepvariant-sbx"
 
 sudo docker run \
   -v "${PWD}/input":"/input" \
@@ -87,7 +87,7 @@ sudo docker run \
   /opt/deepvariant/bin/run_pangenome_aware_deepvariant \
   --model_type WGS \
   --ref /reference/GRCh38_no_alt_analysis_set.fasta \
-  --reads "${HG002_BAM}" \
+  --reads /input/HG002.roche_sbx.chr20.bam \
   --pangenome /input/hprc-v1.1-mc-grch38.gbz \
   --output_vcf /output/HG002.chr20.output.vcf.gz \
   --output_gvcf /output/HG002.chr20.output.g.vcf.gz \
@@ -114,12 +114,25 @@ sudo docker run \
   -v "${PWD}/happy:/happy" \
   jmcdani20/hap.py:v0.3.12 \
   /opt/hap.py/bin/hap.py \
-  /benchmark/GRCh38_HG2-T2TQ100-V1.1_smvar.vcf.gz \
+  /benchmark/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz \
   /output/HG002.chr20.output.vcf.gz \
-  -f /benchmark/GRCh38_HG2-T2TQ100-V1.1_smvar.benchmark.bed \
+  -f /benchmark/HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed \
   -r /reference/GRCh38_no_alt_analysis_set.fasta \
   -o /happy/happy.output \
   --engine=vcfeval \
   --pass-only \
   -l chr20
 ```
+
+Output:
+
+```
+Benchmarking Summary:
+Type   Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  FP.al  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
+INDEL     ALL        11256     11237        19        22167        22      10474     10     10       0.998312          0.998119        0.472504         0.998215                     NaN                     NaN                   1.561710                   2.089132
+INDEL    PASS        11256     11237        19        22167        22      10474     10     10       0.998312          0.998119        0.472504         0.998215                     NaN                     NaN                   1.561710                   2.089132
+SNP       ALL        71333     71286        47        91930        41      20553     12      3       0.999341          0.999426        0.223572         0.999383                2.314904                1.943955                   1.715978                   1.640709
+SNP      PASS        71333     71286        47        91930        41      20553     12      3       0.999341          0.999426        0.223572         0.999383                2.314904                1.943955                   1.715978                   1.640709
+```
+
+For all Roche SBX 30x BAMs and VCFs, follow this [link](https://console.cloud.google.com/storage/browser/brain-genomics-public/research/sbx/2025/).
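
The hap.py summary in this commit reports Recall, Precision, Frac_NA and F1 alongside the raw counts, so the metric columns can be re-derived as a quick sanity check. Below is a minimal bash/awk sketch, assuming hap.py's usual definitions: QUERY.TP = QUERY.TOTAL - QUERY.FP - QUERY.UNK, Recall = TRUTH.TP / (TRUTH.TP + TRUTH.FN), Precision = QUERY.TP / (QUERY.TP + QUERY.FP), Frac_NA = QUERY.UNK / QUERY.TOTAL, and F1 as the harmonic mean of Recall and Precision. The function name `happy_metrics` is hypothetical and is not part of hap.py or DeepVariant.

```bash
#!/usr/bin/env bash
# Sketch: recompute hap.py summary metrics from its count columns.
# Assumed hap.py definitions (not stated in this commit):
#   QUERY.TP  = QUERY.TOTAL - QUERY.FP - QUERY.UNK
#   Recall    = TRUTH.TP / (TRUTH.TP + TRUTH.FN)
#   Precision = QUERY.TP / (QUERY.TP + QUERY.FP)
#   Frac_NA   = QUERY.UNK / QUERY.TOTAL
#   F1        = harmonic mean of Recall and Precision

# Hypothetical helper.
# Args: TYPE TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK
happy_metrics() {
  awk -v label="$1" -v ttp="$2" -v tfn="$3" \
      -v qtot="$4" -v qfp="$5" -v qunk="$6" 'BEGIN {
    qtp = qtot - qfp - qunk
    recall = ttp / (ttp + tfn)
    precision = qtp / (qtp + qfp)
    frac_na = qunk / qtot
    f1 = 2 * recall * precision / (recall + precision)
    printf "%s recall=%.6f precision=%.6f frac_na=%.6f f1=%.6f\n", label, recall, precision, frac_na, f1
  }'
}

# Counts copied from the ALL rows of the hap.py output.
happy_metrics INDEL 11237 19 22167 22 10474
happy_metrics SNP 71286 47 91930 41 20553
```

Both calls print values matching the METRIC.Recall, METRIC.Precision, METRIC.Frac_NA and METRIC.F1_Score columns of the chr20 summary, for example `INDEL recall=0.998312 precision=0.998119 frac_na=0.472504 f1=0.998215`.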