Commit 051bcf2

Update SBX case study documentation.
PiperOrigin-RevId: 805084468
1 parent 5057722 commit 051bcf2

2 files changed

+33
-20
lines changed

docs/README.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@
2222
[Mapped with VG](pangenome-aware-wgs-vg-case-study.md)
2323
* Pangenome-aware DeepVariant WES:
2424
[Mapped with BWA](pangenome-aware-wes-bwa-case-study.md)
25-
* [SBX case study](sbx-case-study.md)
25+
* [Roche SBX case study](roche-sbx-case-study.md)
2626

2727
## Visualization and analysis
2828

docs/sbx-case-study.md renamed to docs/roche-sbx-case-study.md

Lines changed: 32 additions & 19 deletions
@@ -1,4 +1,4 @@
1-
# SBX case study for SBX-D and SBX-Fast data
1+
# Roche SBX case study
22

33
## Prepare environment
44

@@ -20,18 +20,18 @@ curl ${FTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz | gunzip > ref
2020
curl ${FTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.fai > reference/GRCh38_no_alt_analysis_set.fasta.fai
2121
```
2222

23-
### Download T2T v1.1 truth for benchmarking
23+
### Download GIAB v4.2.1 truth for benchmarking
2424

25-
We will benchmark our variant calls against T2T v1.1 truth for HG002.
25+
We will benchmark our variant calls against GIAB v4.2.1 truth for HG002.
2626

2727
```bash
2828
mkdir -p benchmark
2929

3030
HTTPDIR=https://storage.googleapis.com/deepvariant/case-study-testdata
3131

32-
curl ${HTTPDIR}/GRCh38_HG2-T2TQ100-V1.1_smvar.vcf.gz > benchmark/GRCh38_HG2-T2TQ100-V1.1_smvar.vcf.gz
33-
curl ${HTTPDIR}/GRCh38_HG2-T2TQ100-V1.1_smvar.vcf.gz.tbi > benchmark/GRCh38_HG2-T2TQ100-V1.1_smvar.vcf.gz.tbi
34-
curl ${HTTPDIR}/GRCh38_HG2-T2TQ100-V1.1_smvar.benchmark.bed > benchmark/GRCh38_HG2-T2TQ100-V1.1_smvar.benchmark.bed
32+
curl ${HTTPDIR}/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz > benchmark/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
33+
curl ${HTTPDIR}/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz.tbi > benchmark/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz.tbi
34+
curl ${HTTPDIR}/HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed > benchmark/HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed
3535
```
3636

3737
### Download GBZ built for GRCh38
@@ -43,26 +43,26 @@ HTTPDIR=https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/f
4343
curl ${HTTPDIR}/hprc-v1.1-mc-grch38.gbz > input/hprc-v1.1-mc-grch38.gbz
4444
```
4545

46-
### Download HG002 BAM
46+
### Download HG002 chr20 BAM
4747

48-
Please download a SBX-D (or SBX-Fast) HG002 BAM and put it in your input dir.
48+
We will use a Roche SBX HG002 chr20 BAM for this case study.
4949

5050
```bash
51-
# Download your HG002 BAM
52-
HG002_BAM=/input/your_HG002.bam # This will be used in the command later.
51+
mkdir -p input
52+
HTTPDIR=https://storage.googleapis.com/deepvariant/roche-sbx-case-study-testdata
53+
54+
curl ${HTTPDIR}/HG002.roche_sbx.chr20.bam > input/HG002.roche_sbx.chr20.bam
55+
curl ${HTTPDIR}/HG002.roche_sbx.chr20.bam.bai > input/HG002.roche_sbx.chr20.bam.bai
5356
```
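
Because `curl` without `--fail` can write an error page or an empty file in place of the data, it may be worth confirming that the BAM and its index arrived before running the caller. The snippet below is a minimal sketch, not part of the case study: `check_inputs` is a hypothetical helper name, and it only tests that each file exists and is non-empty, which does not prove the file is a valid BAM.

```bash
# Hypothetical helper (not part of the case study): report every expected
# input file that is missing or empty, and return non-zero if any is found.
check_inputs() {
  local status=0
  local f
  for f in "$@"; do
    # -s is true only when the file exists and has a size greater than zero.
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Example call, using the paths from the download step above:
#   check_inputs input/HG002.roche_sbx.chr20.bam input/HG002.roche_sbx.chr20.bam.bai
```

A non-zero return status means at least one download should be repeated.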
5457

5558
### Download the model
5659

57-
In this case study, we're calling variants on HG002 chr20, and we'll evaluate
58-
with T2T v1.1 truth. We'll use the "leave-out-HG001" model, which we also left
59-
out all chromosome 20 from training or tuning. Refer to the technical white
60-
paper for more details on all experiments.
60+
In this case study, we're calling variants on HG002 chr20.
6161

6262
```bash
6363
mkdir -p model
6464

65-
HTTPDIR=https://storage.googleapis.com/brain-genomics-public/research/sbx/2025/models/leave-out-HG001
65+
HTTPDIR=https://storage.googleapis.com/brain-genomics-public/research/sbx/2025/model/leave-out-HG001
6666

6767
curl ${HTTPDIR}/model.ckpt.data-00000-of-00001 > model/model.ckpt.data-00000-of-00001
6868
curl ${HTTPDIR}/model.ckpt.index > model/model.ckpt.index
@@ -75,7 +75,7 @@ curl ${HTTPDIR}/example_info.json > model/example_info.json
7575
mkdir -p output
7676
mkdir -p output/intermediate_results_dir
7777

78-
BIN_VERSION="pangenome_aware_deepvariant-head784362481"
78+
BIN_VERSION="pangenome_aware_deepvariant-sbx"
7979

8080
sudo docker run \
8181
-v "${PWD}/input":"/input" \
@@ -87,7 +87,7 @@ sudo docker run \
8787
/opt/deepvariant/bin/run_pangenome_aware_deepvariant \
8888
--model_type WGS \
8989
--ref /reference/GRCh38_no_alt_analysis_set.fasta \
90-
--reads "${HG002_BAM}" \
90+
--reads /input/HG002.roche_sbx.chr20.bam \
9191
--pangenome /input/hprc-v1.1-mc-grch38.gbz \
9292
--output_vcf /output/HG002.chr20.output.vcf.gz \
9393
--output_gvcf /output/HG002.chr20.output.g.vcf.gz \
@@ -114,12 +114,25 @@ sudo docker run \
114114
-v "${PWD}/happy:/happy" \
115115
jmcdani20/hap.py:v0.3.12 \
116116
/opt/hap.py/bin/hap.py \
117-
/benchmark/GRCh38_HG2-T2TQ100-V1.1_smvar.vcf.gz \
117+
/benchmark/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz \
118118
/output/HG002.chr20.output.vcf.gz \
119-
-f /benchmark/GRCh38_HG2-T2TQ100-V1.1_smvar.benchmark.bed \
119+
-f /benchmark/HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed \
120120
-r /reference/GRCh38_no_alt_analysis_set.fasta \
121121
-o /happy/happy.output \
122122
--engine=vcfeval \
123123
--pass-only \
124124
-l chr20
125125
```
126+
127+
Output:
128+
129+
```
130+
Benchmarking Summary:
131+
Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio
132+
INDEL ALL 11256 11237 19 22167 22 10474 10 10 0.998312 0.998119 0.472504 0.998215 NaN NaN 1.561710 2.089132
133+
INDEL PASS 11256 11237 19 22167 22 10474 10 10 0.998312 0.998119 0.472504 0.998215 NaN NaN 1.561710 2.089132
134+
SNP ALL 71333 71286 47 91930 41 20553 12 3 0.999341 0.999426 0.223572 0.999383 2.314904 1.943955 1.715978 1.640709
135+
SNP PASS 71333 71286 47 91930 41 20553 12 3 0.999341 0.999426 0.223572 0.999383 2.314904 1.943955 1.715978 1.640709
136+
```
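
The metric columns in the summary can be cross-checked from the count columns. The sketch below assumes that recall is `TRUTH.TP / (TRUTH.TP + TRUTH.FN)`, that precision is taken over query calls inside the confident regions (`QUERY.TOTAL - QUERY.UNK - QUERY.FP` counted as query true positives), and that F1 is the harmonic mean of the two. `happy_metrics` is a hypothetical helper name, not a hap.py command.

```bash
# Hypothetical helper: recompute recall, precision and F1 from hap.py counts.
# Arguments: TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.UNK QUERY.FP
happy_metrics() {
  awk -v tp="$1" -v fneg="$2" -v qtotal="$3" -v qunk="$4" -v fp="$5" 'BEGIN {
    recall = tp / (tp + fneg)
    # Assumption: query true positives are the query calls that are neither
    # outside the confident regions (UNK) nor false positives.
    qtp = qtotal - qunk - fp
    precision = qtp / (qtp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    printf "%.6f %.6f %.6f\n", recall, precision, f1
  }'
}

# INDEL row from the summary above.
happy_metrics 11237 19 22167 10474 22
# SNP row from the summary above.
happy_metrics 71286 47 91930 20553 41
```

Under these assumptions the two calls print `0.998312 0.998119 0.998215` and `0.999341 0.999426 0.999383`, matching the `METRIC.Recall`, `METRIC.Precision` and `METRIC.F1_Score` columns reported for the INDEL and SNP rows.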
137+
138+
For all Roche SBX 30x BAMs and VCFs, follow this [link](https://console.cloud.google.com/storage/browser/brain-genomics-public/research/sbx/2025/).
