To do list for the v1.1b panel #56

hangsuUNC · 2025-01-24T15:55:28Z

1. Can we run the same HPRC 1.1 w/o TRGT as a baseline? A lower priority would be HPRC 1.0 w/o TRGT (even if we already have that run somewhere, I'd do a fresh run anyway).
2. Can one run HiPhase with short+TRGT only? Is there some distinction between how HiPhase handles the SV and TRGT inputs?
3. What is the overlap between the TRGT catalog and 1.1? What AFs does the TRGT catalog cover?
4. If the overlap is substantial, can we do some quick Vcfdist evals of the appropriate HiPhase SV outputs, stratified by TR/non-TR regions to get an idea of whether 1.1 or TRGT is more reliable in each region type? For example, if HPRC 1.1 w/o TRGT metrics in TR regions look not so great, maybe we should just prefer the TRGT genotypes in those regions. One might imagine that if we end up preferring TRGT genotypes in TR regions and end up with disjoint sets of 1.1 non-TR and TRGT TR, HiPhase and bcftools concat will have an easier time.
5. Have we made any progress looking at TRGT merge (on pure TRGT outputs, not HiPhase outputs)? Is this just a trivial operation on squared genotypes?
6. I think we are using HiPhase 1.3.0. Have we tried more recent versions? In particular, it seems like 1.4.0 might introduce some noteworthy changes.
7. See where the raw per-sample TRGT calls lie on Fabio's latest ROC plots. This will of course just be a point rather than a curve for each sample.
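
For items 3 and 4, a minimal sketch of how variant positions could be split into TR and non-TR strata on a single contig. The function names are hypothetical and are not part of Vcfdist, TRGT, or our workflows. It assumes half-open, 0-based intervals as in a BED file and classifies each variant by its 1-based VCF POS only, so a long allele that starts outside a TR but extends into one would be counted as non-TR.

```python
from bisect import bisect_right


def build_index(intervals):
    """Sort and merge half-open (start, end) intervals; return parallel start/end lists."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlapping or abutting interval: extend the previous one
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    starts = [s for s, _ in merged]
    ends = [e for _, e in merged]
    return starts, ends


def in_regions(pos0, starts, ends):
    """Return True if the 0-based position falls inside any merged interval."""
    i = bisect_right(starts, pos0) - 1
    return i >= 0 and pos0 < ends[i]


def stratify(vcf_positions, tr_intervals):
    """Split 1-based VCF positions into (in_tr, not_in_tr) lists."""
    starts, ends = build_index(tr_intervals)
    in_tr, not_in_tr = [], []
    for pos in vcf_positions:
        target = in_tr if in_regions(pos - 1, starts, ends) else not_in_tr
        target.append(pos)
    return in_tr, not_in_tr
```

The same counts, computed per callset, would give a rough answer to how much of 1.1 falls inside the TRGT catalog before running any evaluation.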

hangsuUNC · 2025-02-03T22:17:28Z

TRGT merge: 0aa95bb1-29e6-491b-a06f-530ee922b6bc

samuelklee · 2025-02-04T19:28:19Z

Thanks, @hangsuUNC, can we perhaps populate some more details here? For example, 1) pointers to common inputs (joint SNP callset, integrated SV callsets, their HPRC chr1 subsets, etc.), 2) a table of the experiments we are running giving relevant submission IDs and pointers to generated results, and 3) eventually populate that table with precision/recall metrics and the corresponding submission IDs for evaluation runs (I can help with this part)?

Basically, for each experiment (e.g., HPRC 1.1 w/ TRGT and HiPhase 1.3.0), can you give me the inputs I need to run ChromosomePhasedPanelCreationFromHiPhase (https://app.terra.bio/#workspaces/allofus-drc-wgs-lr-prod/AoU_DRC_WGS_LongReads_PacBio%20PAPER%20COPY/job_history/6751528b-ac58-4e25-87c0-bae436efb83f for that experiment), followed by ConcatAndEvaluate (https://app.terra.bio/#workspaces/allofus-drc-wgs-lr-prod/AoU_DRC_WGS_LongReads_PacBio%20PAPER%20COPY/job_history/97d231ad-c40d-4efb-b2eb-06ee763de8c8)?

For example, see the Job History comments for those runs, which refer to the submission IDs that generated the joint SNP and integrated 1.1 HPRC chr1 callsets. It would be nice if the Job History comments were also self-contained, but there's only so much you can fit in there---better to keep full track of things here, and include actual pointers to inputs in buckets, etc.

Finally, after checking off bullet points above, it might also be nice to put a quick blurb here about any findings, including any pointers if they might be helpful.

hangsuUNC · 2025-02-04T19:56:07Z

The output files are here:

HPRC_V10_woTRGT_SNP: gs://fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/0c6101a5-42a5-493c-a5b8-9baf876c34d9/HierarchicallyMergeVcfs/e19d90c0-ed0c-41cd-acad-306642bed072/call-ConcatVcfs/HPRC_V10_woTRGT_SNP.vcf.gz
HPRC_V10_woTRGT_SVs: gs://fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/bd97d689-abf2-4f0a-bbe4-b28aae1ed794/HierarchicallyMergeVcfs/b9e64f4b-1508-45e9-bc2e-e3ee09f6ced2/call-ConcatVcfs/HPRC_V10_woTRGT_SVs.vcf.gz
HPRC_V11_woTRGT_SNP: gs://fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/fd9edac0-99e1-4907-89d0-2cb494797428/HierarchicallyMergeVcfs/1d1d5bb7-1ade-45b6-9337-d663408cf170/call-ConcatVcfs/attempt-2/HPRC_V11_woTRGT_SNP.vcf.gz
HPRC_V11_woTRGT_SVs: gs://fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/e72f3b25-a317-48c9-8b96-ed44142311f8/HierarchicallyMergeVcfs/cd9bacd3-1196-474d-bc8c-6ea2279486ac/call-ConcatVcfs/HPRC_V11_woTRGT_SVs.vcf.gz

All the information is listed in the Table: HPRC-extra-callers-hg38_set in the AoU Paper copy workspace.

samuelklee · 2025-02-04T20:17:52Z

Experiment	ConcatAndEvaluate summary (not in TR)	ConcatAndEvaluate summary (in TR)	Merged HiPhase SNP output	Merged HiPhase SV output	ChromosomePhasedPanelCreationFromHiPhase submission	ConcatAndEvaluate submission (not in TR)	ConcatAndEvaluate submission (in TR)
HPRC short w/ TRGT, HiPhase 1.4.5, w/ confident regions	x	x	gs://fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/b9a18ef7-d764-451d-a13f-964caad117be/HierarchicallyMergeVcfs/d8da568a-097c-4498-87a5-13bc4e698d9f/call-ConcatVcfs/HPRC_V11_woSV_hiphase145_SNP.vcf.gz	gs://fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/97511d9a-8906-402d-bce4-46270fbc4934/HierarchicallyMergeVcfs/0520d86a-8ffb-4493-9390-c0c7458d4d14/call-ConcatVcfs/HPRC_V11_woSV_hiphase145_TRGT.vcf.gz	https://app.terra.bio/#workspaces/allofus-drc-wgs-lr-prod/AoU_DRC_WGS_LongReads_PacBio%20PAPER%20COPY/submission_history/9989a299-a4b5-484d-a093-432992184ba1	x	x
HPRC 1.1 w/o TRGT, HiPhase 1.4.5, w/ confident regions	https://storage.cloud.google.com/fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/0277d9be-de58-4d39-babb-1cfd834011a6/PhasedPanelEvaluation/cffe41f7-e800-4fa3-91da-b618716dafef/call-SummarizeEvaluations/attempt-2/evaluation_summary.tsv?authuser=0	https://storage.cloud.google.com/fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/6fdbf976-16b5-468b-b456-df743068a179/PhasedPanelEvaluation/f4e0877b-050f-4da8-bd7a-258b1c24b883/call-SummarizeEvaluations/evaluation_summary.tsv?authuser=0	gs://fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/d57fbaba-9d9b-4bc4-8d32-bdc8e579b2a1/HierarchicallyMergeVcfs/40ba741a-4d81-4dc7-92b1-3177dc809b22/call-ConcatVcfs/HPRC_V11_woTRGT_hiphase145_SNP.vcf.gz	gs://fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/db3a53a8-77db-4dbb-95c9-16d43d527f62/HierarchicallyMergeVcfs/44c49851-6829-4e73-ac15-0e516a2c034d/call-ConcatVcfs/HPRC_V11_woTRGT_hiphase145_SV.vcf.gz	https://app.terra.bio/#workspaces/allofus-drc-wgs-lr-prod/AoU_DRC_WGS_LongReads_PacBio%20PAPER%20COPY/job_history/6c84f37f-76f5-40ab-a6ec-81980fc2b3f7	https://app.terra.bio/#workspaces/allofus-drc-wgs-lr-prod/AoU_DRC_WGS_LongReads_PacBio%20PAPER%20COPY/submission_history/0277d9be-de58-4d39-babb-1cfd834011a6	https://app.terra.bio/#workspaces/allofus-drc-wgs-lr-prod/AoU_DRC_WGS_LongReads_PacBio%20PAPER%20COPY/submission_history/6fdbf976-16b5-468b-b456-df743068a179
HPRC 1.1 w/o TRGT, HiPhase 1.4.5	https://storage.cloud.google.com/fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/656d0740-4c9d-4b85-bb64-3b5670f1dd04/PhasedPanelEvaluation/cd2b9543-4889-4801-b321-11176e3ecb1a/call-SummarizeEvaluations/evaluation_summary.tsv?authuser=0	https://storage.cloud.google.com/fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/a7fb3cc6-0ce6-4b45-9939-926e1094f2a4/PhasedPanelEvaluation/9f8f19aa-5f33-4a60-826a-277a80b6c67c/call-SummarizeEvaluations/evaluation_summary.tsv?authuser=0	gs://fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/d57fbaba-9d9b-4bc4-8d32-bdc8e579b2a1/HierarchicallyMergeVcfs/40ba741a-4d81-4dc7-92b1-3177dc809b22/call-ConcatVcfs/HPRC_V11_woTRGT_hiphase145_SNP.vcf.gz	gs://fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/db3a53a8-77db-4dbb-95c9-16d43d527f62/HierarchicallyMergeVcfs/44c49851-6829-4e73-ac15-0e516a2c034d/call-ConcatVcfs/HPRC_V11_woTRGT_hiphase145_SV.vcf.gz	https://app.terra.bio/#workspaces/allofus-drc-wgs-lr-prod/AoU_DRC_WGS_LongReads_PacBio%20PAPER%20COPY/job_history/6c84f37f-76f5-40ab-a6ec-81980fc2b3f7	https://app.terra.bio/#workspaces/allofus-drc-wgs-lr-prod/AoU_DRC_WGS_LongReads_PacBio%20PAPER%20COPY/job_history/656d0740-4c9d-4b85-bb64-3b5670f1dd04	https://app.terra.bio/#workspaces/allofus-drc-wgs-lr-prod/AoU_DRC_WGS_LongReads_PacBio%20PAPER%20COPY/job_history/a7fb3cc6-0ce6-4b45-9939-926e1094f2a4
HPRC 1.1 w/ TRGT, HiPhase 1.3.0	https://storage.cloud.google.com/fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/97d231ad-c40d-4efb-b2eb-06ee763de8c8/PhasedPanelEvaluation/602316f9-b3e3-4c03-b564-32085f63dc00/call-SummarizeEvaluations/evaluation_summary.tsv?authuser=0	https://storage.cloud.google.com/fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/6519095a-ad62-48f6-bd4f-695942d2af12/PhasedPanelEvaluation/f52d5a5b-d6fc-436a-b22a-9639dcb03e91/call-SummarizeEvaluations/evaluation_summary.tsv?authuser=0	gs://fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/494642a6-04d8-4afd-9a19-911daac9e6d6/HierarchicallyMergeVcfs/799d50c6-276f-4c90-8b40-24ce23e865ac/call-ConcatVcfs/HPRC.hiphase.short.merged.vcf.gz	gs://fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/722a1d9f-2827-4b7d-a9e4-a1205671ea27/HierarchicallyMergeVcfs/bb6cc434-a96e-45da-81f3-e01b7a2bbf25/call-ConcatVcfs/HPRC.hiphase.SV.merged.vcf.gz	https://app.terra.bio/#workspaces/allofus-drc-wgs-lr-prod/AoU_DRC_WGS_LongReads_PacBio%20PAPER%20COPY/job_history/287c72b8-08c3-4ee4-a762-ed40c9dead54	https://app.terra.bio/#workspaces/allofus-drc-wgs-lr-prod/AoU_DRC_WGS_LongReads_PacBio%20PAPER%20COPY/job_history/97d231ad-c40d-4efb-b2eb-06ee763de8c8	https://app.terra.bio/#workspaces/allofus-drc-wgs-lr-prod/AoU_DRC_WGS_LongReads_PacBio%20PAPER%20COPY/job_history/6519095a-ad62-48f6-bd4f-695942d2af12
HPRC 1.1 w/o TRGT, HiPhase 1.3.0	https://storage.cloud.google.com/fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/bc82695b-2e34-47cc-a3f6-a3c51a297a7b/PhasedPanelEvaluation/2038bd0e-e630-4892-9e0e-deecac5167d8/call-SummarizeEvaluations/evaluation_summary.tsv?authuser=0	https://storage.cloud.google.com/fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/d3f9ba00-21c0-4e31-b864-a6c1f7c1c5a3/PhasedPanelEvaluation/f8e2337e-74fd-4790-b996-4b2a71b92375/call-SummarizeEvaluations/evaluation_summary.tsv?authuser=0	gs://fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/fd9edac0-99e1-4907-89d0-2cb494797428/HierarchicallyMergeVcfs/1d1d5bb7-1ade-45b6-9337-d663408cf170/call-ConcatVcfs/attempt-2/HPRC_V11_woTRGT_SNP.vcf.gz	gs://fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/e72f3b25-a317-48c9-8b96-ed44142311f8/HierarchicallyMergeVcfs/cd9bacd3-1196-474d-bc8c-6ea2279486ac/call-ConcatVcfs/HPRC_V11_woTRGT_SVs.vcf.gz	https://app.terra.bio/#workspaces/allofus-drc-wgs-lr-prod/AoU_DRC_WGS_LongReads_PacBio%20PAPER%20COPY/job_history/1a476796-969f-4bb4-8ca7-2f3bccc9b494	https://app.terra.bio/#workspaces/allofus-drc-wgs-lr-prod/AoU_DRC_WGS_LongReads_PacBio%20PAPER%20COPY/job_history/bc82695b-2e34-47cc-a3f6-a3c51a297a7b	https://app.terra.bio/#workspaces/allofus-drc-wgs-lr-prod/AoU_DRC_WGS_LongReads_PacBio%20PAPER%20COPY/job_history/d3f9ba00-21c0-4e31-b864-a6c1f7c1c5a3
HPRC 1.0 w/o TRGT, HiPhase 1.3.0	x	x	gs://fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/0c6101a5-42a5-493c-a5b8-9baf876c34d9/HierarchicallyMergeVcfs/e19d90c0-ed0c-41cd-acad-306642bed072/call-ConcatVcfs/HPRC_V10_woTRGT_SNP.vcf.gz	gs://fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/bd97d689-abf2-4f0a-bbe4-b28aae1ed794/HierarchicallyMergeVcfs/b9e64f4b-1508-45e9-bc2e-e3ee09f6ced2/call-ConcatVcfs/HPRC_V10_woTRGT_SVs.vcf.gz	https://app.terra.bio/#workspaces/allofus-drc-wgs-lr-prod/AoU_DRC_WGS_LongReads_PacBio%20PAPER%20COPY/job_history/4d55d2fa-6c6a-4445-80f5-9ba16d10786c (FAILED, UNSORTED?)	x	x

samuelklee · 2025-02-05T00:06:00Z

We may also want to consider the effect of the catalog, see e.g. PacificBiosciences/HiPhase#58 (although at this stage, it seems like the decision would be to exclude TRGT if we feel our catalog exhibits similar behavior, not try to rerun TRGT with a different catalog).

samuelklee · 2025-02-05T13:32:18Z

For some reason, a VCF appears to be unsorted in the HPRC 1.0 w/o TRGT, HiPhase 1.4.0 run by the time it gets to the PanGenie panel creation script. We haven't run into this before, but it might be due to a newly inserted normalization step (which was introduced to resolve some intermittent issues with adjacent unnormalized variants that KAGE was bugging on); perhaps we also need to sort after. For now, let's not worry about that run.
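
To narrow down where the ordering breaks, a quick check like the following could be run on the decompressed VCF text right after the normalization step. This is a sketch under the assumption that "sorted" means positions are non-decreasing within a contig and each contig appears as one contiguous block; the function name is hypothetical. Left-alignment during normalization can shift POS, which is one plausible way records end up out of order.

```python
def first_unsorted_record(lines):
    """Return (line_number, reason) for the first out-of-order record, or None if sorted."""
    seen_contigs = set()
    prev_contig, prev_pos = None, -1
    for line_number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t", 2)
        contig, pos = fields[0], int(fields[1])
        if contig != prev_contig:
            if contig in seen_contigs:
                return line_number, f"contig {contig} reappears after other contigs"
            seen_contigs.add(contig)
            prev_contig, prev_pos = contig, pos
            continue
        if pos < prev_pos:
            return line_number, f"{contig}:{pos} follows {contig}:{prev_pos}"
        prev_pos = pos
    return None
```

For a bgzipped file, the lines could be supplied with `gzip.open(path, "rt")`, since bgzip output is readable as gzip.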

Copying over some Slack discussion:

Interestingly, it looks like HPRC 1.1 w/ TRGT and HiPhase 1.3.0 compares unfavorably against HPRC 1.1 w/o TRGT and HiPhase 1.3.0. See e.g. SV recall after HiPhase in TR regions of 38% vs. 75%, at comparable precisions ~80%. So it's possible something is going wrong right off the bat when including TRGT---perhaps bcftools merge is to blame somewhere? Interestingly, after Shapeit4 imputation, recall is ~75% for both.

Have we come to any conclusions about what TRGT merge does?
Probably it's not the main culprit, but we should still investigate HiPhase 1.3.0 vs. HiPhase 1.4.0. At this point I'd suggest running HPRC 1.1 w/o TRGT and HiPhase 1.3.0 (previously I suggested w/ TRGT).
Some understanding of the raw recall of the TRGT genotypes alone would be nice. Do we have this anywhere vs. HPRC dipcall?
I suspect that TRGT can be included in an appropriate way, but perhaps not as naively as we are doing so far. For example, choice of catalog might be having an effect (see "The impact of repeat variants on phasing", PacificBiosciences/HiPhase#58). More generally, with our naive approach, I think more care needs to be taken in understanding and resolving any overlaps between the short/integrated callsets and the TRGT genotypes. Probably this is as simple as a) using an appropriately sparse catalog, b) preferring TRGT in appropriate regions, and c) using TRGT merge and bcftools merge in the appropriate ways. But perhaps this is more effort than we want to expend at this point---what does everyone think? At least we can tell Matt Danzi we tried!

EDIT: It turns out the runs in the table above were accidentally done with 1.3.0, not 1.4.0; we've updated the text there but not here. We will proceed to do runs with 1.4.5 instead.

hangsuUNC · 2025-02-05T14:46:08Z

The TRGT merged joint callset before HiPhase is here: gs://fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/0aa95bb1-29e6-491b-a06f-530ee922b6bc/TRGTMerge/47e34134-97e7-4b48-8c63-aad2ff057ab7/call-TRGTMerge/MergeTRGTbeforeHiphase.trgtmerged.vcf.gz

Do we still want to explore running TRGT merge before HiPhase, i.e., callset integration, split, HiPhase, then bcftools merge?

samuelklee · 2025-02-06T21:01:08Z

@hangsuUNC let me know where your thinking is at on how to proceed. I think you and @SHuang-Broad can make the call. IMO some quick HPRC testing of HiPhase 1.4.5 + 1.1 seems light enough that we should do it before proceeding to run AoU + HPRC, but whether we want to ramp down on TRGT or do a bit more digging while we're here (and maybe while I proceed with Shapeit4, etc.) is up to you both.

In any case, let's try to continue to record findings here---thanks again!

hangsuUNC · 2025-02-11T18:13:28Z

HPRC testing of HiPhase 1.4.5 + v1.1 is here:
Short: gs://fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/d57fbaba-9d9b-4bc4-8d32-bdc8e579b2a1/HierarchicallyMergeVcfs/40ba741a-4d81-4dc7-92b1-3177dc809b22/call-ConcatVcfs/HPRC_V11_woTRGT_hiphase145_SNP.vcf.gz
SV: gs://fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/submissions/db3a53a8-77db-4dbb-95c9-16d43d527f62/HierarchicallyMergeVcfs/44c49851-6829-4e73-ac15-0e516a2c034d/call-ConcatVcfs/HPRC_V11_woTRGT_hiphase145_SV.vcf.gz

samuelklee · 2025-02-19T13:43:33Z

Thanks, @hangsuUNC, ran the downstream evaluations and added them to the top of the table above.

Looks like there's only a marginal difference between 1.1 + 1.4.5 w/o TRGT and 1.1 + 1.3.0 w/o TRGT. So if we are OK to drop TRGT, I would be OK with moving forward.

Unfortunately, 1.0 + 1.3.0 w/o TRGT failed---which integrated SV callset did you use here, kanpig or Sniffles? Regardless, we have some old numbers that are in the ballpark of those for the 1.1 runs, so I'm still OK with moving forward.

That said, again, I think the decision about how to proceed with TRGT lies with you and @SHuang-Broad. I might suggest conferring with @fabio-cunial since I think he is looking at it for the Hapestry preprint. But given that TRGT can be directly consumed by HiPhase, I would suggest that for the purposes of the AoU workflows, we think of incorporating TRGT as part of the physical-phasing problem rather than as part of the integration problem.

samuelklee · 2025-02-19T19:08:01Z

Note job_history links above are currently broken because of https://broadinstitute.slack.com/archives/C07RK8KNWMV/p1739888818003019

samuelklee · 2025-02-20T21:20:22Z

Note that I finally added subsetting to confident regions to our Vcfdist implementation (the Vcfdist command line only allows one BED file, which we allocated to account for context). See the top row in the table above. This appears to boost precision a bit for non-TR/HP, and somewhat more substantially for TR/HP.
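
As an illustration of the single-BED constraint (not necessarily how our implementation does it), one option is to pre-intersect two region sets so that one BED carries both. The sketch below assumes each input is a sorted list of non-overlapping, half-open (start, end) intervals on the same contig; the function name is hypothetical.

```python
def intersect(a, b):
    """Intersect two sorted, non-overlapping lists of half-open (start, end) intervals."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first; the other may still overlap later ones
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out
```

For example, intersecting confident regions with TR regions on one contig yields the "confident and in TR" stratum as a single interval list.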

Will go back and regenerate some figures for the paper accordingly.

samuelklee · 2025-02-21T19:55:08Z

After clearing up some confusion (i.e., turns out the actual TRGT calls in the w/ TRGT run never made it to the downstream pipeline---I incorrectly assumed they were incorporated in the merged HiPhase SV outputs---but we see they affect the accuracy of the integrated SVs in TR regions anyway, presumably by negatively affecting their phasing, see also https://github.com/PacificBiosciences/HiPhase/blob/main/docs/user_guide.md#why-are-some-smallstructural-variants-unphased-when-i-added-tandem-repeats), here's the plan going forward (copied from Slack):

Hmm, let's be really clear about what needs to get run next:

1. truvari on raw TRGT w/ Matt's catalog
2. truvari on raw TRGT w/ Ben's catalog
3. truvari on integrated SV

All of the above on a single sample vs. dipcall in confident regions and perhaps in/out TRs.

Then, with the completed 1.3.0 + 1.1 + short + SV + TRGT run:

1. bcftools merge the TRGT HiPhase outputs
2. bcftools merge the merged SV and TRGT HiPhase outputs
3. ConcatAndEvaluate with short + (SV+TRGT)

Then, with the just completed 1.4.5 + 1.1 + short + TRGT run:

1. bcftools merge the TRGT HiPhase outputs (EDIT: added to table above)
2. ConcatAndEvaluate with short + TRGT (this can be directly compared to already evaluated 1.4.5 + 1.1 + short + SV runs) (EDIT: ChromosomePhasedPanelCreationFromHiPhase submitted and added to table above)

Also recall that these two HiPhase runs were done w/ Matt's catalog, so depending on what these experiments show we may need to repeat some with Ben's catalog. Please feel free to check and/or edit to link submissions. Thanks @hangsuUNC!

samuelklee · 2025-02-21T21:14:58Z

@hangsuUNC A step using a GATK tool in the ChromosomePhasedPanelCreationFromHiPhase run on 1.4.5 + 1.1 + short + TRGT failed due to:

htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 1194060: unparsable vcf record with allele TCATCTGGAAGCCCCTACTCCCACCTCACCACACACATGCACATCACCCCCCACACACACCAAACAMCCCACACAACACACACACACCACACCACACAAACACAAACACACCACATCATGAACACACACATCACACACACACCACACACCCCACACACCCCACACACATCACACACACACACCACACACCCCACACAACACACAACACACACCACACACACCACACCACACACACACCACACCCCACAACACACACACACCACACTCCCCACTGAACACACACACATCACACACACCACACACCCCACACACACCACACACCCCACACAACACACACCACACACACACCACACCCCACAACACACACACATCACATCACACACACCACACCCCACACACACACACCACACCACACACACACCACACACCCCACACAACACATACACACATCACACACACCCCACACACCAACACACCACATCACACACACACCACATCACACACACACCAAACACCCCACACAACACACACACAACACAACACACACACAAACACACCACATCATACACACACCACACACCCCACACACCACACACACATCACACACACACCCCACACACCCCACACACCCCACACCACACTACACACACACCACACACCACACACAACACACACAACACAACACACACACCCCACACACAAACACCCCACACACCACACACACCACATCACACACACACACCACATCACACCCCACACAACACACACACCACACCACACACACACACCACACACCCCACACAACACACACCGCATCACACACACACCACACAACACACACACACCACACACACCACACACCCCACACAACACACAACACATCACACACACCACACACACCACACAACACACACATCACATACACACCACACACCACACACAACACATACACATCACACACACACCACACACCACACACAACACACACAACACAACACACACAACACACACCCCACACAACACACCACATCACACACACATCACACAACACACAACACAACACACACATCACACACCCCACACAACACACCACATCACACAAAACACATCACACACACAACACACACCCCACACAACACACACAACACAACACACACACCACACACCCCACACAACACACACAACACACCACACATATACATCACACACCACACACCCCACGCAACACACACACCACATCACACACACACCACACACCCCACACACACACGCATCACACCACACACACCACAGCCCCCACACAACACATACACACCACATCACACACACACCACACATCACATGTCATACACAGCACATACACCACACACACCACATAACATCACATGTCACACACACATCACATGACACATACACCACACACCCCATGCATCACACACACACCACACATCACATGTCATACATACTACATACACACAACACACACAACACATAACATCACATGTCACACACATCACATGACACACACCACACACTCCACACATCACATACACACTGCACATACACTACATCACACACCACACACCACACATCACATGTCACAAACACCACACAGCACACCCCACACCACACACATACACCACATACACAAATACCACACCACACACCACACATACACCACACCACACACACTTCACACACACCACACATCACATGTCACACACATTAGGTACACACCGAACACACACAACACACATTAAATGCCACATACAACACACCACACACATTAAACACACACCCCACACATAAGTCACACACATCACACACACAACACACCATGCGCTAAATACATCACACATA
CTACACACATGCAAATCACATCACAGACACCACATACGCACCACACCACATACCACACACACGACACATCACATGCCACACATCACCTGTCACACACATCAAACACACACACAATACACCACACACCTCACACACGTCAAACAAACCCCACACACACCATATCACACACACATCACACACACCACACATGCACCATATGCCTCCACACACAGAGACACATACACATCACACACCCTCACACACACACACCCCACATGCCATTTATACCACATGCCACAAACATTACATGCA,

Maybe a bad allele in the merged TRGT file? I've got to run, but let me know if you get a chance to dig!

hangsuUNC · 2025-02-21T22:10:50Z

Yes, I believe there are a bunch of bad alleles in the TRGT file. For example, I just found that the raw call evaluation WDL failed because there is an "M" in the reference allele in the VCF file. Maybe we should do another round of filtering of the TRGT calls first? Please let me know what you think, @samuelklee.

samuelklee · 2025-02-21T22:28:15Z

Yes, I’d also see if the other catalog has this issue. Thanks!

hangsuUNC · 2025-02-21T22:30:16Z

Yes, I have checked the other catalog, and it seems to be a common issue: they replace reference Ns with some other letters.
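
To size the problem before deciding on a filter, something like the following could list the offending records. It is a sketch that assumes only A, C, G, T, and N are acceptable in sequence alleles (IUPAC ambiguity codes such as "M" are what htsjdk rejected above) and that the input is decompressed VCF text; the function name is hypothetical.

```python
VALID_BASES = set("ACGTN")


def bad_allele_records(lines):
    """Yield (line_number, contig, pos, offending_chars) for records with non-ACGTN alleles."""
    for line_number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        offending = set()
        for allele in [ref] + alts:
            # Skip missing, spanning-deletion, symbolic, and breakend alleles
            if allele in (".", "*") or allele.startswith("<") or "[" in allele or "]" in allele:
                continue
            offending |= set(allele.upper()) - VALID_BASES
        if offending:
            yield line_number, fields[0], int(fields[1]), "".join(sorted(offending))
```

Counting the yielded records per catalog would show whether dropping them is cheap, or whether the ambiguity codes would need to be rewritten to N instead.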
