norm split multiallelics does not split SVLEN (v4.2 Number=.) #2371

Closed
davmlaw opened this issue Feb 27, 2025 · 5 comments

davmlaw commented Feb 27, 2025

<DEL>,<DUP> SVLEN=-240209,240209

Is split into:

<DEL> SVLEN=-240209,240209
<DUP> SVLEN=-240209,240209

I think it should be:

<DEL> SVLEN=-240209
<DUP> SVLEN=240209

Full example

Command line:

bcftools norm --multiallelics=- --old-rec-tag=BCFTOOLS_OLD_VARIANT multiallelic_svlen.vcf

Input file:

##fileformat=VCFv4.2
##source=DRAGEN_CNV
##reference=file://GRCh38_decoy2_with_cnv
##contig=<ID=chr1,length=248956422>
##ALT=<ID=DEL,Description="Region of lowered copy number relative to the reference, or a deletion breakpoint">
##ALT=<ID=DUP,Description="Region of elevated copy number relative to the reference, or a tandem duplication breakpoint">
##INFO=<ID=REFLEN,Number=1,Type=Integer,Description="Number of REF positions included in this record">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
chr1	1436723	DRAGEN:GAINLOH:chr1:1436724-1676932;DRAGEN:GAIN:chr1:1436724-1668557	N	<DEL>,<DUP>	12	PASS	SVLEN=-240209,240209;SVTYPE=CNV;END=1676932;REFLEN=240209

Output:

chr1	1436723	DRAGEN:GAINLOH:chr1:1436724-1676932;DRAGEN:GAIN:chr1:1436724-1668557	N	<DEL>	12	PASS	SVLEN=-240209,240209;SVTYPE=CNV;END=1676932;REFLEN=240209;BCFTOOLS_OLD_VARIANT=chr1|1436723|N|<DEL>,<DUP>|1
chr1	1436723	DRAGEN:GAINLOH:chr1:1436724-1676932;DRAGEN:GAIN:chr1:1436724-1668557	N	<DUP>	12	PASS	SVLEN=-240209,240209;SVTYPE=CNV;END=1676932;REFLEN=240209;BCFTOOLS_OLD_VARIANT=chr1|1436723|N|<DEL>,<DUP>|2

Version:

$ bcftools --version
bcftools 1.20
Using htslib 1.20

Workaround:

Post-process the VCF: use the allele index at the end of the old-rec-tag value to pick which SVLEN value to keep.
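A minimal sketch of that post-processing step, assuming the tag is named BCFTOOLS_OLD_VARIANT as in the command above, that its last `|`-separated field is the 1-based ALT index (as in the output shown), and that SVLEN lists one value per original ALT allele in order. The function name `pick_svlen` is made up for illustration; it works on plain-text VCF lines only.

```python
def pick_svlen(line, tag="BCFTOOLS_OLD_VARIANT"):
    """Keep only the SVLEN value matching the ALT index stored in the old-record tag."""
    if line.startswith("#"):
        return line  # header lines pass through untouched
    body = line.rstrip("\n")
    newline = line[len(body):]
    cols = body.split("\t")
    if len(cols) < 8:
        return line
    info = cols[7].split(";")
    old = next((f.split("=", 1)[1] for f in info if f.startswith(tag + "=")), None)
    if old is None:
        return line  # record was not touched by norm
    idx_text = old.rsplit("|", 1)[-1]
    if not idx_text.isdigit():
        return line  # assumption: no trailing index means the record was not split
    idx = int(idx_text)
    for i, field in enumerate(info):
        if field.startswith("SVLEN="):
            values = field[len("SVLEN="):].split(",")
            if len(values) > 1 and 1 <= idx <= len(values):
                info[i] = "SVLEN=" + values[idx - 1]
    cols[7] = ";".join(info)
    return "\t".join(cols) + newline


# Abbreviated versions of the two split records from the output above (ID shortened to ".")
old_variant = "BCFTOOLS_OLD_VARIANT=chr1|1436723|N|<DEL>,<DUP>|"
for alt, idx in (("<DEL>", "1"), ("<DUP>", "2")):
    rec = "\t".join(["chr1", "1436723", ".", "N", alt, "12", "PASS",
                     "SVLEN=-240209,240209;SVTYPE=CNV;" + old_variant + idx])
    print(pick_svlen(rec))
```

Records without the tag, or with a single SVLEN value already, are returned unchanged, so it should be safe to run over a whole file line by line.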

Member

pd3 commented Mar 5, 2025

The program splits fields declared as Number=A, otherwise there is no guarantee which value belongs to which allele. I see the VCF specification suggests Number=., but it is unclear why, as it also writes that it should have "one value for each ALT allele", so Number=A would make more sense.

As a quick workaround, the header can be modified with bcftools reheader so that SVLEN is declared as Number=A. We may consider adding an exception for this tag, as here the specification is to blame, not the software which followed it.
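One way the header edit might be prepared, as a sketch rather than an official recipe: rewrite the SVLEN declaration in the header text, then hand the result to `bcftools reheader -h`. The function name `declare_svlen_per_alt` is hypothetical, and the pattern assumes the header line starts with `##INFO=<ID=SVLEN,Number=.` exactly as in the input file above.

```python
import re

# Assumption: ID comes first and Number directly follows it, as in the DRAGEN header above.
SVLEN_NUMBER = re.compile(r"^(##INFO=<ID=SVLEN,Number=)\.(?=,)")


def declare_svlen_per_alt(header_text):
    """Return the header with SVLEN declared as Number=A instead of Number=."""
    fixed = []
    for line in header_text.splitlines(keepends=True):
        fixed.append(SVLEN_NUMBER.sub(r"\g<1>A", line))
    return "".join(fixed)


header = (
    '##INFO=<ID=REFLEN,Number=1,Type=Integer,Description="Number of REF positions included in this record">\n'
    '##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">\n'
)
print(declare_svlen_per_alt(header), end="")
```

The header could be dumped with `bcftools view -h`, passed through this function, and the rewritten file supplied to `bcftools reheader -h` before running norm. This only makes sense when SVLEN really does carry one value per ALT allele in every record.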

Author

davmlaw commented Mar 6, 2025

Thanks.

Looks like the spec was changed from . in 4.3 to A in 4.4

With Novaseq X having inbuilt Dragen, there will be a lot of these files.

davmlaw changed the title from "norm split multiallelics does not split SVLEN" to "norm split multiallelics does not split SVLEN (v4.2 Number=.)" Mar 6, 2025

Member

pd3 commented Mar 9, 2025

OK. Definitely the best solution is to use reheader; I don't think we want to be adding exceptions in bcftools, since it is not the program's fault.

pd3 closed this as completed Mar 9, 2025

Author

davmlaw commented Mar 16, 2025

I understand not wanting special case code to do something not technically right.

I think this may bite a lot of people, so as a compromise, how about a warning for SVLEN declared as Number=. when splitting multiallelics, emitted the first time multiple values are found?
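In the meantime, a check along these lines could be run on a file before splitting. It is only a sketch of the condition described above, not the warning later added to bcftools; the function name `first_ambiguous_svlen` is made up, and it assumes a plain-text VCF with the header preceding the records.

```python
def first_ambiguous_svlen(lines):
    """Return the first multiallelic record carrying several SVLEN values
    while SVLEN is declared Number=., or None if there is no such record."""
    number_is_dot = False
    for line in lines:
        if line.startswith("##INFO=<ID=SVLEN,"):
            number_is_dot = "Number=.," in line
            continue
        if line.startswith("#") or not number_is_dot:
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8 or "," not in cols[4]:
            continue  # not a multiallelic record
        for field in cols[7].split(";"):
            if field.startswith("SVLEN=") and "," in field:
                return line
    return None


vcf = [
    '##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">\n',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t1436723\t.\tN\t<DEL>,<DUP>\t12\tPASS\tSVLEN=-240209,240209;SVTYPE=CNV\n",
]
hit = first_ambiguous_svlen(vcf)
if hit is not None:
    print("SVLEN is Number=. and will not be split:", hit.split("\t")[:2])
```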

pd3 reopened this Mar 17, 2025
pd3 closed this as completed in b7c19a1 Mar 26, 2025

Author

davmlaw commented Apr 1, 2025

Cheers @pd3 - slight typo in your change - "INFO/SVLE"
