Skip to content

vt estimate converts QUAL to '-nan' for multi-allelic sites #94

Open
@oleraj

Description

@oleraj

Hi,

I noticed recently that when I use vt estimate command, the QUAL score is convert to -nan for multi-allelic sites. This is fine with variants with single allele. An example variant is here:

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	S1	S2	S3
21	10906293	rs28616746	T	C,G	141379	PASS	AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum	GT:AD:DP:GQ:PL	1/2:0,133,52:185:99:5668,1373,975,4289,0,4133	1/2:0,146,63:209:99:6136,1706,1270,4431,0,4242	1/2:0,196,66:262:99:8107,1791,1159,6111,0,5911

After vt estimate -e AB it looks like this:

21	10906293	rs28616746	T	C,G	-nan	PASS	AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum;HWEAF=0.5,0.5;HWEGF=0,0,0.25,0,0.5,0.25;MLEAF=0,1;AB=0.0985985	GT:AD:DP:GQ:PL	1/2:0,133,52:185:99:5668,1373,975,4289,0,4133	1/2:0,146,63:209:99:6136,1706,1270,4431,0,4242	1/2:0,196,66:262:99:8107,1791,1159,6111,0,5911

AB field is computed fine, but the QUAL is converted to -nan, which is not a valid value for QUAL column and this has issues when I try to run GATK VariantsToTable in a subsequent step.

Since I only saw this with multi-allelic variants, I thought maybe if I decomposed and normalized the variants first that might help. Decomposing and normalizing retains the QUAL score in tact:

bcftools view $vcf 21:10906293 | vt decompose - | vt normalize -r $ref - | grep -v "^#"
21	10906293	rs28616746	T	C	141379	PASS	AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum;OLD_MULTIALLELIC=21:10906293:T/C/G	GT:DP:PL	1/.:185:5668,1373,975	1/.:209:6136,1706,1270	1/.:262:8107,1791,1159
21	10906293	rs28616746	T	G	141379	PASS	AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum;OLD_MULTIALLELIC=21:10906293:T/C/G	GT:DP:PL	./1:185:5668,4289,4133	./1:209:6136,4431,4242	./1:262:8107,6111,5911

However, when I use that as input to vt estimate, the results are worse -- the QUAL score is still converted to either -0 or 5 and AB is now -nan:

21	10906293	rs28616746	T	C	-0	PASS	AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum;OLD_MULTIALLELIC=21:10906293:T/C/G;HWEAF=-nan;HWEGF=-nan,-nan,-nan;MLEAF=-nan;AB=-nan	GT:DP:PL	1/.:185:5668,1373,975	1/.:209:6136,1706,1270	1/.:262:8107,1791,1159
21	10906293	rs28616746	T	G	5	PASS	AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum;OLD_MULTIALLELIC=21:10906293:T/C/G;HWEAF=-nan;HWEGF=-nan,-nan,-nan;MLEAF=-nan;AB=-nan	GT:DP:PL	./1:185:5668,4289,4133	./1:209:6136,4431,4242	./1:262:8107,6111,5911

Any help you can provide or suggestions on how to avoid converting the QUAL score to invalid values?

Thanks!

Andrew

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions