Description
Hi,
I noticed recently that when I use vt estimate
command, the QUAL score is convert to -nan
for multi-allelic sites. This is fine with variants with single allele. An example variant is here:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S1 S2 S3
21 10906293 rs28616746 T C,G 141379 PASS AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum GT:AD:DP:GQ:PL 1/2:0,133,52:185:99:5668,1373,975,4289,0,4133 1/2:0,146,63:209:99:6136,1706,1270,4431,0,4242 1/2:0,196,66:262:99:8107,1791,1159,6111,0,5911
After vt estimate -e AB
it looks like this:
21 10906293 rs28616746 T C,G -nan PASS AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum;HWEAF=0.5,0.5;HWEGF=0,0,0.25,0,0.5,0.25;MLEAF=0,1;AB=0.0985985 GT:AD:DP:GQ:PL 1/2:0,133,52:185:99:5668,1373,975,4289,0,4133 1/2:0,146,63:209:99:6136,1706,1270,4431,0,4242 1/2:0,196,66:262:99:8107,1791,1159,6111,0,5911
AB field is computed fine, but the QUAL is converted to -nan
, which is not a valid value for QUAL column and this has issues when I try to run GATK VariantsToTable in a subsequent step.
Since I only saw this with multi-allelic variants, I thought maybe if I decomposed and normalized the variants first that might help. Decomposing and normalizing retains the QUAL score in tact:
bcftools view $vcf 21:10906293 | vt decompose - | vt normalize -r $ref - | grep -v "^#"
21 10906293 rs28616746 T C 141379 PASS AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum;OLD_MULTIALLELIC=21:10906293:T/C/G GT:DP:PL 1/.:185:5668,1373,975 1/.:209:6136,1706,1270 1/.:262:8107,1791,1159
21 10906293 rs28616746 T G 141379 PASS AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum;OLD_MULTIALLELIC=21:10906293:T/C/G GT:DP:PL ./1:185:5668,4289,4133 ./1:209:6136,4431,4242 ./1:262:8107,6111,5911
However, when I use that as input to vt estimate
, the results are worse -- the QUAL score is still converted to either -0
or 5
and AB is now -nan
:
21 10906293 rs28616746 T C -0 PASS AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum;OLD_MULTIALLELIC=21:10906293:T/C/G;HWEAF=-nan;HWEGF=-nan,-nan,-nan;MLEAF=-nan;AB=-nan GT:DP:PL 1/.:185:5668,1373,975 1/.:209:6136,1706,1270 1/.:262:8107,1791,1159
21 10906293 rs28616746 T G 5 PASS AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum;OLD_MULTIALLELIC=21:10906293:T/C/G;HWEAF=-nan;HWEGF=-nan,-nan,-nan;MLEAF=-nan;AB=-nan GT:DP:PL ./1:185:5668,4289,4133 ./1:209:6136,4431,4242 ./1:262:8107,6111,5911
Any help you can provide or suggestions on how to avoid converting the QUAL score to invalid values?
Thanks!
Andrew