CGD-2145 - Provide additional CCDS transcripts #212
Somebody noticed a variant imported into CGD that doesn't have any CCDS transcripts even though it should.
It looks like there is something we can do to get more CCDS transcripts sent to CGD, but it requires that we lower our standards for data quality.
We originally decided that when multiple versions of the same refseq accession are found for a variant, we choose just one of them to be sent to CGD.
Example:
For 22-46929555-C-T,
Annovar comes up with NM_014246.3
And hgvs/uta comes up with NM_014246.4, NM_014246.3, and NM_014246.1.
So NM_014246.3 is the one with the most information and is what gets sent to CGD.
But for whatever reason NM_014246.3 doesn't map to a CCDS, while the other two (.1 and .4) do.
The way things have been working is that the best transcript (NM_014246.3) was selected, and then we checked whether there was a corresponding CCDS for it. Since there isn't one, only NM_014246.3 makes it into the tfx vcf.
Now, NM_014246.3 is still the only refseq transcript sent to CGD. But tx_eff_hgvs has been changed so that it realises NM_014246.3 doesn't have a CCDS. It then looks at the other two versions of the accession, makes a copy of NM_014246.4, and changes the copy's accession to CCDS14076.1. So we end up with NM_014246.3 and CCDS14076.1 being sent to CGD.
The downside to this solution is that there is a reason we discarded NM_014246.4 in the first place. It isn't known to Annovar, so it is missing the fields that Annovar populates; it's incomplete. But if we lower our standards and accept it, then we can have more CCDS transcripts in CGD.
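For reviewers, the selection and fallback described above could be sketched roughly as follows. This is an illustration only, not the actual tx_eff_hgvs code: `Transcript`, `annovar_known`, `select_transcripts`, and the `refseq_to_ccds` mapping are hypothetical names, "best" is assumed to mean "known to Annovar", and picking the highest-versioned sibling as the CCDS donor is an assumption based on the example (NM_014246.4 was chosen over NM_014246.1). The CCDS ID shown for NM_014246.1 is a placeholder.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Transcript:
    accession: str       # e.g. "NM_014246.3"
    annovar_known: bool  # True when Annovar also reported this version


def version_of(accession: str) -> int:
    """Return the numeric version suffix, e.g. 3 for 'NM_014246.3'."""
    return int(accession.rsplit(".", 1)[1])


def select_transcripts(
    candidates: List[Transcript], refseq_to_ccds: Dict[str, str]
) -> List[Transcript]:
    """Pick one RefSeq version, plus a CCDS record where one can be found.

    All candidates are assumed to be versions of the same accession.
    """
    # Assumed ranking: Annovar-known versions first, then the highest version.
    best = max(candidates, key=lambda t: (t.annovar_known, version_of(t.accession)))
    selected = [best]

    if best.accession in refseq_to_ccds:
        # Old behaviour: the best transcript itself maps to a CCDS.
        selected.append(replace(best, accession=refseq_to_ccds[best.accession]))
        return selected

    # New behaviour: fall back to the other versions that do map to a CCDS.
    siblings = [
        t for t in candidates if t is not best and t.accession in refseq_to_ccds
    ]
    if siblings:
        donor = max(siblings, key=lambda t: version_of(t.accession))
        # The copy keeps the donor's (possibly incomplete) fields.
        selected.append(replace(donor, accession=refseq_to_ccds[donor.accession]))
    return selected


# The 22-46929555-C-T example from this description.
candidates = [
    Transcript("NM_014246.4", annovar_known=False),
    Transcript("NM_014246.3", annovar_known=True),
    Transcript("NM_014246.1", annovar_known=False),
]
refseq_to_ccds = {
    "NM_014246.4": "CCDS14076.1",
    "NM_014246.1": "CCDS14076.1",  # placeholder; the real CCDS ID is not given here
}
result = select_transcripts(candidates, refseq_to_ccds)
print([t.accession for t in result])  # ['NM_014246.3', 'CCDS14076.1']
```

Note that the CCDS copy inherits `annovar_known=False` from its donor, which is the data-quality trade-off discussed above.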