(apparently) good protein groups in combined.prot.xml are not in protein.tsv #379
Replies: 7 comments 4 replies
-
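Weird that some of the protein group I pasted disappeared. Attached is a snippet from combined.prot.xml that shows the same protein group:
example_protein_group.txt<https://github.com/Nesvilab/philosopher/files/9490613/example_protein_group.txt>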
-
Hi Ben,
The probability threshold printed in the log is not based on the ProteinProphet probability:
<protein_group group_number="152" probability="1.0000">
Instead, it is based on the probability of the best peptide. Also, only unique or razor peptides are considered. So it is possible that this protein has several good peptides, but they were all reassigned to another protein (one with more unique peptides), leaving only a few peptides as evidence for this particular protein, none of them passing the 0.9073 threshold.
Can you rerun the philosopher filter step with --prot 1 (meaning 100% protein FDR) and see if there are any PSMs for this protein in PSM.tsv, and what probabilities they have?
Alexey
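The check Alexey asks for can be sketched in a few lines of Python. This is only an illustration, not part of Philosopher: the column names `Protein`, `Peptide`, and `PeptideProphet Probability` are assumptions about the PSM table layout and may differ between versions, so they are exposed as parameters. The 0.9073 default is the threshold quoted from the log in this thread.

```python
import csv

THRESHOLD = 0.9073  # threshold reported in the log line quoted in this thread


def psms_for_protein(handle, accession, protein_col="Protein",
                     prob_col="PeptideProphet Probability",
                     peptide_col="Peptide"):
    """Return (peptide, probability) pairs for one protein, best first.

    The column names are assumptions; adjust them to the header of your table.
    Accessions are compared exactly, not by substring.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    hits = [(row[peptide_col], float(row[prob_col]))
            for row in reader
            if row[protein_col].strip() == accession]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def passes_threshold(hits, threshold=THRESHOLD):
    """True if the best peptide probability reaches the threshold."""
    return bool(hits) and hits[0][1] >= threshold
```

With a real table, one would open the file with `open(path, newline="")`, pass the handle together with `"entry_5817"`, and print the returned pairs. An empty result would mean that no PSM is assigned to that accession at all.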
On Mon, Sep 5, 2022 at 10:32 AM collins-ben ***@***.***> wrote:
Hi Felipe and the Philosophers,
I am struggling to find some proteins that are present in combined.prot.xml above the reported threshold but missing from protein.tsv.
For example, if I have this protein group in combined.prot.xml:
<protein_group group_number="152" probability="1.0000">
<modification_info modified_peptide="MPVLTLPALSK[136]">
<mod_aminoacid_mass position="11" mass="136.109161"/>
</modification_info>
<modification_info modified_peptide="PVLTLPALSK[136]">
<mod_aminoacid_mass position="10" mass="136.109161"/>
</modification_info>
<modification_info modified_peptide="PVLTLPALSK[136]P">
<mod_aminoacid_mass position="10" mass="136.109161"/>
</modification_info>
<modification_info modified_peptide="PVLTLPALSK[136]PVLTSPVLKMPVLVS">
<mod_aminoacid_mass position="10" mass="136.109161"/>
</modification_info>
</protein_group>
log_2022-09-05_13-15-40.txt<https://github.com/Nesvilab/philosopher/files/9490508/log_2022-09-05_13-15-40.txt>
and the log says this protein group should be above threshold (full log attached):
INFO[13:15:34] Converged to 1.06 % FDR with 473 Proteins decoy=5 threshold=0.9073 total=478
But then this protein ('entry_5817') does not appear in protein.tsv. Is the threshold stated in the log the same as the protein probability stated in combined.prot.xml?
One question I have: is there something happening during this step that could be filtering out this protein group? Are there some options in Philosopher Filter to consider here?
INFO[13:15:39] Total report numbers after FDR filtering, and post-processing ions=938 peptides=822 proteins=411 psms=5238
Relevant notes: this is quite a large proteogenomics database with funny-looking accessions (e.g. entry_5817). There are also some heavy peptides thrown in (as seen above in the prot.xml excerpt).
If further files would be helpful, let me know and I am happy to send them.
Thanks a lot for any advice.
Best
Ben
log_2022-09-05_13-15-40.txt<https://github.com/Nesvilab/philosopher/files/9490594/log_2022-09-05_13-15-40.txt>
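As a quick sanity check on the two log lines quoted above, the numbers can be pulled out with a regular expression. This is a minimal sketch that assumes the log format is exactly as pasted. The thread does not state how the FDR is computed; the decoy/target ratio below merely happens to reproduce the printed 1.06 %.

```python
import re

# The two log lines, copied verbatim from the post above.
CONVERGED = ("INFO[13:15:34] Converged to 1.06 % FDR with 473 Proteins "
             "decoy=5 threshold=0.9073 total=478")
REPORT = ("INFO[13:15:39] Total report numbers after FDR filtering, and "
          "post-processing ions=938 peptides=822 proteins=411 psms=5238")


def key_values(line):
    """Collect every key=number pair on a log line into a dict."""
    return {key: float(value)
            for key, value in re.findall(r"(\w+)=([0-9.]+)", line)}


def targets_passing(line):
    """Extract N from the phrase 'with N Proteins'."""
    return int(re.search(r"with (\d+) Proteins", line).group(1))


converged = key_values(CONVERGED)
report = key_values(REPORT)
targets = targets_passing(CONVERGED)

decoy_ratio = converged["decoy"] / targets      # 5 / 473
missing = targets - int(report["proteins"])     # 473 - 411

print(f"decoy/target = {decoy_ratio:.2%}")      # decoy/target = 1.06%
print(f"difference between the two protein counts: {missing}")  # 62
```

The difference of 62 is not necessarily all caused by the problem discussed here, since post-processing may legitimately remove entries. It is only a number to watch when comparing runs.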
-
Ok, from your second email it looks like what I asked is not relevant.
I do not see alternative_protein for those peptides (so they would not be reassigned).
I do see that the peptides have
n_enzymatic_termini="1"
Did you do a semi-tryptic search?
I think we would need your data to figure out, together with Felipe, what removed those peptides. Can you send us the pep.xml and prot.xml files, and the sequence database?
Alexey
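The inspection Alexey describes can be approximated with a short script. This is a sketch under assumptions: it expects `protein` elements with a `protein_name` attribute and `peptide` elements with a `peptide_sequence` attribute inside each `protein_group`, and it treats child elements named `peptide_parent_protein` or `alternative_protein` as a sign that a peptide is shared with another protein. Only `protein_group`, `group_number`, and `n_enzymatic_termini` are confirmed by the excerpts in this thread.

```python
import xml.etree.ElementTree as ET

# Child element names taken to indicate a shared peptide (an assumption).
SHARED_TAGS = ("peptide_parent_protein", "alternative_protein")


def local(tag):
    """Strip any XML namespace so lookups work with or without one."""
    return tag.rsplit("}", 1)[-1]


def peptides_in_group(root, group_number):
    """List the peptides of one protein group with their termini and sharing info."""
    rows = []
    for group in root.iter():
        if local(group.tag) != "protein_group":
            continue
        if group.get("group_number") != str(group_number):
            continue
        for protein in group:
            if local(protein.tag) != "protein":
                continue
            for peptide in protein:
                if local(peptide.tag) != "peptide":
                    continue
                shared_with = [child.get("protein_name") or child.get("protein")
                               for child in peptide
                               if local(child.tag) in SHARED_TAGS]
                rows.append({
                    "protein": protein.get("protein_name"),
                    "peptide": peptide.get("peptide_sequence"),
                    "n_enzymatic_termini": peptide.get("n_enzymatic_termini"),
                    "shared_with": shared_with,
                })
    return rows
```

For a file on disk, `ET.parse(path).getroot()` gives the `root` argument. Rows with an empty `shared_with` list and `n_enzymatic_termini` equal to `"1"` would correspond to the situation described in this reply.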
-
Many thanks, Alexey, for taking a look at this. The files you mentioned are here:
https://qubstudentcloud-my.sharepoint.com/:f:/g/personal/3054755_ads_qub_ac_uk/EgbZxNKnGRZDvMu4gYoTjvwBfQV--ZYB3d9cFVuGboRCYA?e=eepzkg
Yes, this should be a semi-tryptic search.
-
Hi Ben. I can investigate this for you, but first I have a couple of questions.
-
Hi Felipe
Thanks a lot.
1. These are technical replicate injections of the same sample.
2. Because we want to know the aggregated protein identifications (is this not right?).
3. Trypsin. I see now that somehow the name selected in the params got mixed up, but it seems like the definition is correct for trypsin (right?).
4. There is always a chance ;-) However, the point is that the scores for those peptides in the interact.pep.xml files seem to be higher than what the log says for the score thresholds at the given FDRs. The same goes for the protein level: the score for the highest-scoring peptide in this group seems to be higher than the score threshold stated in the log. So I see that neither the PSMs nor the protein group arrive in the table, but it seems to me that they should, given their scores. Do you agree? I.e., if you just ran the last steps of philosopher on the files I sent, I guess you could replicate that?
If it helps, I can try to reprocess these in a clean folder to rule out issues with repeated analysis. Let me know if there is anything else I should try.
Many thanks for taking a look!
On Tue, Sep 6, 2022 at 2:47 PM Felipe da Veiga Leprevost ***@***.***> wrote:
Some follow-up questions:
1. What enzyme(s) did you use with these samples?
2. Is there a chance you re-processed them using different parameters and got the files mixed up? None of the peptides from group 152 are present in the PSM table.
-
Good. We will release a philosopher update soon.
Best
Alexey
On Thu, Sep 8, 2022 at 10:35 AM collins-ben ***@***.***> wrote:
Fantastic! Thanks.
Yes, I can confirm that we can find the missing proteins either by
* using your version 4.5.1-RC11
* or independently changing the header string of the FASTA to something more sensible (i.e. no subsumed accessions)
Now the number of protein IDs in the log matches the number in protein.tsv.
So, it seems all is well in the world again! Thanks a lot for tracking this down so quickly.
All the best
Ben
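For anyone wanting to check their own database for the same pattern, here is a rough sketch. It reads "subsumed accessions" as one accession being a proper prefix of another (for instance a hypothetical `entry_58` next to `entry_5817`). That reading is an assumption, since the thread does not spell out the exact condition. It also assumes the accession is the first whitespace-delimited token of each FASTA header.

```python
def accession(header):
    """First whitespace-delimited token of a FASTA header line, without '>'."""
    return header[1:].split()[0]


def subsumed_accessions(fasta_lines):
    """Return (short, long) pairs where one accession is a proper prefix of another."""
    accessions = sorted({accession(line) for line in fasta_lines
                         if line.startswith(">") and line[1:].strip()})
    pairs = []
    for i, short in enumerate(accessions):
        # In sorted order, every accession starting with `short` follows it
        # contiguously, so the inner loop can stop at the first mismatch.
        for longer in accessions[i + 1:]:
            if not longer.startswith(short):
                break
            pairs.append((short, longer))
    return pairs
```

Passing an open FASTA file handle works, because iterating over it yields lines. An empty result means no accession is a prefix of another under this definition.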