Gemma propagation doesn't recognise substantia nigra as part of the brain and #842

Closed
oganm opened this issue Sep 15, 2023 · 17 comments

Comments

@oganm
Member

oganm commented Sep 15, 2023

When using the term for brain (http://purl.obolibrary.org/obo/UBERON_0000955) to fetch datasets from all brain regions, the propagation seems to miss a few terms. Most, but not all, of the missing datasets can be explained by terms that are absent from the propagated filter the endpoint returns.

Attempted query: https://gemma.msl.ubc.ca/rest/v2/datasets/?filter=allCharacteristics.valueUri%20%3D%20http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FUBERON_0000955
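
For reference, the `filter` parameter in the query above is a percent-encoded expression of the form `property = term URI`. A minimal sketch of rebuilding it with Python's standard library follows; the helper name `build_dataset_query` is hypothetical and not part of any Gemma client.

```python
from urllib.parse import quote

GEMMA_DATASETS = "https://gemma.msl.ubc.ca/rest/v2/datasets/"
BRAIN = "http://purl.obolibrary.org/obo/UBERON_0000955"


def build_dataset_query(term_uri: str, prop: str = "allCharacteristics.valueUri") -> str:
    """Return a datasets URL whose filter matches `prop` against `term_uri`.

    Hypothetical helper: it only reproduces the URL shape quoted in this issue.
    """
    filter_expr = f"{prop} = {term_uri}"
    # safe="" forces ":" and "/" inside the term URI to be percent-encoded too.
    return f"{GEMMA_DATASETS}?filter={quote(filter_expr, safe='')}"


url = build_dataset_query(BRAIN)
print(url)
```

Passing a different term URI (for example the putamen term mentioned later in the thread) yields the corresponding query.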

Missing results that should have been in there

@ppavlidis
Collaborator

ppavlidis commented Sep 15, 2023

Thanks for pointing this out. Substantia nigra should definitely be included in "brain" as it is a child term.

image

But for GSE161045, UBERON_0002038 is not putamen, it's substantia nigra.

Putamen is http://purl.obolibrary.org/obo/UBERON_0001874

@oganm
Member Author

oganm commented Sep 15, 2023

Yep. Pasted the wrong term. It is UBERON_0001874

@arteymix
Member

arteymix commented Sep 15, 2023

The inference is working as expected:

image

Is it possible that a free-term is used in the dataset?

arteymix added the bug and needs confirmation labels Sep 15, 2023
@ppavlidis
Collaborator

> The inference is working as expected:
>
> image
>
> Is it possible that a free-term is used in the dataset?

Not for the examples Ogan gave. They are annotated with an ontology term that is a subclass of Brain.

image

@ppavlidis
Collaborator

Confirmation in the Browser that something isn't working right. Directly searching for one of the datasets mentioned, with "brain" checked, yields:

image

@oganm
Member Author

oganm commented Sep 15, 2023

The annotations do appear to be ontology terms and not free text. UBERON_0002038 does appear in the propagated filter, yet the dataset annotated with it is missing from the results; the same is true of the putamen term. UBERON_0001965, UBERON_0002661 and UBERON_0001966, however, do not appear in the propagated filter at all.

https://gemma.msl.ubc.ca/rest/v2/datasets/GSE168496/annotations
https://gemma.msl.ubc.ca/rest/v2/datasets/GSE161045/annotations

@arteymix
Member

Ok, substantia nigra is inferred, but not substantia nigra pars compacta, which is a child as per the "part of" relation. I'll debug that...

@oganm
Member Author

oganm commented Sep 15, 2023

Just to make sure: while substantia nigra is inferred, the dataset annotated with it still isn't in the results.

@arteymix
Member

It looks like a bug in the inference code. I can reproduce this.

With Uberon base OWL, I get none of the terms and regular Uberon only returns "substantia nigra"

arteymix removed the needs confirmation label Sep 15, 2023
arteymix added this to the 1.30.2 milestone Sep 15, 2023
@arteymix
Member

This could be due to the fact we're only using the transitive inference instead of the full OWL inference.

@arteymix
Member

Yeah, it looks like we need at least the mini OWL inference to capture "part of" relations:

image

@arteymix
Member

I'll try using the micro OWL instead of the transitive inference.

@arteymix
Member

arteymix commented Sep 18, 2023

I did some additional tests and it turns out that, for our version of Jena, inference with owl:someValuesFrom is only supported by the full reasoner.

I've introduced a workaround in baseCode to revisit parents/children that might have been missed due to limited inference capabilities. PavlidisLab/baseCode@09d7b20

We won't need to use a newer version of Jena for now as per PavlidisLab/baseCode#36.

I'll push the fix on the development branch for testing.
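
To illustrate the idea behind the workaround, here is a small Python sketch of a child traversal that can optionally follow extra properties such as "part of". It is not the baseCode implementation (which is Java and is not reproduced here). The edges are a toy graph chosen to mirror the behaviour reported in this thread, not relations copied from Uberon, and `descendants` is a hypothetical helper.

```python
from collections import deque

SUBCLASS_OF = "rdfs:subClassOf"
PART_OF = "part_of"  # stand-in label for the "part of" property

# Toy (child, relation, parent) edges; assumptions for illustration only.
EDGES = [
    ("substantia nigra", SUBCLASS_OF, "brain"),
    ("substantia nigra pars compacta", PART_OF, "substantia nigra"),
]


def descendants(root, edges, relations):
    """Breadth-first walk from `root`, following only the given relation types."""
    children = {}
    for child, rel, parent in edges:
        if rel in relations:
            children.setdefault(parent, []).append(child)

    seen, queue = set(), deque([root])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Following subclass edges only stops at "substantia nigra".
subclass_only = descendants("brain", EDGES, {SUBCLASS_OF})
# Adding the extra property also reaches "substantia nigra pars compacta".
with_part_of = descendants("brain", EDGES, {SUBCLASS_OF, PART_OF})
print(sorted(subclass_only))
print(sorted(with_part_of))
```

In the toy graph, the term reachable only through a "part of" edge is found once that property is added to the set of relations being followed, which is the gap the thread describes.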

arteymix added a commit that referenced this issue Sep 18, 2023
Update baseCode to 1.1.19 which adds support for setting language levels
and inference mode.

Allow language level and inference mode to be fine-tuned in
OntologyServiceFactory. The combination of the two has a significant
impact on loading time, so one must carefully set them.

Add support for setting additional property URIs to be used for
inferring parents and children.

Add tests to ensure that "substantia nigra pars compacta" is inferred
from "brain".
@oganm
Member Author

oganm commented Sep 19, 2023

None of the examples above are returned with the 'allCharacteristics.valueUri = http://purl.obolibrary.org/obo/UBERON_0000955' filter in the development version as of now.

@arteymix
Member

I can confirm that both "substantia nigra" and "substantia nigra pars compacta" are inferred from brain.

I will investigate why those specific datasets are not coming up.

@arteymix
Member

I see! The tag is on the sample itself, but we only index the FactorValue of the BioMaterial. This is utterly useless because those factors are already declared in the experimental design which we add separately.
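
A rough Python sketch of that indexing gap, under stated assumptions: the `Sample` and `Dataset` classes, the `indexed_uris` helper, the dataset name and the `example.org` URI are all hypothetical and do not reflect Gemma's actual data model. Only the substantia nigra URI comes from this thread.

```python
from dataclasses import dataclass, field

SUBSTANTIA_NIGRA = "http://purl.obolibrary.org/obo/UBERON_0002038"
TREATMENT = "http://example.org/factor/treatment"  # made-up factor value URI


@dataclass
class Sample:
    characteristics: set = field(default_factory=set)  # tags on the sample itself
    factor_values: set = field(default_factory=set)    # values from the experimental design


@dataclass
class Dataset:
    name: str
    design_factor_values: set = field(default_factory=set)
    samples: list = field(default_factory=list)


def indexed_uris(ds: Dataset, include_sample_characteristics: bool) -> set:
    """Collect the URIs a dataset would be searchable by in this toy index."""
    uris = set(ds.design_factor_values)
    for sample in ds.samples:
        uris |= sample.factor_values  # redundant with the design-level values
        if include_sample_characteristics:
            uris |= sample.characteristics
    return uris


toy = Dataset(
    name="toy-dataset",
    design_factor_values={TREATMENT},
    samples=[Sample(characteristics={SUBSTANTIA_NIGRA}, factor_values={TREATMENT})],
)

propagated_filter = {SUBSTANTIA_NIGRA}
found_before = bool(indexed_uris(toy, False) & propagated_filter)
found_after = bool(indexed_uris(toy, True) & propagated_filter)
print(found_before, found_after)
```

In this toy model, indexing only the factor values adds nothing beyond the experimental design, so a dataset tagged solely at the sample level never matches the propagated filter until sample characteristics are indexed as well.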

@arteymix
Member

Fully fixed in 570a7c6


3 participants