Filter parameter and inconsistent outputs #602

Closed
oganm opened this issue Mar 16, 2023 · 12 comments

@oganm
Member

oganm commented Mar 16, 2023

Filter terms joined with "and" should return progressively narrower results as more terms are added, but this doesn't seem to be the case, for reasons unknown.

I am using the presence of dataset 549 as the indicator in the examples below, but the results are consistent with examining the entire output via offset.

Case 1

Case 2

Judging by the filter terms in the outputs, dataset 549 doesn't have the term UBERON_0000955 or any of the associated terms that it propagates to. The closest thing it has is UBERON_0002038, which is a region of the brain. It does have the other term, DOID_14330.

@arteymix
Member

Propagation is indeed broken, I filed a separate issue for this: #603.

I'll investigate why the conjunction of characteristics filters doesn't work.

@arteymix arteymix added this to the 1.30.0 milestone Mar 17, 2023
@arteymix arteymix added the bug label Mar 17, 2023
@arteymix arteymix self-assigned this Mar 17, 2023
@oganm
Member Author

oganm commented Apr 21, 2023

This is the issue we talked about today. The first example case above now returns 549. The second example case above now returns... nothing. I suspect the second case is because DOID isn't loaded on dev.

@oganm
Member Author

oganm commented Jun 3, 2023

Popping back to confirm that this issue still appears to exist, albeit in a different form. I am adding up-to-date examples and a list of grievances below, along with the assumptions I am making when expecting the results, in case there is a mismatch there.

Case 1, incomplete inheritance

Case 2, failure to return the overlap of two queries on allCharacteristics.valueUri or allCharacteristics.value when "and" is used

Case 3, duplicated results when querying for IDs

@arteymix
Member

arteymix commented Jun 3, 2023

Just to be clear, if you submit multiple filter parameters, the last one takes precedence, which is consistent with just performing the second query.

The "and" returns no results because of how the SQL is generated. I do a join on the characteristics, so having multiple conjunctive clauses on the same attribute will not work.

I need to adjust these queries to use subqueries instead of joins.

@arteymix arteymix added the high priority Issues that require immediate attention label Jun 3, 2023
@oganm
Member Author

oganm commented Jun 3, 2023

if you submit multiple filter parameters

None of the examples do this, do they? Or are we talking about using "and"?

If it's case 3, those are separate calls.

@arteymix
Member

arteymix commented Jun 5, 2023

I've investigated this, and it turns out that resolving this requires significant work.

At the most fundamental level, we cannot express a conjunction on a one-to-many relation with a single join. The conjunction actually turns into a contradiction:

c.id = 1 and c.id = 2

which is why you get zero results. Instead, we need to either create one join per clause

c1.id = 1 and c2.id = 2

or use subqueries

id in (select ... join c on c.id = 1) and id in (select ... join c on c.id = 2)

I think I will ultimately opt for expressing those using subqueries since it appears to be the simplest and most flexible approach.
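
For illustration, the three query shapes above can be tried against a toy schema. This is only a sketch: the dataset and characteristic tables, their columns, and the 'A'/'B' terms are made up for the example and are not the actual schema or the SQL the application generates.

```python
import sqlite3

# Hypothetical toy schema: one dataset has many characteristics.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dataset (id INTEGER PRIMARY KEY);
    CREATE TABLE characteristic (dataset_id INTEGER, value_uri TEXT);
    INSERT INTO dataset VALUES (1), (2), (3);
    INSERT INTO characteristic VALUES
        (1, 'A'), (1, 'B'),   -- dataset 1 carries both terms
        (2, 'A'),             -- dataset 2 carries only A
        (3, 'B');             -- dataset 3 carries only B
""")


def ids(sql):
    """Run a query and return the sorted dataset ids it yields."""
    return sorted(row[0] for row in conn.execute(sql))


# Single join: both clauses constrain the same row, so nothing can match.
single_join = ids("""
    SELECT DISTINCT d.id FROM dataset d
    JOIN characteristic c ON c.dataset_id = d.id
    WHERE c.value_uri = 'A' AND c.value_uri = 'B'
""")

# One join per clause: each clause gets its own characteristic row.
join_per_clause = ids("""
    SELECT DISTINCT d.id FROM dataset d
    JOIN characteristic c1 ON c1.dataset_id = d.id
    JOIN characteristic c2 ON c2.dataset_id = d.id
    WHERE c1.value_uri = 'A' AND c2.value_uri = 'B'
""")

# Subqueries: each clause is an independent membership test on the id.
subqueries = ids("""
    SELECT d.id FROM dataset d
    WHERE d.id IN (SELECT dataset_id FROM characteristic WHERE value_uri = 'A')
      AND d.id IN (SELECT dataset_id FROM characteristic WHERE value_uri = 'B')
""")

print(single_join, join_per_clause, subqueries)
```

In this toy data only dataset 1 has both terms, so the single join returns nothing while the other two forms return dataset 1.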

This is being looked at in #708.

arteymix added a commit that referenced this issue Jun 6, 2023
@arteymix
Member

arteymix commented Jun 6, 2023

@arteymix arteymix reopened this Jun 6, 2023
@arteymix
Member

arteymix commented Jun 6, 2023

How curious that results appear duplicated when filtered by ID. I'll investigate that...

@arteymix
Member

arteymix commented Jun 6, 2023

I know what's going on: the ACL entries are joined in the query, but the query lacks a GROUP BY because we had deemed it unnecessary. I'll add an extra check for that.
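
A minimal sketch of the effect, assuming a made-up acl_entry table with several rows per dataset (the table, column names, and sid values are hypothetical, not the real ACL schema):

```python
import sqlite3

# Hypothetical toy schema: each dataset has several ACL entries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dataset (id INTEGER PRIMARY KEY);
    CREATE TABLE acl_entry (dataset_id INTEGER, sid TEXT);
    INSERT INTO dataset VALUES (549);
    INSERT INTO acl_entry VALUES
        (549, 'admin'), (549, 'agent'), (549, 'anonymous');
""")

# Without GROUP BY, the join yields one row per matching ACL entry.
without_group_by = [row[0] for row in conn.execute("""
    SELECT d.id FROM dataset d
    JOIN acl_entry a ON a.dataset_id = d.id
    WHERE d.id = 549
""")]

# Grouping on the dataset id collapses those rows back to one per dataset.
with_group_by = [row[0] for row in conn.execute("""
    SELECT d.id FROM dataset d
    JOIN acl_entry a ON a.dataset_id = d.id
    WHERE d.id = 549
    GROUP BY d.id
""")]

print(without_group_by, with_group_by)
```

With three ACL rows, the ungrouped query lists the dataset three times, and the grouped one lists it once.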

@arteymix
Member

arteymix commented Jun 6, 2023

That only leaves the propagation. I'll investigate it now.

@oganm
Member Author

oganm commented Jun 9, 2023

The propagation still seems a bit incomplete.

Using the annotation/search/datasets endpoint, I ran a search for http://purl.obolibrary.org/obo/UBERON_0000955. This returns 3950 results.

The datasets endpoint with the equivalent filter, however, returns 1796 results.

These calls aren't exactly equivalent, so I examined some of the missing cases. Experiment 18 is one such example. It is annotated with frontal cortex and frontal lobe, both of which should be children of brain (http://purl.obolibrary.org/obo/UBERON_0000955), yet the experiment isn't returned.
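
To spell out the assumption behind that expectation, here is a rough sketch of the propagation I have in mind: a filter on a term should also match datasets annotated with any of its descendants. The hierarchy below is a hand-written toy, not the real UBERON structure, and every dataset other than 18 is invented for the example.

```python
from collections import deque

# Toy child -> parents hierarchy (hypothetical, not the real ontology).
PARENTS = {
    "frontal cortex": ["frontal lobe"],
    "frontal lobe": ["cerebral cortex"],
    "cerebral cortex": ["brain"],
    "brain": [],
    "liver": [],
}

# Dataset 18 mirrors the annotations mentioned above; 2 and 3 are invented.
ANNOTATIONS = {
    18: {"frontal cortex", "frontal lobe"},
    2: {"liver"},
    3: {"brain"},
}


def descendants(term, parents):
    """Return the term plus everything below it, via breadth-first search."""
    children = {}
    for child, term_parents in parents.items():
        for parent in term_parents:
            children.setdefault(parent, []).append(child)
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def datasets_matching(term):
    """Datasets annotated with the term or any of its descendants."""
    expanded = descendants(term, PARENTS)
    return sorted(ds for ds, terms in ANNOTATIONS.items() if terms & expanded)


print(datasets_matching("brain"))
```

Under this model, a filter on brain picks up dataset 18 through its frontal cortex and frontal lobe annotations, which is the behaviour I expected from the datasets endpoint.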

@arteymix
Member

arteymix commented Jun 9, 2023

These two endpoints should return exactly the same thing. I'm deploying the fix for #729 so you can test it on dev.

Otherwise, it means that the inference behaves differently between prod and dev. This might be expected because we use slimmer ontologies.
