-
Notifications
You must be signed in to change notification settings - Fork 521
feat: support array_contains in LabelList scalar index #5681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: support array_contains in LabelList scalar index #5681
Conversation
|
PTAL @westonpace. |
|
Discovered a corner data correctness bug in the LABEL_LIST index and filed #5682 for follow-up. |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
westonpace
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks for picking this up!
Will merge in a day or so in case you want to address any comments.
| # Include lists with NULL items to ensure NULL needle behavior matches | ||
| # non-index execution. | ||
| tbl = pa.table( | ||
| {"labels": [["foo", "bar"], ["bar"], ["baz"], ["qux", None], [None], []]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also include an entry where the entire list is None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
| // Do not push down NULL needles. | ||
| if scalar.is_null() { | ||
| return None; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this because of #5682 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this is independent of #5682. #5682 is about the index missing rows where a valid element and a NULL co-exists (e.g., searching for 'foo' misses ['foo', None]). I will provide more details in #5682 later.
I've added a comment to explain the check. And array_has_any and array_has_all don't have this semantic mismatch as they natively follow membership logic.
| } | ||
| match expr { | ||
| Expr::Between(between) => Ok(visit_between(between, index_info)), | ||
| Expr::Alias(alias) => visit_node(alias.expr.as_ref(), index_info, depth), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch
| if args.len() != 2 { | ||
| return None; | ||
| } | ||
| if func.name() == "array_has" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this work for array_contains? Is Datafusion mapping that to array_has already?
If so, can we add a comment here mentioning that this branch is also going to be hit for array_contains?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment
|
The build-no-lock failure should have been fixed by #5690 |
04d8744 to
33558d5
Compare
|
PTAL @westonpace. |
closes #3687
Changes: