-
Couldn't load subscription status.
- Fork 1k
Support more operations on ListView #8645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Added support for ListView and LargeListView for the following operations: * `arrow_select::concat` * `arrow_select::filter_array` Signed-off-by: Andrew Duffy <[email protected]>
There was a bug in ArrayData validation for ListView, which became apparent when you tried to call `list_view.to_data().into_builder().build().unwrap()`. The range for checking the offsets/sizes was wrong and would trivially trigger an out of bounds check. Signed-off-by: Andrew Duffy <[email protected]>
Signed-off-by: Andrew Duffy <[email protected]>
| ) -> Result<(), ArrowError> { | ||
| let offsets: &[T] = self.typed_buffer(0, self.len)?; | ||
| let sizes: &[T] = self.typed_buffer(1, self.len)?; | ||
| for i in 0..values_length { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a bug before. You can verify this by construction a list_view_array and then doing list_view_array.to_data().into_builder().build().unwrap() and it will panic, because values_length is the length of the inner values not of the list itself.
Signed-off-by: Andrew Duffy <[email protected]>
Signed-off-by: Andrew Duffy <[email protected]>
Signed-off-by: Andrew Duffy <[email protected]>
Signed-off-by: Andrew Duffy <[email protected]>
0c069d2 to
124b437
Compare
Signed-off-by: Andrew Duffy <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @a10y 🙏
I went through the code carefully, and it all looks good to me
BTW if anyone is interested, here is the relevant spec portion: https://arrow.apache.org/docs/format/Columnar.html#variable-size-list-layout
I think PR needs a few more tests but then it will be ready
Signed-off-by: Andrew Duffy <[email protected]>
Signed-off-by: Andrew Duffy <[email protected]>
Signed-off-by: Andrew Duffy <[email protected]>
|
Thanks for reviewing @alamb! I had a few questions which I left open, but otherwise your comments have been addressed. |
|
Code looks great -- just tests for eq and I think this will be good to merge |
Signed-off-by: Andrew Duffy <[email protected]>
|
@alamb just pushed a commit with some tests, let me know what you think! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @a10y
arrow-data/src/equal/list_view.rs
Outdated
| for ((&lhs_offset, &rhs_offset), &size) in lhs_range_offsets | ||
| .iter() | ||
| .zip(rhs_range_offsets) | ||
| .zip(lhs_sizes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be lhs_range_sizes ?
The earlier iterators are **_range_** ones, i.e. they take into account the lhs_start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes you're right, good catch
arrow-data/src/equal/list_view.rs
Outdated
| for (index, ((&lhs_offset, &rhs_offset), &size)) in lhs_range_offsets | ||
| .iter() | ||
| .zip(rhs_range_offsets) | ||
| .zip(lhs_sizes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
Signed-off-by: Andrew Duffy <[email protected]>
|
Thank you for reviewing @martin-g , I'm trying to craft a good test case for the bug you found. Seems like lhs_offset is only non-zero when it's nested in a dict, list, or ree encoding |
Signed-off-by: Andrew Duffy <[email protected]>
Which issue does this PR close?
Part of #5375
Vortex was encountering some issues after we switched our preferred List type to
ListView, the first thing we noticed was thatarrow_select::filter_arraywould fail on ListView (and LargeListView, though we don't use that).This PR addresses some missing select kernel implementations for ListView and LargeListView.
This also fixes an existing bug in the ArrayData validation for ListView arrays that would trigger an out of bounds index panic.
Are these changes tested?
Are there any user-facing changes?
ListView/LargeListView can now be used with the
take,concatandfilter_arraykernelsYou can now use the
PartialEqto compare ListView arrays.