-
Couldn't load subscription status.
- Fork 1k
fix: ArrayIter does not report size hint correctly after advancing from the iterator back
#8728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: ArrayIter does not report size hint correctly after advancing from the iterator back
#8728
Conversation
…from the iterator back this also adds a LOT of tests extracted from (which is how I found that bug): - apache#8697
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rluvaton -- the code change looks good to me. I started going through the tests, but got lost in the maze of generics / structs and traits -- it will take me a while to get through this.
Is there any way you can make it easier to understand what is going on in these tests to make it faster to review?
| ( | ||
| self.array.len() - self.current, | ||
| Some(self.array.len() - self.current), | ||
| self.current_end - self.current, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the actual code fix, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct
| type CallTrackingOnly = CallTrackingWithInputType<()>; | ||
|
|
||
| #[test] | ||
| fn assert_position() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified that this test fails without the code change in this PR
---- iterator::tests::assert_position stdout ----
thread 'iterator::tests::assert_position' panicked at arrow-array/src/iterator.rs:467:13:
assertion `left == right` failed: Failed on op rposition with 0 false returned for new iter after consuming 1 element from the end (left actual, right expected) ([Some(0), Some(1), Some(2), Some(3), Some(4), Some(5), Some(6), Some(7), Some(8)])
left: AdapterOutput { value: CallTrackingAndResult { result: Some(9), calls: [Some(8)] }, leftover: [Some(0), Some(1), Some(2), Some(3), Some(4), Some(5), Some(6), Some(7)] }
right: AdapterOutput { value: CallTrackingAndResult { result: Some(8), calls: [Some(8)] }, leftover: [Some(0), Some(1), Some(2), Some(3), Some(4), Some(5), Some(6), Some(7)] }
arrow-array/src/iterator.rs
Outdated
| /// under various consumption patterns (e.g. some calls to next/next_back/consume_all/etc) | ||
| fn assert_array_iterator_cases<O: ConsumingArrayIteratorOp>(o: O) { | ||
| setup_and_assert_cases(NoSetup, |actual, expected| { | ||
| let current_iterator_values: Vec<Option<i32>> = expected.clone().collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assert code that gets the iterator values and compares them is repeated many times in these tests and makes it hard to follow what they are doing -- can you please find some way to reduce the duplication ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only difference is the error message, so I moved the assertion to the setup and assert function and added a description for the setup
|
Cleaned up a code a bit. the idea is to create an infra to make the tests themself only about the relavent thing: so if I take this test for example #[test]
fn assert_for_each() {
// 1. the operation that we want to apply
struct ForEachOp;
impl ConsumingArrayIteratorOp for ForEachOp {
type Output = CallTrackingOnly;
fn name(&self) -> String {
"for_each".to_string()
}
// apply the operation and get result to compare
fn get_value<T: SharedBetweenArrayIterAndSliceIter>(&self, iter: T) -> Self::Output {
let mut items = Vec::with_capacity(iter.len());
// We are testing for_each so the thing we want to assert is the function arguments (and the order of the items)
iter.for_each(|item| {
items.push(item);
});
// Return the data we are asserting on
CallTrackingAndResult {
// we pass the function calls to assert that they are the same as the source of truth iterator (slice in our case)
calls: items,
// this function does not have a return value so we pass ()
result: (),
}
}
}
assert_array_iterator_cases(ForEachOp)
} |
|
and this test example where the iterator is mutated rather than consumed: #[test]
fn assert_any() {
struct AnyOp {
false_count: usize,
}
// This is a mutating array iterator as the operation we do (any) does not consume
// the iterator so we also need to assert that we leave the iterator in a valid state
impl MutatingArrayIteratorOp for AnyOp {
type Output = CallTrackingWithInputType<bool>;
fn name(&self) -> String {
format!("any with {} false returned", self.false_count)
}
fn get_value<T: SharedBetweenArrayIterAndSliceIter>(
&self,
iter: &mut T,
) -> Self::Output {
// track the cb calls
let mut items = Vec::with_capacity(iter.len());
// track the number of false we returned from the any callback
let mut count = 0;
// save the any result to make sure we return the same value
let res = iter.any(|item| {
// in any we also want to track the callback calls are the same, so we save that as well
items.push(item);
// Allow for different amount of false to be returned before true
if count < self.false_count {
count += 1;
false
} else {
true
}
});
// Return the data we assert on
CallTrackingWithInputType {
// we test that we get called with the same arguments
calls: items,
// We also test that the returned value from any is correct
result: res,
}
}
}
// we want to test both when we find the value in the 1st call (always true - false count is 0)
// when we find after 2nd and 3rd calls
// and when we never find
for false_count in [0, 1, 2, usize::MAX] {
// assert the operation on many cases
// use the _mutate as it will assert that the iterator state after the operator finish is correct
// and we don't leave it in bad state
assert_array_iterator_cases_mutate(AnyOp { false_count });
}
} |
|
I can add those comments to the code and move the helpers down so the first thing people will see is the test with explanation and then they have some idea about the implementation |
Which issue does this PR close?
N/A
Rationale for this change
for the fix: the array iterator is marked as exact size iterator and double ended iterator so it should report the current length when accessed through the other side
What changes are included in this PR?
fix by using
current_endinstead ofarray.len()and also adds a LOT of tests extracted from (which is how I found that bug):
ArrayIterwith dedicated null/non-nullable versions #8697Are these changes tested?
Yes
Are there any user-facing changes?
Kinda