fix: `ArrayIter` does not report size hint correctly after advancing from the iterator back #8728

rluvaton · 2025-10-27T22:00:41Z

Which issue does this PR close?

N/A

Rationale for this change

`ArrayIter` implements both `ExactSizeIterator` and `DoubleEndedIterator`, so its size hint must report the remaining length even after items have been consumed from the back.

What changes are included in this PR?

Fixes `size_hint` by using `current_end` instead of `array.len()`.
This PR also adds a LOT of tests extracted from the following PR (which is how I found the bug):

perf: override default implementation in ArrayIter with dedicated null/non-nullable versions #8697
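To illustrate the fix, here is a minimal sketch of a double-ended iterator that tracks a front cursor and a back cursor. `RangeIter` and `remaining_after_one_from_each_end` are hypothetical names made up for this example, and the struct is only a simplified stand-in, not the real `ArrayIter`.

```rust
/// Hypothetical stand-in for `ArrayIter`: yields `values[current..current_end]`.
struct RangeIter<'a> {
    values: &'a [i32],
    current: usize,
    current_end: usize,
}

impl<'a> RangeIter<'a> {
    fn new(values: &'a [i32]) -> Self {
        Self { values, current: 0, current_end: values.len() }
    }
}

impl<'a> Iterator for RangeIter<'a> {
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        if self.current == self.current_end {
            return None;
        }
        let value = self.values[self.current];
        self.current += 1;
        Some(value)
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        // `self.values.len() - self.current` would ignore items that were
        // already taken from the back; `current_end` accounts for them.
        let remaining = self.current_end - self.current;
        (remaining, Some(remaining))
    }
}

impl<'a> DoubleEndedIterator for RangeIter<'a> {
    fn next_back(&mut self) -> Option<i32> {
        if self.current == self.current_end {
            return None;
        }
        self.current_end -= 1;
        Some(self.values[self.current_end])
    }
}

impl<'a> ExactSizeIterator for RangeIter<'a> {}

/// Takes one item from each end and reports how many items are left.
fn remaining_after_one_from_each_end(values: &[i32]) -> usize {
    let mut iter = RangeIter::new(values);
    iter.next();
    iter.next_back();
    iter.len()
}
```

With five input values, one taken from the front and one from the back, the sketch reports three remaining items.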

Are these changes tested?

Yes

Are there any user-facing changes?

Kinda: the size hint of `ArrayIter` is now correct after advancing from the back.

alamb

Thanks @rluvaton -- the code change looks good to me. I started going through the tests, but got lost in the maze of generics / structs and traits -- it will take me a while to get through this.

Is there any way you can make it easier to understand what is going on in these tests to make it faster to review?

alamb · 2025-10-28T14:57:30Z

arrow-array/src/iterator.rs

        (
-            self.array.len() - self.current,
-            Some(self.array.len() - self.current),
+            self.current_end - self.current,


this is the actual code fix, right?

alamb · 2025-10-28T20:37:58Z

arrow-array/src/iterator.rs

+    type CallTrackingOnly = CallTrackingWithInputType<()>;
+
+    #[test]
+    fn assert_position() {


I verified that this test fails without the code change in this PR

---- iterator::tests::assert_position stdout ----
thread 'iterator::tests::assert_position' panicked at arrow-array/src/iterator.rs:467:13:
assertion `left == right` failed: Failed on op rposition with 0 false returned for new iter after consuming 1 element from the end (left actual, right expected) ([Some(0), Some(1), Some(2), Some(3), Some(4), Some(5), Some(6), Some(7), Some(8)])
  left: AdapterOutput { value: CallTrackingAndResult { result: Some(9), calls: [Some(8)] }, leftover: [Some(0), Some(1), Some(2), Some(3), Some(4), Some(5), Some(6), Some(7)] }
 right: AdapterOutput { value: CallTrackingAndResult { result: Some(8), calls: [Some(8)] }, leftover: [Some(0), Some(1), Some(2), Some(3), Some(4), Some(5), Some(6), Some(7)] }
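The off-by-one in this failure (index 9 instead of 8) is consistent with how the standard library's default `Iterator::rposition` works: as far as I understand, it starts counting from `len()`, which is derived from `size_hint()`. Below is a rough sketch reproducing that effect. `ModelIter`, its `use_current_end` switch, and `rposition_after_next_back` are hypothetical names for illustration only, not code from this PR.

```rust
/// Hypothetical model iterator; `use_current_end` toggles between the
/// fixed size hint (true) and the old, buggy one (false).
struct ModelIter {
    values: Vec<i32>,
    current: usize,
    current_end: usize,
    use_current_end: bool,
}

impl ModelIter {
    fn new(values: Vec<i32>, use_current_end: bool) -> Self {
        let current_end = values.len();
        Self { values, current: 0, current_end, use_current_end }
    }
}

impl Iterator for ModelIter {
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        if self.current == self.current_end {
            return None;
        }
        let value = self.values[self.current];
        self.current += 1;
        Some(value)
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        // The buggy variant measures from the end of the whole array.
        let end = if self.use_current_end { self.current_end } else { self.values.len() };
        (end - self.current, Some(end - self.current))
    }
}

impl DoubleEndedIterator for ModelIter {
    fn next_back(&mut self) -> Option<i32> {
        if self.current == self.current_end {
            return None;
        }
        self.current_end -= 1;
        Some(self.values[self.current_end])
    }
}

impl ExactSizeIterator for ModelIter {}

/// Consumes one element from the back of 0..10, then asks for the
/// position of the last remaining element.
fn rposition_after_next_back(use_current_end: bool) -> Option<usize> {
    let mut iter = ModelIter::new((0..10).collect(), use_current_end);
    iter.next_back(); // consumes 9
    iter.rposition(|_| true)
}
```

In this model the `current_end`-based hint yields `Some(8)`, while the `array.len()`-style hint yields `Some(9)`, the same mismatch as in the output above.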

alamb · 2025-10-28T20:40:30Z

arrow-array/src/iterator.rs

+    /// under various consumption patterns (e.g. some calls to next/next_back/consume_all/etc)
+    fn assert_array_iterator_cases<O: ConsumingArrayIteratorOp>(o: O) {
+        setup_and_assert_cases(NoSetup, |actual, expected| {
+            let current_iterator_values: Vec<Option<i32>> = expected.clone().collect();


This assertion code that collects the iterator values and compares them is repeated many times in these tests and makes it hard to follow what they are doing -- can you please find some way to reduce the duplication?

The only difference is the error message, so I moved the assertion into the setup-and-assert function and added a description for the setup.

rluvaton · 2025-10-28T21:19:50Z

Cleaned up the code a bit.

The idea is to create infrastructure so that the tests themselves are only about the relevant thing.

Take this test, for example:

#[test]
fn assert_for_each() {
    // 1. The operation that we want to apply
    struct ForEachOp;

    impl ConsumingArrayIteratorOp for ForEachOp {
        type Output = CallTrackingOnly;

        fn name(&self) -> String {
            "for_each".to_string()
        }

        // Apply the operation and return the result to compare
        fn get_value<T: SharedBetweenArrayIterAndSliceIter>(&self, iter: T) -> Self::Output {
            let mut items = Vec::with_capacity(iter.len());

            // We are testing for_each, so what we want to assert on is the callback arguments (and the order of the items)
            iter.for_each(|item| {
                items.push(item);
            });

            // Return the data we are asserting on
            CallTrackingAndResult {
                // Pass the callback calls so we can assert that they match the source-of-truth iterator (a slice in our case)
                calls: items,
                // for_each has no return value, so we pass ()
                result: (),
            }
        }
    }

    assert_array_iterator_cases(ForEachOp)
}
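For readers without the helpers at hand, here is a heavily simplified sketch of the idea behind the consuming-op tests: run the same operation on the iterator under test and on a slice iterator that acts as the source of truth, after every combination of front and back consumption. `ConsumingOp` and `check_cases` are hypothetical names and not the helpers in this PR, and `std::vec::IntoIter` stands in for `ArrayIter` so the sketch compiles without arrow.

```rust
use std::fmt::Debug;

/// Hypothetical, simplified counterpart of the consuming-op trait.
trait ConsumingOp {
    type Output: PartialEq + Debug;

    fn name(&self) -> String;

    fn get_value<I>(&self, iter: I) -> Self::Output
    where
        I: DoubleEndedIterator<Item = Option<i32>> + ExactSizeIterator;
}

struct ForEachOp;

impl ConsumingOp for ForEachOp {
    // For for_each, the callback arguments are the whole observable result.
    type Output = Vec<Option<i32>>;

    fn name(&self) -> String {
        "for_each".to_string()
    }

    fn get_value<I>(&self, iter: I) -> Self::Output
    where
        I: DoubleEndedIterator<Item = Option<i32>> + ExactSizeIterator,
    {
        let mut calls = Vec::with_capacity(iter.len());
        iter.for_each(|item| calls.push(item));
        calls
    }
}

/// Applies `op` after consuming every valid (front, back) combination and
/// returns the number of cases that were checked.
fn check_cases<O: ConsumingOp>(op: &O, data: &[Option<i32>]) -> usize {
    let mut checked = 0;
    for front in 0..=data.len() {
        for back in 0..=(data.len() - front) {
            // Stand-in for the iterator under test (assumption: not ArrayIter).
            let mut under_test = data.to_vec().into_iter();
            // Source of truth: a plain slice iterator.
            let mut truth = data.iter().copied();
            for _ in 0..front {
                under_test.next();
                truth.next();
            }
            for _ in 0..back {
                under_test.next_back();
                truth.next_back();
            }
            assert_eq!(
                op.get_value(under_test),
                op.get_value(truth),
                "failed on op {} after consuming {} from the start and {} from the end",
                op.name(),
                front,
                back
            );
            checked += 1;
        }
    }
    checked
}
```

Calling `check_cases(&ForEachOp, &[Some(1), None, Some(3)])` walks every split of a three-element input; a new operation only needs its own `ConsumingOp` implementation.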

rluvaton · 2025-10-28T21:25:13Z

and this test example where the iterator is mutated rather than consumed:

#[test]
fn assert_any() {
    struct AnyOp {
        false_count: usize,
    }

    // This is a mutating array iterator op: the operation under test (any) does not consume
    // the iterator, so we also need to assert that it leaves the iterator in a valid state
    impl MutatingArrayIteratorOp for AnyOp {
        type Output = CallTrackingWithInputType<bool>;

        fn name(&self) -> String {
            format!("any with {} false returned", self.false_count)
        }

        fn get_value<T: SharedBetweenArrayIterAndSliceIter>(
            &self,
            iter: &mut T,
        ) -> Self::Output {
            // Track the callback calls
            let mut items = Vec::with_capacity(iter.len());

            // Track the number of times we returned false from the any callback
            let mut count = 0;

            // Save the result of any to make sure both iterators return the same value
            let res = iter.any(|item| {
                // For any we also want to check that the callback calls are the same, so we record them as well
                items.push(item);

                // Allow a configurable number of false results before returning true
                if count < self.false_count {
                    count += 1;
                    false
                } else {
                    true
                }
            });

            // Return the data we assert on
            CallTrackingWithInputType {
                // We test that the callback is called with the same arguments
                calls: items,
                // We also test that the value returned from any is correct
                result: res,
            }
        }
    }

    // We want to test when the value is found on the 1st call (always true, false count is 0),
    // when it is found on the 2nd and 3rd calls,
    // and when it is never found
    for false_count in [0, 1, 2, usize::MAX] {
        // Assert the operation on many cases.
        // Use the _mutate variant, as it also asserts that the iterator state is correct
        // after the operation finishes, so we don't leave it in a bad state
        assert_array_iterator_cases_mutate(AnyOp { false_count });
    }
}
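The extra check for mutating operations can be sketched in the same simplified way: apply the operation through `&mut`, then compare the result, the callback calls, and whatever is left in both iterators. `any_with_false_count` and `any_leaves_same_state` are hypothetical names, and `std::vec::IntoIter` again stands in for `ArrayIter`.

```rust
/// Runs `any` through a mutable reference; the callback returns false
/// `false_count` times before returning true. Returns the result of `any`
/// together with the items the callback was called with.
fn any_with_false_count<I>(iter: &mut I, false_count: usize) -> (bool, Vec<Option<i32>>)
where
    I: Iterator<Item = Option<i32>>,
{
    let mut calls = Vec::new();
    let mut count = 0;
    let result = iter.any(|item| {
        calls.push(item);
        if count < false_count {
            count += 1;
            false
        } else {
            true
        }
    });
    (result, calls)
}

/// Compares the stand-in iterator with a slice iterator: same result,
/// same callback calls, and the same leftover items afterwards.
fn any_leaves_same_state(data: &[Option<i32>], false_count: usize) -> bool {
    // Stand-in for the iterator under test (assumption: not ArrayIter).
    let mut under_test = data.to_vec().into_iter();
    // Source of truth: a plain slice iterator.
    let mut truth = data.iter().copied();

    let actual = any_with_false_count(&mut under_test, false_count);
    let expected = any_with_false_count(&mut truth, false_count);

    // `any` stops early, so the remaining items are part of the contract.
    let leftover_actual: Vec<Option<i32>> = under_test.collect();
    let leftover_expected: Vec<Option<i32>> = truth.collect();

    actual == expected && leftover_actual == leftover_expected
}
```

Comparing the leftover items is what distinguishes this from the consuming case: an operation that stops early must leave both iterators at the same position.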

rluvaton · 2025-10-28T21:29:06Z

I can add those comments to the code and move the helpers down, so the first thing people see is a test with an explanation, which gives them some idea of the implementation before they reach it.

fix: ArrayIter does not report size hint correctly after advancing …

dd98964

…from the iterator back this also adds a LOT of tests extracted from (which is how I found that bug): - apache#8697

github-actions bot added the arrow Changes to the arrow crate label Oct 27, 2025

cleanup tests

ae5e961

rluvaton mentioned this pull request Oct 27, 2025

perf: override default implementation in ArrayIter with dedicated null/non-nullable versions #8697

Open

alamb reviewed Oct 28, 2025

rluvaton added 4 commits October 28, 2025 22:55

share assertion

7f8f788

add more comments

60b1317

reduce duplication

f36bbbd

cleanup

72904f7
