Skip to content

Conversation

@rluvaton
Copy link
Member

Which issue does this PR close?

N/A

Rationale for this change

for the fix: the array iterator is marked as exact size iterator and double ended iterator so it should report the current length when accessed through the other side

What changes are included in this PR?

fix by using current_end instead of array.len()
and also adds a LOT of tests extracted from (which is how I found that bug):

Are these changes tested?

Yes

Are there any user-facing changes?

Kinda

…from the iterator back

this also adds a LOT of tests extracted from (which is how I found that bug):
- apache#8697
@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 27, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rluvaton -- the code change looks good to me. I started going through the tests, but got lost in the maze of generics / structs and traits -- it will take me a while to get through this.

Is there any way you can make it easier to understand what is going on in these tests to make it faster to review?

(
self.array.len() - self.current,
Some(self.array.len() - self.current),
self.current_end - self.current,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the actual code fix, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct

type CallTrackingOnly = CallTrackingWithInputType<()>;

#[test]
fn assert_position() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified that this test fails without the code change in this PR

---- iterator::tests::assert_position stdout ----

thread 'iterator::tests::assert_position' panicked at arrow-array/src/iterator.rs:467:13:
assertion `left == right` failed: Failed on op rposition with 0 false returned for new iter after consuming 1 element from the end (left actual, right expected) ([Some(0), Some(1), Some(2), Some(3), Some(4), Some(5), Some(6), Some(7), Some(8)])
  left: AdapterOutput { value: CallTrackingAndResult { result: Some(9), calls: [Some(8)] }, leftover: [Some(0), Some(1), Some(2), Some(3), Some(4), Some(5), Some(6), Some(7)] }
 right: AdapterOutput { value: CallTrackingAndResult { result: Some(8), calls: [Some(8)] }, leftover: [Some(0), Some(1), Some(2), Some(3), Some(4), Some(5), Some(6), Some(7)] }

/// under various consumption patterns (e.g. some calls to next/next_back/consume_all/etc)
fn assert_array_iterator_cases<O: ConsumingArrayIteratorOp>(o: O) {
setup_and_assert_cases(NoSetup, |actual, expected| {
let current_iterator_values: Vec<Option<i32>> = expected.clone().collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert code that gets the iterator values and compares them is repeated many times in these tests and makes it hard to follow what they are doing -- can you please find some way to reduce the duplication ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only difference is the error message, so I moved the assertion to the setup and assert function and added a description for the setup

@rluvaton
Copy link
Member Author

Cleaned up a code a bit.

the idea is to create an infra to make the tests themself only about the relavent thing:

so if I take this test for example

#[test]
fn assert_for_each() {
	// 1. the operation that we want to apply 
    struct ForEachOp;

    impl ConsumingArrayIteratorOp for ForEachOp {
        type Output = CallTrackingOnly;

        fn name(&self) -> String {
            "for_each".to_string()
        }

		// apply the operation and get result to compare
        fn get_value<T: SharedBetweenArrayIterAndSliceIter>(&self, iter: T) -> Self::Output {
            let mut items = Vec::with_capacity(iter.len());

			// We are testing for_each so the thing we want to assert is the function arguments (and the order of the items)
            iter.for_each(|item| {
                items.push(item);
            });

			// Return the data we are asserting on
            CallTrackingAndResult {
				// we pass the function calls to assert that they are the same as the source of truth iterator (slice in our case)
                calls: items,
				// this function does not have a return value so we pass ()
                result: (),
            }
        }
    }

    assert_array_iterator_cases(ForEachOp)
}

@rluvaton
Copy link
Member Author

rluvaton commented Oct 28, 2025

and this test example where the iterator is mutated rather than consumed:

#[test]
fn assert_any() {
    struct AnyOp {
        false_count: usize,
    }

	// This is a mutating array iterator as the operation we do (any) does not consume
	// the iterator so we also need to assert that we leave the iterator in a valid state
    impl MutatingArrayIteratorOp for AnyOp {
        type Output = CallTrackingWithInputType<bool>;

        fn name(&self) -> String {
            format!("any with {} false returned", self.false_count)
        }

        fn get_value<T: SharedBetweenArrayIterAndSliceIter>(
            &self,
            iter: &mut T,
        ) -> Self::Output {
			// track the cb calls
            let mut items = Vec::with_capacity(iter.len());

			// track the number of false we returned from the any callback
            let mut count = 0;

			// save the any result to make sure we return the same value
            let res = iter.any(|item| {

				// in any we also want to track the callback calls are the same, so we save that as well 
                items.push(item);

				// Allow for different amount of false to be returned before true
                if count < self.false_count {
                    count += 1;
                    false
                } else {
                    true
                }
            });

			// Return the data we assert on
            CallTrackingWithInputType {
				// we test that we get called with the same arguments
                calls: items,
				// We also test that the returned value from any is correct
                result: res,
            }
        }
    }

	// we want to test both when we find the value in the 1st call (always true - false count is 0)
    // when we find after 2nd and 3rd calls
	// and when we never find
    for false_count in [0, 1, 2, usize::MAX] {
		// assert the operation on many cases
		// use the _mutate as it will assert that the iterator state after the operator finish is correct
		// and we don't leave it in bad state
        assert_array_iterator_cases_mutate(AnyOp { false_count });
    }
}

@rluvaton
Copy link
Member Author

I can add those comments to the code and move the helpers down so the first thing people will see is the test with explanation and then they have some idea about the implementation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants