8645: Support more operations on ListView by martin-augment · Pull Request #8 · martin-augment/arrow-rs

martin-augment · 2025-10-24T13:02:59Z

8645: To review by AI

Summary by CodeRabbit

Release Notes

New Features
- Added comprehensive support for ListView and LargeListView array types across concatenation, filtering, element selection, and equality operations.
- Implemented validation checks for offsets and sizes buffer consistency during array processing.
Tests
- Added extensive test coverage for ListView array operations including slicing, reordering, and null handling.

Added support for ListView and LargeListView for the following operations: * `arrow_select::concat` * `arrow_select::filter_array` Signed-off-by: Andrew Duffy <[email protected]>

There was a bug in ArrayData validation for ListView, which became apparent when you tried to call `list_view.to_data().into_builder().build().unwrap()`. The range for checking the offsets/sizes was wrong and would trivially trigger an out of bounds check. Signed-off-by: Andrew Duffy <[email protected]>

Signed-off-by: Andrew Duffy <[email protected]>

coderabbitai · 2025-10-24T13:03:33Z

Walkthrough

The pull request adds comprehensive support for ListView and LargeListView data types across Arrow's data handling pipeline. Changes include pre-loop validation for offset/size consistency, equality comparison logic, transform operations (extend and null handling), concatenation, filtering, and selection (take) kernels. New modules encapsulate ListView-specific logic with parametrized i32/i64 offset types. Extensive test coverage validates operations on list-view arrays.

Changes

Cohort / File(s)	Summary
Data validation `arrow-data/src/data.rs`	Enhanced `validate_offsets_and_sizes` with pre-loop validation to detect mismatched offsets and sizes buffer lengths, routing to early `ComputeError` before per-element bounds checking.
Equality support `arrow-data/src/equal/list_view.rs`, `arrow-data/src/equal/mod.rs`	New `list_view.rs` module introduces generic `list_view_equal<T>` function for comparing ListView/LargeListView arrays considering nulls and offsets. Module wired into `equal/mod.rs` dispatch with i32/i64 variants replacing prior unimplemented stubs.
Transform operations `arrow-data/src/transform/list_view.rs`, `arrow-data/src/transform/mod.rs`	New `transform/list_view.rs` module provides generic `build_extend` and `extend_nulls` functions for ListView/LargeListView offset/size buffer manipulation. Integrated into `transform/mod.rs` for capacity allocation and buffer handling with i32/i64 offset type dispatch.
Selection operations `arrow-select/src/concat.rs`, `arrow-select/src/filter.rs`, `arrow-select/src/take.rs`	Added `concat_list_view`, `filter_list_view`, and `take_list_view` helpers to support ListView/LargeListView concatenation, filtering, and selection. Each module routes i32/i64 offset types appropriately; includes comprehensive test coverage for standard and sliced variants.
Test coverage `arrow/tests/array_equal.rs`	New test helpers (`create_list_view_array`) and test functions (`test_list_view_array`, `test_large_list_view_array`) validate equality semantics, reordering, and slicing for ListView-based arrays.

Sequence Diagram

sequenceDiagram
    participant User
    participant Arrow as Arrow API
    participant Valid as Validation
    participant Equal as Equality
    participant Trans as Transform
    participant Select as Selection
    
    rect rgb(240, 248, 255)
    Note over User,Select: ListView/LargeListView Operation Flow
    end
    
    User->>Arrow: Create/Manipulate ListView Array
    activate Arrow
    
    rect rgb(255, 245, 238)
    Note over Valid: 1. Validate Phase
    Arrow->>Valid: validate_offsets_and_sizes()
    Valid->>Valid: Check offsets.len() == sizes.len()
    Valid-->>Valid: ComputeError if mismatch
    Valid->>Valid: Per-element bounds check
    Valid-->>Arrow: Valid or Error
    end
    
    alt Equality Check
        rect rgb(240, 255, 240)
        Note over Equal: 2. Equality Phase
        Arrow->>Equal: equal_values (lhs, rhs)
        Equal->>Equal: list_view_equal<T>()
        Equal->>Equal: Compare offsets & sizes
        Equal->>Equal: Compare child data
        Equal-->>Arrow: bool result
        end
    else Transformation
        rect rgb(255, 250, 240)
        Note over Trans: 3. Transform Phase
        Arrow->>Trans: build_extend / extend_nulls
        Trans->>Trans: Copy offsets & sizes buffers
        Trans->>Trans: Append or extend nulls
        Trans-->>Arrow: Mutable buffers ready
        end
    else Selection
        rect rgb(245, 245, 255)
        Note over Select: 4. Selection Phase
        Arrow->>Select: concat / filter / take
        Select->>Select: Process offsets & sizes
        Select->>Select: Rebuild child data view
        Select-->>Arrow: New ListView Array
        end
    end
    
    deactivate Arrow
    Arrow-->>User: Result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

The changes introduce ListView/LargeListView support across nine files with consistent pattern application (generic i32/i64 offset dispatch) but heterogeneous operations (validation, equality, transform, concat, filter, take). While individual file changes are straightforward, the breadth requires tracing implementations across modules and verifying correctness of new generic functions with proper trait bounds and buffer handling.

Poem

🐰 Lists now have views, both large and small,
Offsets and sizes aligned through it all,
From concat to take, through filter and equal,
ListView support makes the Arrow wheel roll!
With i32 and i64 holding the way,
Views of our lists leap and play! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The pull request description is significantly incomplete compared to the required template. The provided description "8645: To review by AI" appears to be a placeholder and lacks all required sections: it does not specify which issue this closes, provide rationale for the changes, summarize what changes are included, indicate whether tests are present, or address any user-facing changes. The description provides no meaningful information beyond a vague reference to issue 8645, failing to meet the repository's documentation standards.	The description must be substantially revised to follow the template structure. It should include a "Closes #" statement linking to the relevant issue, explain why these ListView operations are being added, provide a summary of the changes across the multiple files (equality, filtering, concatenation, taking, and transformation support), confirm that comprehensive tests have been added (as evidenced by the test changes), and clarify any user-facing API additions or breaking changes.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title "8645: Support more operations on ListView" clearly and accurately reflects the primary focus of this pull request. The changeset spans multiple files across arrow-data, arrow-select, and arrow modules to enable operations like equality comparison, filtering, concatenation, taking, and transformation on ListView and LargeListView types. The title is concise, specific, and would be readily understood by reviewers scanning the commit history without being overly vague or containing distracting elements.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch pr-8645-2025-10-24-13-02-56

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

augmentcode

Review completed. 2 suggestions posted.

augmentcode · 2025-10-24T13:06:08Z

arrow-data/src/equal/list_view.rs

+        for ((&lhs_offset, &rhs_offset), &size) in lhs_range_offsets
+            .iter()
+            .zip(rhs_range_offsets)
+            .zip(lhs_sizes)


In this loop you zip lhs_range_offsets with rhs_range_offsets and then with lhs_sizes, but lhs_sizes is the full buffer, not the sliced lhs_range_sizes. When lhs_start != 0 this will read sizes from the wrong indices and can produce incorrect equality results. Consider using the sliced sizes here (also applies to the similar loop in the else branch).

🤖 React with 👍 or 👎 to let us know if the comment was useful.

value:useful; category:bug; feedback:The AI agent is correct that the zipped iterators might start from the wrong initial element with lhs_sizes. The third iterator should be lhs_range_sizes to use the same sub-slice of the iterator as the other two.

augmentcode · 2025-10-24T13:06:08Z

arrow-data/src/equal/list_view.rs

+            return false;
+        }
+
+        for (index, ((&lhs_offset, &rhs_offset), &size)) in lhs_range_offsets


In the null-handling path, for non-null entries the per-element sizes of lhs and rhs should be equal before comparing values. Currently the loop uses only the lhs size when calling equal_values, which could mark two lists equal if rhs has extra values beyond the lhs size.

🤖 React with 👍 or 👎 to let us know if the comment was useful.

value:annoying; category:bug; feedback:The length of the range offsets is compared earlier and it they do not match the equal_values() is not called at all. See https://github.com/martin-augment/arrow-rs/pull/8/files/d0da6c59482632690ff08dfd0c29d1f9de33c19f#diff-ba36cb5edf354d7753dceb65d0744b1bddf1ceb2ae1ef16b3417c83bc8d78aeeR92

claude · 2025-10-24T13:06:31Z

Code Review: ListView/LargeListView Operations Support

This PR implements support for ListView and LargeListView operations across the arrow-rs codebase, replacing unimplemented!() panics with full functionality.

Critical Issue Found

Location: arrow-data/src/equal/list_view.rs:71 and line 107

The code zips with lhs_sizes (the full buffer) instead of lhs_range_sizes (the sliced range). This will read wrong size values when arrays are sliced with an offset.

Fix needed: Change .zip(lhs_sizes) to .zip(lhs_range_sizes) on both lines 71 and 107.

Code Quality

Strengths:

Consistent with existing List/LargeList patterns
Proper code organization and licensing
Good use of Rust generics
Excellent test coverage including concat, filter, take, and equality operations
Tests cover both i32 and i64 offset types, slicing, and null handling

Areas for Improvement:

Code duplication between null and non-null pathways in list_view_equal
Misleading comment at line 84 (says non-null pathway but is nullable pathway)
Multiple uses of unwrap() without error messages - consider using expect() with descriptive messages

Performance & Security

Good:

Zero-copy design properly exploited
Early returns for null count mismatches
No major security issues
Rust memory safety guarantees apply

Minor Concerns:

Potential integer overflow in concat operations on 32-bit systems
Some redundant length checks could be removed or documented

Test Coverage

Excellent coverage with tests for:

Basic and sliced concatenation
Filtering with various predicates
Take operations with reversed, null, and sliced inputs
Equality checks with reordering

Suggested additions:

Empty ListView arrays
All-null ListView arrays
Edge cases for overflow

Overall Assessment

This is a solid implementation following established patterns with comprehensive tests. However, the critical buffer indexing bug must be fixed before merging. Once fixed and the high-priority items addressed, this will be ready to merge.

Recommendation: Approve with required changes

github-actions · 2025-10-24T13:10:27Z

arrow-data/src/equal/list_view.rs:68 – The inner loop zips the sliced offsets with lhs_sizes, which is the entire buffer, instead of the sliced range. When lhs_start (or rhs_start) is non-zero—e.g. comparing slices of a ListView—this pulls the wrong lengths (often the size of index 0) and can make equal_values run with size == 0, returning true even when the sliced lists differ. The same issue is present in the null-aware branch at line 104. Please zip against lhs_range_sizes (and rhs_range_sizes if needed) so the per-row lengths correspond to the rows being compared.

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (8)

arrow-data/src/equal/list_view.rs (1)

73-76: Avoid unwrap on to_usize to prevent panic on invalid data.

Return false if offsets/sizes can’t convert to usize; equality shouldn’t panic.

You can adopt the match-based conversion pattern shown in the previous diffs.

Also applies to: 117-120

arrow-select/src/concat.rs (1)

238-253: Optional: avoid concatenating unused child values for sliced inputs.

Current approach concatenates entire values() of each input ListView, even parts never referenced by offsets/sizes. You can compute per-array min(offset)..max(offset+size) and slice values to reduce memory.

Happy to sketch a helper to derive minimal value ranges per input.
arrow-data/src/data.rs (2)
983-990: Remove redundant length check (always equal here).

typed_buffer(…, self.len) yields slices of identical length; this equality check is unreachable and adds noise. Drop it. Also prefer consistent InvalidArgumentError in this validator.
-        if offsets.len() != sizes.len() {
-            return Err(ArrowError::ComputeError(format!(
-                "ListView offsets len {} does not match sizes len {}",
-                offsets.len(),
-                sizes.len()
-            )));
-        }
1004-1013: Avoid panic on overflow; return an ArrowError.

expect can abort the process for adversarial data. Propagate a typed error instead.
-            if size
-                .checked_add(offset)
-                .expect("Offset and size have exceeded the usize boundary")
-                > values_length
-            {
+            let end = size.checked_add(offset).ok_or_else(|| {
+                ArrowError::InvalidArgumentError(format!(
+                    "Offset + size overflow at index {} for {}",
+                    i, self.data_type
+                ))
+            })?;
+            if end > values_length {
                 return Err(ArrowError::InvalidArgumentError(format!(
                     "Size {} at index {} is larger than the remaining values for {}",
                     size, i, self.data_type
                 )));
             }
arrow/tests/array_equal.rs (2)

765-782: Helper is fine; consider reserving capacity.

Optionally use GenericListViewBuilder::<O,_>::with_capacity(Int32Builder::new(), data_size_hint) to reduce reallocs when data size is known.

784-823: Solid coverage; consider renaming for clarity.

Rename test_test_list_view_array to test_list_view_eq_take_and_slice for readability. No functional changes needed.
arrow-data/src/transform/list_view.rs (2)
24-47: Remove unused trait bound; consider slice copy.

CheckedAdd isn’t used; drop it from the bound.

For large ranges, prefer extend_from_slice(&offsets[start..start+len]) and same for sizes to reduce push overhead.
-pub(super) fn build_extend<T: ArrowNativeType + Integer + CheckedAdd>(
+pub(super) fn build_extend<T: ArrowNativeType + Integer>(
@@
-            for &offset in &offsets[start..start + len] {
-                offset_buffer.push(offset);
-            }
+            offset_buffer.extend_from_slice(&offsets[start..start + len]);
@@
-            for &size in &sizes[start..start + len] {
-                sizes_buffer.push(size);
-            }
+            sizes_buffer.extend_from_slice(&sizes[start..start + len]);
49-56: Nulls: push loops ok; minor perf nit.

You can reserve_exact(len) once and use extend(std::iter::repeat(T::default()).take(len)) for both buffers.
 pub(super) fn extend_nulls<T: ArrowNativeType>(mutable: &mut _MutableArrayData, len: usize) {
     let offset_buffer = &mut mutable.buffer1;
     let sizes_buffer = &mut mutable.buffer2;
-    // We push 0 as a placeholder for NULL values in both the offsets and sizes
-    (0..len).for_each(|_| offset_buffer.push(T::default()));
-    (0..len).for_each(|_| sizes_buffer.push(T::default()));
+    offset_buffer.reserve_exact(len);
+    sizes_buffer.reserve_exact(len);
+    offset_buffer.extend(std::iter::repeat(T::default()).take(len));
+    sizes_buffer.extend(std::iter::repeat(T::default()).take(len));
 }

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b3b262 and d0da6c5.

📒 Files selected for processing (9)

arrow-data/src/data.rs (1 hunks)
arrow-data/src/equal/list_view.rs (1 hunks)
arrow-data/src/equal/mod.rs (3 hunks)
arrow-data/src/transform/list_view.rs (1 hunks)
arrow-data/src/transform/mod.rs (6 hunks)
arrow-select/src/concat.rs (11 hunks)
arrow-select/src/filter.rs (3 hunks)
arrow-select/src/take.rs (5 hunks)
arrow/tests/array_equal.rs (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (9)

arrow-data/src/data.rs (2)

arrow-array/src/array/list_array.rs (2)

offsets (311-313)

len (549-551)

arrow-array/src/array/list_view_array.rs (2)

offsets (262-264)

sizes (277-279)

arrow-data/src/equal/list_view.rs (2)

arrow-data/src/data.rs (3)

count_nulls (51-62)

len (402-404)

len (1942-1945)

arrow-data/src/equal/mod.rs (4)

equal (161-166)

equal_values (62-129)

list_view_equal (109-109)

list_view_equal (110-110)

arrow-select/src/take.rs (4)

arrow-data/src/data.rs (5)

new (1857-1859)

new (1919-1932)

nulls (390-392)

nulls (1948-1953)

slice (550-578)

arrow-array/src/cast.rs (1)

as_list_view (927-929)

arrow-array/src/array/list_view_array.rs (6)

nulls (416-418)

to_data (379-381)

x (1010-1010)

x (1045-1045)

x (1080-1080)

x (1119-1119)

arrow-select/src/concat.rs (4)

x (106-106)

x (157-157)

x (220-220)

x (357-357)

arrow-select/src/concat.rs (5)

arrow-array/src/array/list_array.rs (7)

x (1269-1269)

nulls (569-571)

new (266-273)

values (322-324)

offsets (311-313)

offset (565-567)

try_new (209-259)

arrow-array/src/array/list_view_array.rs (8)

x (1010-1010)

x (1045-1045)

x (1080-1080)

x (1119-1119)

nulls (416-418)

sizes (277-279)

offsets (262-264)

offset (412-414)

arrow-buffer/src/buffer/offset.rs (3)

new (68-79)

offsets (242-250)

as_ref (208-210)

arrow-array/src/builder/generic_list_view_builder.rs (3)

values (119-121)

with_capacity (85-96)

array (443-443)

arrow-buffer/src/buffer/scalar.rs (1)

as_ref (144-146)

arrow-select/src/filter.rs (2)

arrow-data/src/data.rs (10)

new (1857-1859)

new (1919-1932)

nulls (390-392)

nulls (1948-1953)

null_count (420-425)

null_count (1956-1959)

buffer (586-588)

buffer (922-922)

buffer (1496-1496)

new_unchecked (273-300)

arrow-array/src/array/list_view_array.rs (2)

nulls (416-418)

to_data (379-381)

arrow-data/src/transform/list_view.rs (2)

arrow-data/src/transform/mod.rs (5)

build_extend (238-283)

array (220-220)

new (401-403)

len (753-755)

extend_nulls (742-749)

arrow-array/src/array/list_view_array.rs (1)

sizes (277-279)

arrow/tests/array_equal.rs (2)

arrow-select/src/take.rs (3)

take (86-102)

data (1026-1028)

indices (702-713)

arrow-array/src/array/list_view_array.rs (3)

Buffer (1097-1097)

test_list_view_array (586-619)

test_large_list_view_array (622-655)

arrow-data/src/equal/mod.rs (2)

arrow-data/src/equal/run.rs (1)

run_equal (26-86)

arrow-data/src/equal/list_view.rs (5)

list_view_equal (24-129)

lhs (31-31)

lhs (32-32)

rhs (34-34)

rhs (35-35)

arrow-data/src/transform/mod.rs (1)

arrow-data/src/transform/list_view.rs (4)

build_extend (24-47)

array (27-27)

array (28-28)

extend_nulls (49-56)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)

GitHub Check: Build wasm32
GitHub Check: PySpark Integration Test (stable)
GitHub Check: Test
GitHub Check: Clippy
GitHub Check: Check Compilation
GitHub Check: Pyarrow C Data Interface (stable, 17)
GitHub Check: Pyarrow C Data Interface (stable, 15)
GitHub Check: Pyarrow C Data Interface (stable, 16)
GitHub Check: MIRI
GitHub Check: Archery test With other arrows
GitHub Check: Clippy
GitHub Check: Verify MSRV (Minimum Supported Rust Version)
GitHub Check: codex
GitHub Check: Test on Mac
GitHub Check: Rustdocs are clean
GitHub Check: Test
GitHub Check: Test
GitHub Check: Test on Windows
GitHub Check: claude-review

🔇 Additional comments (7)

arrow-select/src/take.rs (1)

630-655: LGTM: take_list_view reuses child values and preserves semantics.

Offsets/sizes are taken correctly; nulls handled; child data reused. Looks solid.

arrow-data/src/transform/mod.rs (1)

270-272: ListView transform likely drops child values (child_data not extended).

build_extend delegates to list_view::build_extend but that only pushes offsets/sizes; no child_data writes occur. freeze() later uses data.child_data[0].freeze(), which will be empty if never extended. Kernels relying on MutableArrayData (e.g., filter) could produce ListView arrays with empty values.

Please verify ListView paths that use MutableArrayData (filter, fallback concat, etc.). If broken, fix either:

Update arrow-data/src/transform/list_view.rs build_extend to also extend child_data[0] with the source values (copy approach), or

Teach into_builder for ListView to reuse the input child ArrayData(s) when possible (like Dictionary/ByteView do), or

Provide dedicated filter/take/concat kernels for ListView and avoid MutableArrayData for it altogether.

Run to locate potential call sites:

Also applies to: 317-319, 792-799

arrow-select/src/filter.rs (2)

349-355: ListView branches wired correctly.

Type-directed as_list_view::<i32/i64>() inference is fine and keeps the dispatch symmetric with other types.

869-895: Implementation looks good; preserve child by reference.

Offsets/sizes/nulls are filtered as expected and child values are reused. No functional issues spotted.

Please run existing fuzz tests with ListView enabled in CI to exercise mixed null/run selectivity cases (already covered for other types).

arrow/tests/array_equal.rs (2)

25-31: Imports for ListView tests are appropriate.

825-833: Wrappers for i32/i64 offset variants look good.

arrow-data/src/equal/mod.rs (1)

33-47: Wiring for ListView equality added correctly.

coderabbitai · 2025-10-24T13:16:10Z

arrow-data/src/equal/list_view.rs

+        for ((&lhs_offset, &rhs_offset), &size) in lhs_range_offsets
+            .iter()
+            .zip(rhs_range_offsets)
+            .zip(lhs_sizes)
+        {
+            let lhs_offset = lhs_offset.to_usize().unwrap();
+            let rhs_offset = rhs_offset.to_usize().unwrap();
+            let size = size.to_usize().unwrap();
+
+            // Check if offsets are valid for the given range
+            if !equal_values(lhs_data, rhs_data, lhs_offset, rhs_offset, size) {
+                return false;
+            }
+        }


⚠️ Potential issue | 🔴 Critical

Fix wrong zip and use per-range sizes to avoid mismatches on slices.

You’re zipping offsets with the full lhs_sizes buffer, not the sliced range. This misaligns sizes when lhs_start > 0. Replace the loop with index-based access over the sliced ranges.

Apply:

- for ((&lhs_offset, &rhs_offset), &size) in lhs_range_offsets - .iter() - .zip(rhs_range_offsets) - .zip(lhs_sizes) - { - let lhs_offset = lhs_offset.to_usize().unwrap(); - let rhs_offset = rhs_offset.to_usize().unwrap(); - let size = size.to_usize().unwrap(); - - // Check if offsets are valid for the given range - if !equal_values(lhs_data, rhs_data, lhs_offset, rhs_offset, size) { - return false; - } - } + for i in 0..len { + let lhs_offset = match lhs_range_offsets[i].to_usize() { + Some(v) => v, + None => return false, + }; + let rhs_offset = match rhs_range_offsets[i].to_usize() { + Some(v) => v, + None => return false, + }; + let size = match lhs_range_sizes[i].to_usize() { + Some(v) => v, + None => return false, + }; + if !equal_values(lhs_data, rhs_data, lhs_offset, rhs_offset, size) { + return false; + } + }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In arrow-data/src/equal/list_view.rs around lines 68 to 81, the loop zips lhs_range_offsets and rhs_range_offsets with the entire lhs_sizes buffer, causing size mismatches when lhs_start > 0; replace the zipped iteration with an index-based loop over the sliced ranges (use the length of the range_offsets slice), and for each index read lhs_range_offsets[i], rhs_range_offsets[i], and the corresponding per-range size from the sliced lhs_sizes (or the correct offset into lhs_sizes) converting each to usize and then call equal_values with those values; ensure you iterate exactly over the number of ranges so offsets and sizes stay aligned.

coderabbitai · 2025-10-24T13:16:10Z

arrow-data/src/equal/list_view.rs

+        for (index, ((&lhs_offset, &rhs_offset), &size)) in lhs_range_offsets
+            .iter()
+            .zip(rhs_range_offsets)
+            .zip(lhs_sizes)
+            .enumerate()
+        {
+            let lhs_is_null = lhs_nulls.is_null(index);
+            let rhs_is_null = rhs_nulls.is_null(index);
+
+            if lhs_is_null != rhs_is_null {
+                return false;
+            }
+
+            let lhs_offset = lhs_offset.to_usize().unwrap();
+            let rhs_offset = rhs_offset.to_usize().unwrap();
+            let size = size.to_usize().unwrap();
+
+            // Check if values match in the range
+            if !lhs_is_null && !equal_values(lhs_data, rhs_data, lhs_offset, rhs_offset, size) {
+                return false;
+            }
+        }


⚠️ Potential issue | 🔴 Critical

Nullable path: enforce per-index size equality and fix the same zip bug.

When both sides are non-null you must first check sizes[i] equality; otherwise differing lengths can compare equal by truncation. Also switch to the sliced size range.

Apply:

- for (index, ((&lhs_offset, &rhs_offset), &size)) in lhs_range_offsets - .iter() - .zip(rhs_range_offsets) - .zip(lhs_sizes) - .enumerate() - { - let lhs_is_null = lhs_nulls.is_null(index); - let rhs_is_null = rhs_nulls.is_null(index); - - if lhs_is_null != rhs_is_null { - return false; - } - - let lhs_offset = lhs_offset.to_usize().unwrap(); - let rhs_offset = rhs_offset.to_usize().unwrap(); - let size = size.to_usize().unwrap(); - - // Check if values match in the range - if !lhs_is_null && !equal_values(lhs_data, rhs_data, lhs_offset, rhs_offset, size) { - return false; - } - } + for index in 0..len { + let lhs_is_null = lhs_nulls.is_null(index); + let rhs_is_null = rhs_nulls.is_null(index); + if lhs_is_null != rhs_is_null { + return false; + } + if !lhs_is_null { + if lhs_range_sizes[index] != rhs_range_sizes[index] { + return false; + } + let lhs_offset = match lhs_range_offsets[index].to_usize() { + Some(v) => v, + None => return false, + }; + let rhs_offset = match rhs_range_offsets[index].to_usize() { + Some(v) => v, + None => return false, + }; + let size = match lhs_range_sizes[index].to_usize() { + Some(v) => v, + None => return false, + }; + if !equal_values(lhs_data, rhs_data, lhs_offset, rhs_offset, size) { + return false; + } + } + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

for (index, ((&lhs_offset, &rhs_offset), &size)) in lhs_range_offsets

.iter()

.zip(rhs_range_offsets)

.zip(lhs_sizes)

.enumerate()

{

let lhs_is_null = lhs_nulls.is_null(index);

let rhs_is_null = rhs_nulls.is_null(index);

if lhs_is_null != rhs_is_null {

return false;

}

let lhs_offset = lhs_offset.to_usize().unwrap();

let rhs_offset = rhs_offset.to_usize().unwrap();

let size = size.to_usize().unwrap();

// Check if values match in the range

if !lhs_is_null && !equal_values(lhs_data, rhs_data, lhs_offset, rhs_offset, size) {

return false;

}

}

for index in 0..len {

let lhs_is_null = lhs_nulls.is_null(index);

let rhs_is_null = rhs_nulls.is_null(index);

if lhs_is_null != rhs_is_null {

return false;

}

if !lhs_is_null {

if lhs_range_sizes[index] != rhs_range_sizes[index] {

return false;

}

let lhs_offset = match lhs_range_offsets[index].to_usize() {

Some(v) => v,

None => return false,

};

let rhs_offset = match rhs_range_offsets[index].to_usize() {

Some(v) => v,

None => return false,

};

let size = match lhs_range_sizes[index].to_usize() {

Some(v) => v,

None => return false,

};

if !equal_values(lhs_data, rhs_data, lhs_offset, rhs_offset, size) {

return false;

}

}

}

🤖 Prompt for AI Agents

In arrow-data/src/equal/list_view.rs around lines 104 to 125, the nullable branch currently can compare lists of different lengths as equal and also uses the wrong zip (causing misaligned size indexing); fix by iterating over the three aligned iterators (lhs_range_offsets, rhs_range_offsets, and the sliced sizes iterator) so each index gets its corresponding size, then for each index when both sides are non-null first compare sizes[i] for equality and return false if they differ, and only if sizes match call equal_values with the per-index offsets and that size; ensure you use the correctly sliced sizes iterator (not the whole lhs_sizes misaligned with zips) so indices line up.

coderabbitai · 2025-10-24T13:16:10Z

arrow-data/src/equal/mod.rs

+        DataType::ListView(_) => list_view_equal::<i32>(lhs, rhs, lhs_start, rhs_start, len),
+        DataType::LargeListView(_) => list_view_equal::<i64>(lhs, rhs, lhs_start, rhs_start, len),
        DataType::FixedSizeList(_, _) => fixed_list_equal(lhs, rhs, lhs_start, rhs_start, len),


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

ListView/LargeListView equality dispatch is correct; verify inner fn slices sizes correctly.

list_view_equal should iterate the sliced sizes for the range, not the full slice. Please confirm it zips with lhs_range_sizes/rhs_range_sizes, not the entire lhs_sizes.

If found, change:

zip(lhs_sizes) -> zip(lhs_range_sizes.iter()) (and analogously in the branch with enumerate()).

🏁 Script executed:

#!/bin/bash # Locate potential misuse of full sizes slice inside list_view_equal rg -nP 'list_view_equal.*\{(?s).*zip\(\s*lhs_sizes\s*\)' arrow-data/src/equal/list_view.rs -C2 rg -nP 'zip\(\s*lhs_sizes\s*\)' arrow-data/src/equal/list_view.rs -n -C3

Length of output: 589

This script will help retrieve the complete function implementation.

🏁 Script executed:

#!/bin/bash # Extract the full list_view_equal function definition rg -n 'fn list_view_equal' arrow-data/src/equal/list_view.rs -A 60

Length of output: 73

🏁 Script executed:

#!/bin/bash # Extract the full list_view_equal function definition rg -n 'fn list_view_equal' arrow-data/src/equal/list_view.rs -A 60

Length of output: 2225

🏁 Script executed:

#!/bin/bash # Extract the rest of the function to see the null handling branch rg -n 'fn list_view_equal' arrow-data/src/equal/list_view.rs -A 100 | tail -50

Length of output: 2071

Fix two instances of incorrect buffer slicing in list_view_equal.

The function correctly creates lhs_range_sizes and rhs_range_sizes slices at lines 49/85, but then incorrectly zips with the full lhs_sizes buffer at lines 71 and 107. This causes the loop to iterate over wrong indices when lhs_start > 0.

Required changes:

Line 71: .zip(lhs_sizes) → .zip(lhs_range_sizes.iter())

Line 107: .zip(lhs_sizes) → .zip(lhs_range_sizes.iter())

🤖 Prompt for AI Agents

In arrow-data/src/equal/mod.rs around lines 49–111, list_view_equal creates lhs_range_sizes and rhs_range_sizes slices but later zips with the full lhs_sizes buffer causing incorrect iteration when lhs_start > 0; change the two zips at line 71 and line 107 to zip with lhs_range_sizes.iter() instead of lhs_sizes so the loop iterates over the sliced ranges (i.e., replace .zip(lhs_sizes) with .zip(lhs_range_sizes.iter()) at both places).

coderabbitai · 2025-10-24T13:16:10Z

arrow-select/src/concat.rs

+    let mut offsets = MutableBuffer::with_capacity(lists.iter().map(|l| l.offsets().len()).sum());
+    let mut global_offset = OffsetSize::zero();
+    for l in lists.iter() {
+        for &offset in l.offsets() {
+            offsets.push(offset + global_offset);
+        }
+
+        // advance the offsets
+        global_offset += OffsetSize::from_usize(l.values().len()).unwrap();
+    }


⚠️ Potential issue | 🟠 Major

Guard against offset overflow when adjusting by global_offset.

For i32 offsets, adding global_offset can overflow. Use checked_add and return OffsetOverflowError.

Apply:

- for l in lists.iter() { - for &offset in l.offsets() { - offsets.push(offset + global_offset); - } - - // advance the offsets - global_offset += OffsetSize::from_usize(l.values().len()).unwrap(); - } + for l in lists.iter() { + for &offset in l.offsets() { + let adj = offset + .checked_add(&global_offset) + .ok_or_else(|| ArrowError::OffsetOverflowError( + (offset.as_usize() as usize) + ))?; + offsets.push(adj); + } + // advance the offsets + let incr = OffsetSize::from_usize(l.values().len()) + .ok_or_else(|| ArrowError::OffsetOverflowError(l.values().len()))?; + global_offset = global_offset + .checked_add(&incr) + .ok_or_else(|| ArrowError::OffsetOverflowError(l.values().len()))?; + }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In arrow-select/src/concat.rs around lines 244 to 253, the code adds each list offset to a running global_offset using plain addition which can overflow for i32 offsets; replace the direct addition with a checked_add on the OffsetSize (or its underlying integer) and if None return/propagate an OffsetOverflowError; likewise when advancing global_offset, use conversions that check for overflow (e.g., OffsetSize::from_usize(...)? or checked_add) so any overflow returns OffsetOverflowError instead of wrapping. Ensure the function signature propagates the error (or maps it into the crate's OffsetOverflowError) and update callers if necessary.

a10y added 12 commits October 17, 2025 13:35

feat: improved ListView support

95be754

Added support for ListView and LargeListView for the following operations: * `arrow_select::concat` * `arrow_select::filter_array` Signed-off-by: Andrew Duffy <[email protected]>

more fixes, add test

a3b821b

Signed-off-by: Andrew Duffy <[email protected]>

test for concat

eefd289

Signed-off-by: Andrew Duffy <[email protected]>

fix + tests for take list_view

d7e7846

Signed-off-by: Andrew Duffy <[email protected]>

remove unused test macro

b751907

Signed-off-by: Andrew Duffy <[email protected]>

fix license headers

124b437

Signed-off-by: Andrew Duffy <[email protected]>

remove old comment

278f297

Signed-off-by: Andrew Duffy <[email protected]>

address comments

c22b4a7

Signed-off-by: Andrew Duffy <[email protected]>

more tests, more clippy

722f34e

Signed-off-by: Andrew Duffy <[email protected]>

reduce

f3f8064

Signed-off-by: Andrew Duffy <[email protected]>

Add test for equality kernel

d0da6c5

Signed-off-by: Andrew Duffy <[email protected]>

github-actions bot added the arrow label Oct 24, 2025

augmentcode bot reviewed Oct 24, 2025

View reviewed changes

coderabbitai bot reviewed Oct 24, 2025

View reviewed changes

Conversation

martin-augment commented Oct 24, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Oct 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

augmentcode bot left a comment

Choose a reason for hiding this comment

Uh oh!

augmentcode bot Oct 24, 2025

Choose a reason for hiding this comment

Uh oh!

martin-augment Oct 24, 2025

Choose a reason for hiding this comment

Uh oh!

augmentcode bot Oct 24, 2025

Choose a reason for hiding this comment

Uh oh!

martin-augment Oct 24, 2025

Choose a reason for hiding this comment

Uh oh!

claude bot commented Oct 24, 2025

Code Review: ListView/LargeListView Operations Support

Critical Issue Found

Code Quality

Performance & Security

Test Coverage

Overall Assessment

Uh oh!

github-actions bot commented Oct 24, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 24, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 24, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 24, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 24, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

martin-augment commented Oct 24, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 24, 2025 •

edited

Loading