Skip to content

8822: Add support for using ListView arrays and types through FFI#19

Open
martin-augment wants to merge 4 commits intomainfrom
pr-8822-2025-11-13-14-30-14
Open

8822: Add support for using ListView arrays and types through FFI#19
martin-augment wants to merge 4 commits intomainfrom
pr-8822-2025-11-13-14-30-14

Conversation

@martin-augment
Copy link
Owner

8822: To review by AI

@coderabbitai
Copy link

coderabbitai bot commented Nov 13, 2025

Walkthrough

This pull request adds support for ListView and LargeListView data type variants across the Apache Arrow Rust codebase. Changes include introducing new enum variants in the schema crate with FFI serialization and deserialization support for formats "+vl" and "+vL"; extending buffer handling and null array construction in array and data modules; and adding corresponding tests for round-trip conversions. PyArrow integration testing was updated to support list view arrays in PyArrow 16 and later.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pr-8822-2025-11-13-14-30-14

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6443d09 and c0e5df9.

📒 Files selected for processing (4)
  • arrow-array/src/ffi.rs (9 hunks)
  • arrow-data/src/data.rs (2 hunks)
  • arrow-pyarrow-integration-testing/tests/test_sql.py (3 hunks)
  • arrow-schema/src/ffi.rs (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
arrow-data/src/data.rs (2)
arrow-array/src/record_batch.rs (1)
  • new_empty (286-298)
arrow-array/src/array/list_view_array.rs (1)
  • data_type (387-389)
arrow-schema/src/ffi.rs (1)
arrow-schema/src/field.rs (1)
  • new (192-202)
arrow-pyarrow-integration-testing/tests/test_sql.py (2)
arrow-pyarrow-integration-testing/src/lib.rs (2)
  • array (50-52)
  • round_trip_array (127-129)
arrow-data/src/data.rs (1)
  • validate (757-895)
arrow-array/src/ffi.rs (5)
arrow-array/src/array/mod.rs (5)
  • data_type (127-127)
  • data_type (358-360)
  • data_type (437-439)
  • make_array (744-842)
  • value (581-581)
arrow-array/src/array/list_view_array.rs (2)
  • Buffer (1097-1097)
  • value_offsets (314-316)
arrow-buffer/src/buffer/immutable.rs (10)
  • from_slice_ref (147-153)
  • from (458-460)
  • from (464-466)
  • from (470-472)
  • from (476-478)
  • from (482-484)
  • from (490-498)
  • from (503-506)
  • from (535-537)
  • from (541-543)
arrow-buffer/src/native.rs (4)
  • from_usize (62-62)
  • from_usize (184-186)
  • from_usize (215-217)
  • from_usize (242-244)
arrow-data/src/ffi.rs (1)
  • new (121-210)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: claude-review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link

claude bot commented Nov 13, 2025

Code Review: Add ListView/LargeListView FFI Support

Thank you for implementing FFI support for ListView arrays! This is a solid implementation that properly extends the FFI interface to support list-view layouts. Below is my comprehensive review:


Code Quality & Best Practices

Strengths:

  1. Consistent Pattern Following: The implementation correctly follows the existing patterns for List/LargeList FFI support, which makes the code maintainable and predictable.

  2. Documentation Improvement: Good catch updating the comment style from // to /// for the bit_width function (arrow-array/src/ffi.rs:143-144).

  3. Comprehensive Coverage: All necessary touch points are covered:

    • Schema format strings ("+vl"/"+vL") in arrow-schema/src/ffi.rs
    • Buffer layout specifications in arrow-array/src/ffi.rs
    • Bit width calculations
    • Child array consumption
    • Null data creation in arrow-data/src/data.rs
  4. Test Coverage: Good test coverage with both ListView and LargeListView variants, including null handling.


🐛 Potential Issues

Critical: Incorrect variadic Flag Change (arrow-data/src/data.rs:1786)

- variadic: true,
+ variadic: false,

Concern: This changes the variadic flag for a data type layout, but the PR description and context don't mention this change. This appears to be modifying an existing layout definition that may not be related to ListView.

Questions:

  • Which data type layout is being modified here? (The diff context is unclear)
  • Is this change intentional and related to ListView support?
  • Could this break existing functionality?

Action Required: Please clarify this change. If it's unrelated to ListView, it should be in a separate PR or needs justification.


🔍 Areas for Consideration

1. Buffer Length Calculation (arrow-array/src/ffi.rs:480-487)

(DataType::ListView(_), 1)
| (DataType::ListView(_), 2)
| (DataType::LargeListView(_), 1)
| (DataType::LargeListView(_), 2) => {
    let bits = bit_width(data_type, i)?;
    debug_assert_eq!(bits % 8, 0);
    length * (bits / 8)
}

Correct: Uses length * (bits / 8) instead of (length + 1) * (bits / 8) like List arrays. This is correct because ListView needs exactly length entries for both offsets and sizes, unlike List which needs length + 1 offset entries.

2. Bit Width Specification (arrow-array/src/ffi.rs:183-186)

// List views have two i32 buffers, offsets and sizes
(DataType::ListView(_), 1) | (DataType::ListView(_), 2) => i32::BITS as _,
// Large list views have two i64 buffers, offsets and sizes
(DataType::LargeListView(_), 1) | (DataType::LargeListView(_), 2) => i64::BITS as _,

Correct: Properly handles both buffer 1 (offsets) and buffer 2 (sizes) with appropriate bit widths.

3. Null Data Creation (arrow-data/src/data.rs:623-632)

DataType::ListView(f) => (
    vec![zeroed(len * 4), zeroed(len * 4)],
    vec![ArrayData::new_empty(f.data_type())],
    true,
),

Correct: Uses new_empty for child array (not new_null), which is appropriate for ListView since null list elements don't require valid child data.

Note: This differs from FixedSizeList which uses new_null because it has fixed-size child arrays that must exist even for null parent elements.


Performance Considerations

No Performance Concerns:

  • The implementation correctly maintains ListView's zero-copy advantages through FFI
  • Buffer calculations are O(1)
  • No unnecessary allocations or copies
  • Follows efficient patterns from existing List FFI implementation

🔒 Security Concerns

No Security Issues Identified:

  • Proper bounds checking is inherited from the existing bit_width and buffer validation functions
  • The implementation relies on Arrow's existing FFI validation infrastructure
  • No unsafe code introduced beyond the existing FFI patterns

Recommendation: Ensure that the receiving side (when importing from FFI) properly validates:

  • offsets.len() == sizes.len()
  • offset[i] + size[i] <= child_array.len()
  • These validations should already exist in the ListView try_new constructor

🧪 Test Coverage

Current Coverage (Good):

  1. test_generic_list_view<i32>() - ListView round-trip
  2. test_generic_list_view<i64>() - LargeListView round-trip
  3. test_list_view() - ListView from builder
  4. test_list_view_with_nulls() - Null handling
  5. ✅ Python integration tests with PyArrow 16+ (test_sql.py)

Suggested Additional Tests:

  1. Edge Case: Empty ListView

    let list_view = ListViewArray::from(vec![Vec::<i32>::new()]);
    test_round_trip(&list_view.to_data())
  2. Edge Case: All Null ListView

    let list_view_data = ArrayData::builder(list_view_dt)
        .len(3)
        .add_buffer(Buffer::from(vec![0_i32; 3]))
        .add_buffer(Buffer::from(vec![0_i32; 3]))
        .add_child_data(empty_child)
        .nulls(Some(NullBuffer::from(vec![false, false, false])))
        .build()?;
  3. ListView with Out-of-Order Offsets: Test the key feature that distinguishes ListView from List - this would verify that the FFI handles out-of-order data correctly.


📝 Minor Style Notes

  1. Import Organization (arrow-array/src/ffi.rs:1398): There's a blank line added after use super::Result; - this is fine but check if it matches project style.

  2. PyArrow Version Check: Good addition of the PYARROW_PRE_16 check to properly gate ListView tests.


📋 Summary

Overall Assessment: ⭐⭐⭐⭐ (4/5)

This is a well-implemented PR that correctly adds FFI support for ListView arrays following established patterns. The main blocker is clarification on the variadic flag change.

Before Merge:

  • REQUIRED: Clarify/justify the variadic: false change in arrow-data/src/data.rs:1786
  • Consider adding edge case tests (empty arrays, all-null arrays, out-of-order offsets)
  • Verify that ListView validation constraints are properly enforced during FFI import

Approved Pending: Resolution of the variadic flag change question.


Great work on this implementation! The code is clean, follows existing patterns well, and includes good test coverage. 🎉

@martin-augment
Copy link
Owner Author

augment review

Copy link

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. No suggestions at this time.

Comment augment review to trigger a new review at any time.

Copy link

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. No suggestions at this time.

Comment augment review to trigger a new review at any time.

@martin-augment
Copy link
Owner Author

Critical: Incorrect variadic Flag Change (arrow-data/src/data.rs:1786)

- variadic: true,
+ variadic: false,

Concern: This changes the variadic flag for a data type layout, but the PR description and context don't mention this change. This appears to be modifying an existing layout definition that may not be related to ListView.

value:incorrect-but-reasonable; category:bug; feedback:The Claude AI reviewer is correct that this change is not related to the topic of this Pull Request. The PR author made the change because it is a bug according to the Arrow specification (https://arrow.apache.org/docs/format/Columnar.html#variadic-buffers)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants