Skip to content

Conversation

@martin-augment
Copy link
Owner

19948: To review by AI

@gemini-code-assist
Copy link

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request fixes a bug in the array_sort function where the nullability of the inner elements of list arrays was not correctly preserved. The changes ensure that the sorted list array retains the exact nullability characteristics of its input, preventing unintended schema modifications and improving data integrity.

Highlights

  • Inner List Nullability Preservation: The array_sort function now correctly preserves the nullability of the inner elements within list arrays, ensuring that the output schema accurately reflects the input's nullability.
  • Simplified Return Type Logic: The return_type method for array_sort has been simplified to directly clone the input argument's data type, which inherently maintains the inner list's nullability without explicit re-declaration.
  • Null Buffer Handling Refinement: The internal array_sort_generic function has been refactored to reuse the original list array's null buffer, rather than constructing a new one, thereby ensuring that existing nullability is carried over.
  • New Test Cases: New SQL logic tests have been added to verify that array_sort correctly handles and maintains the non-null property of inner list elements across List, LargeList, and FixedSizeList types.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link

coderabbitai bot commented Jan 25, 2026

Walkthrough

This pull request modifies the array sorting implementation and adds comprehensive test coverage. The primary change refactors the sort.rs file to simplify the return type handling, remove validity buffer tracking, and adjust parameter passing for List and LargeList types to use FieldRef by value. Additionally, new tests are introduced to verify array sorting behavior across various list element types, inner nullability preservation, complex types, and error handling scenarios.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link

claude bot commented Jan 25, 2026

Pull Request Review: Fix array_sort Inner List Nullability

Summary

This PR fixes a regression introduced in apache#17657 where array_sort was incorrectly hardcoding inner field nullability to true in the return_type function while preserving it in the execution path. The fix correctly maintains the input field's nullability specification.

Code Quality & Best Practices ✅

Strengths:

  1. Simplified return_type implementation (sort.rs:134-136): The change from a complex match statement to simply cloning the input type is elegant and correct. This properly preserves all aspects of the input type including nullability.

  2. Cleaner null handling (sort.rs:256): Replacing NullBufferBuilder with direct cloning of the input's null buffer (list_array.nulls().cloned()) is more efficient and ensures correctness by preserving the exact null pattern from the input.

  3. Removed unused imports: Properly cleaned up NullBufferBuilder and plan_err imports that are no longer needed.

  4. Better field ownership semantics (sort.rs:196, 200): Changed from passing &FieldRef to FieldRef which is more idiomatic since FieldRef is already an Arc - cloning the Arc is cheap and avoids unnecessary dereferencing.

Test Coverage ✅

Excellent test additions:

  1. Tests cover the core bug fix - maintaining non-null inner types (test_files/array.slt:2580-2591)
  2. Tests verify behavior across different list types: List, LargeList, and FixedSizeList (test_files/array.slt:2593-2603)
  3. Tests check both NULL and non-NULL outer values with non-nullable inner types
  4. Uses arrow_typeof to verify type preservation - critical for catching this class of bug

Potential Issues & Concerns

Minor Concern: Empty Array Handling

When elements.is_empty() (sort.rs:249-250), the code creates a new null array. This is correct behavior, but worth noting that this occurs when all rows in the batch are NULL. The function preserves the field metadata correctly in this case.

Logic Verification ✅

The refactored null handling logic is correct:

  • Before: Manually built a null buffer by appending null/non-null for each row
  • After: Directly clones the input's null buffer

This is semantically identical since the function doesn't change which rows are null - it only sorts the contents of non-null rows. The new approach is simpler and more efficient.

Performance Considerations ✅

Improvements:

  1. Eliminated redundant null buffer construction: No longer builds a new NullBufferBuilder and populates it row-by-row. Instead, directly clones the existing buffer.
  2. Reduced memory allocations: One less vector allocation (valid builder is removed)
  3. Better Arc handling: Using Arc::clone(field) is explicit and idiomatic

Security Concerns ✅

No security concerns identified. The changes:

  • Don't introduce new unsafe code
  • Don't change error handling in a way that could leak information
  • Properly handle null values without risk of null pointer dereference

Correctness Verification

The fix addresses the core issue:

  • Problem: return_type was hardcoding .new_list(field.data_type().clone(), true) - the true forces nullability regardless of input
  • Solution: return_type now returns arg_types[0].clone() which preserves the original type exactly
  • Runtime: Already preserved field metadata via Arc::clone(field), so execution was correct
  • Result: Type system and runtime are now consistent

Recommendations

Optional Cleanup Opportunity

At sort.rs:216-218, when a row is null, the code appends a zero-length entry. Consider adding a comment explaining this behavior:

if list_array.is_null(i) {
    // Null rows contribute zero-length entries to maintain offset alignment
    array_lengths.push(0);

This is already correct, but the comment would help future maintainers understand why we track lengths for null rows.

Final Verdict: LGTM

This is a high-quality fix that:

  • ✅ Correctly addresses the regression
  • ✅ Simplifies the code
  • ✅ Improves performance
  • ✅ Includes comprehensive tests
  • ✅ Follows Rust best practices
  • ✅ Has no security concerns

The changes are well-reasoned and the implementation is clean. Great work fixing this type preservation bug!

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a bug in the array_sort function where the nullability of inner list elements was not preserved. The return_type implementation was incorrectly forcing elements to become nullable.

The fix correctly changes return_type to return the input type, preserving the original nullability. The implementation of array_sort has also been refactored to be cleaner and more robust. It now reuses the null buffer from the input array instead of manually reconstructing it, which is a significant improvement.

The added tests in array.slt effectively verify the fix for List, LargeList, and FixedSizeList.

Overall, the changes are excellent, addressing the bug correctly and improving code quality. I have no further suggestions.

@augmentcode
Copy link

augmentcode bot commented Jan 25, 2026

🤖 Augment PR Summary

Summary: Fixes array_sort to preserve the input list’s element nullability (e.g. List(non-null Int32)) rather than forcing list items to be nullable.

Changes:

  • ArraySort::return_type now returns the (coerced) first argument type unchanged.
  • List output construction now reuses the input list’s null buffer to keep outer nulls consistent.
  • Added sqllogictest coverage for List/LargeList/FixedSizeList inputs with non-null elements.

Technical Notes: Fixed-size lists are still coerced to List via ListCoercion::FixedSizedListToList; sorting behavior is unchanged aside from type/nullability preservation.

🤖 Was this summary useful? React with 👍 or 👎

Copy link

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. No suggestions at this time.

Comment augment review to trigger a new review at any time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants