BUG: Add min/max methods to ArrowExtensionArray GH#61311 #61924

Open
wants to merge 3 commits into
base: main

Conversation

@skonda29 skonda29 commented Jul 22, 2025

When .iloc is used with PyArrow-backed DataFrames, pandas' indexing validation calls min() and max() on the ArrowExtensionArray for bounds checking. These methods are not implemented, so the call fails with AttributeError: 'ArrowExtensionArray' object has no attribute 'max'. This breaks basic indexing that works with regular pandas DataFrames and makes the PyArrow backend behave inconsistently.

Proposed solution:
This PR modifies _validate_key in pandas/core/indexing.py to detect ExtensionArrays and convert them to NumPy arrays using to_numpy() or np.asarray() before the bounds check. It also adds a test case to pandas/tests/indexing/test_iloc.py that reproduces the issue and verifies the fix.

Member

@mroeschke mroeschke left a comment

Thanks for the pull request, but the proper fix is for _validate_key to handle key: ExtensionArray correctly

@skonda29
Author

@mroeschke Thank you for your suggestion. I will rework this PR to implement the _validate_key fix instead.

@simonjayhawkins added the Bug, Indexing (Related to indexing on series/frames, not to indexes themselves), and Arrow (pyarrow functionality) labels Jul 23, 2025
@skonda29 force-pushed the skonda29-issue-61311 branch from 117175b to f87de21 July 30, 2025 15:19
@skonda29
Author

@mroeschke Would you mind taking a look at this PR when you get a chance? I've added a conversion to NumPy in _validate_key, and included a test case.

Feedback is appreciated!

Comment on lines 1613 to 1617
# convert to numpy array for min/max with ExtensionArrays
if hasattr(arr, "to_numpy"):
    np_arr = arr.to_numpy()
else:
    np_arr = np.asarray(arr)
Member

Check if arr is an ExtensionArray using if isinstance(arr.dtype, ExtensionDtype) and use arr._reduce("max") instead
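
For readers following along, here is a minimal sketch of what this suggestion could look like. It is not the code from this PR or from pandas: check_in_bounds is a hypothetical standalone helper that mirrors the bounds check in _validate_key, and the nullable "Int64" dtype is used as a stand-in for a pyarrow-backed array so that the sketch runs without pyarrow installed. Note that _reduce is a private ExtensionArray method, so its behavior may differ between pandas versions.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype


def check_in_bounds(arr, len_axis):
    """Hypothetical helper: raise IndexError if any position in `arr`
    falls outside an axis of length `len_axis`."""
    if len(arr) == 0:
        # Nothing to validate for an empty indexer.
        return

    if isinstance(getattr(arr, "dtype", None), ExtensionDtype):
        # ExtensionArrays are not guaranteed to have .min()/.max(),
        # so go through the generic reduction hook instead.
        lowest = arr._reduce("min")
        highest = arr._reduce("max")
    else:
        np_arr = np.asarray(arr)
        lowest = np_arr.min()
        highest = np_arr.max()

    if highest >= len_axis or lowest < -len_axis:
        raise IndexError("positional indexers are out-of-bounds")


# Stand-in for a pyarrow-backed key (assumption: "Int64" instead of pyarrow).
key = pd.array([0, 2], dtype="Int64")
check_in_bounds(key, len_axis=3)  # passes silently
```

The sketch does not handle keys that contain missing values; a real fix would need to decide how those should be treated before comparing against the axis length.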


df_arrow = df.convert_dtypes(dtype_backend="pyarrow")
result = df_arrow.iloc[:, df_arrow["c"]]
expected = df_arrow.iloc[:, [0, 2]]
Member

Construct this using DataFrame(...) instead

Development

Successfully merging this pull request may close these issues.

BUG: 'ArrowExtensionArray' object has no attribute 'max' when passing pyarrow-backed series to .iloc
3 participants