BUG: Add min/max methods to ArrowExtensionArray GH#61311 #61924
Thanks for the pull request, but the proper fix is for `_validate_key` to handle `key: ExtensionArray` correctly.
@mroeschke Thank you for your suggestion. I will rework this PR to implement the `_validate_key` fix instead.
117175b to f87de21
@mroeschke Would you mind taking a look at this PR when you get a chance? I've added a conversion to NumPy in `_validate_key` and included a test case. Feedback is appreciated!
pandas/core/indexing.py
Outdated
# convert to numpy array for min/max with ExtensionArrays
if hasattr(arr, "to_numpy"):
    np_arr = arr.to_numpy()
else:
    np_arr = np.asarray(arr)
Check if `arr` is an ExtensionArray using `if isinstance(arr.dtype, ExtensionDtype)` and use `arr._reduce("max")` instead.
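A minimal sketch of how the suggestion above might look as a standalone bounds check. The helper name `key_is_in_bounds` is hypothetical and is not the code that ended up in `_validate_key`. The example uses the nullable `"Int64"` dtype as a stand-in ExtensionArray so that it runs without pyarrow, and it relies on the private `_reduce` hook that the review comment mentions.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype


def key_is_in_bounds(arr, len_axis: int) -> bool:
    """Hypothetical helper: True if every position in `arr` is valid
    for an axis of length `len_axis` (negative positions allowed)."""
    if len(arr) == 0:
        return True
    if isinstance(arr.dtype, ExtensionDtype):
        # ExtensionArrays are not guaranteed to expose .max()/.min(),
        # so use the reduction hook named in the review comment.
        hi = arr._reduce("max")
        lo = arr._reduce("min")
    else:
        # Plain NumPy arrays provide .max()/.min() directly.
        hi = arr.max()
        lo = arr.min()
    return bool(hi < len_axis and lo >= -len_axis)


# "Int64" is an assumption made for this sketch; the PR targets pyarrow-backed arrays.
masked_key = pd.array([0, 2], dtype="Int64")
numpy_key = np.array([0, 3])
print(key_is_in_bounds(masked_key, 3))
print(key_is_in_bounds(numpy_key, 3))
```

Branching on the dtype keeps the NumPy path unchanged and avoids materialising a NumPy copy of the key just to compute two scalars.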
pandas/tests/indexing/test_iloc.py
Outdated
df_arrow = df.convert_dtypes(dtype_backend="pyarrow")
result = df_arrow.iloc[:, df_arrow["c"]]
expected = df_arrow.iloc[:, [0, 2]]
Construct this using `DataFrame(...)` instead.
'ArrowExtensionArray' object has no attribute 'max' when passing pyarrow-backed series to .iloc #61311
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

The core problem is that when `.iloc` is used with PyArrow-backed DataFrames, pandas' indexing validation calls `min()` and `max()` on the `ArrowExtensionArray` for bounds checking. These methods are not implemented, so the call raises `AttributeError: 'ArrowExtensionArray' object has no attribute 'max'`. The same indexing works with regular pandas DataFrames, so the PyArrow backend behaves inconsistently.
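To make the failure mode concrete without requiring pyarrow, here is a small sketch. `NoReductionArray` is a hypothetical stand-in for an array type that has a length but no `max()`/`min()` methods, and `check_bounds` is an illustrative paraphrase of a bounds check of this shape, not a copy of pandas' source.

```python
class NoReductionArray:
    """Hypothetical stand-in for an array type without .max()/.min()."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)


def check_bounds(arr, len_axis):
    # Illustrative paraphrase: the check assumes the key exposes
    # .max() and .min(), which is where the AttributeError comes from.
    if len(arr) and (arr.max() >= len_axis or arr.min() < -len_axis):
        raise IndexError("positional indexers are out-of-bounds")


try:
    check_bounds(NoReductionArray([0, 2]), len_axis=3)
except AttributeError as exc:
    message = str(exc)
    print(message)
```

The error is raised before any bounds comparison happens, which is why even an in-range key fails.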
Proposed Solution
My proposed solution modifies `_validate_key` in pandas/core/indexing.py to detect ExtensionArrays and convert them to NumPy arrays using `to_numpy()` or `np.asarray()`. I also added a test case to pandas/tests/indexing/test_iloc.py that reproduces the issue and verifies the fix.