(fix): extension array indexers #9671

Open: wants to merge 189 commits into main

ilan-gold
Contributor

Identical to kmuehlbauer#1. The diff is probably not very helpful for seeing what I changed, since https://github.com/kmuehlbauer/xarray/tree/any-time-resolution-2 already contains most of it.

kmuehlbauer and others added 30 commits October 18, 2024 07:31
…ore/variable.py to use any-precision datetime/timedelta with automatic inference of resolution
…t resolution, fix code and tests to allow this
… more carefully, for now using pd.Series to convert `OMm` type datetimes/timedeltas (will result in ns precision)
…rray` series creating an extension array when `.array` is accessed
@kmuehlbauer
Contributor

@ilan-gold Can you rebase your changes on latest main? PR #9618 just got merged.

@ilan-gold
Contributor Author

> @ilan-gold Can you rebase your changes on latest main? PR #9618 just got merged.

I think I did. The tests look good, but I'll have to re-check tomorrow: before, only mypy was failing, but now I've picked up an actual test failure.

@kmuehlbauer
Contributor

Yes, looks clean. I was confused by the number of commits, but this will be squashed anyway, or am I missing something?

The error

ERROR xarray/tests/test_distributed.py::test_dask_distributed_zarr_integration_test[True-True] - Failed: 9 thread(s) were leaked from test

seems unrelated, but it's also on main now. I'm not sure how to debug this.

@ilan-gold
Contributor Author

Great, @kmuehlbauer. I'd like the maintainers to look at the mypy failures. I could in theory fix them, but I would basically be guessing at what return types they want for these classes.

) -> np.ndarray:
    if dtype is None:
        dtype = self.dtype
    if pd.api.types.is_extension_array_dtype(dtype):
Contributor

Is this needed? Why would someone call `np.array` with an extension dtype and then expect it to be translated to a NumPy dtype?

Contributor Author

ilan-gold, Jan 30, 2025

This is for internal use; otherwise I wouldn't have added it. I can delete the line, see what breaks, and then comment.

Contributor Author

ilan-gold, Jan 30, 2025

@dcherian This class is basically an internal adapter, so anything that asks for its data in NumPy form will call this method. Examples include `repr`, subtraction, and accessing `.values` on an xarray object.
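To make the adapter argument concrete, here is a minimal sketch of how a wrapper's `__array__` is reached through `np.asarray`. `ExtensionArrayAdapter` is a hypothetical stand-in, not the class in this PR, and the fallback (letting pandas pick the NumPy dtype when an extension dtype is requested) is an assumption about what the branch under review does, since the diff excerpt stops at the `if`.

```python
import numpy as np
import pandas as pd


class ExtensionArrayAdapter:
    """Hypothetical wrapper around a pandas ExtensionArray (not xarray's class)."""

    def __init__(self, array):
        self.array = array

    @property
    def dtype(self):
        return self.array.dtype

    def __array__(self, dtype=None, copy=None) -> np.ndarray:
        # NumPy calls this whenever it needs a plain ndarray from the wrapper.
        # `copy` is accepted for NumPy 2 compatibility and ignored here.
        if dtype is None:
            dtype = self.dtype
        if pd.api.types.is_extension_array_dtype(dtype):
            # NumPy cannot hold an extension dtype, so let pandas choose the
            # NumPy dtype (object for a categorical of strings).
            dtype = None
        return np.asarray(self.array, dtype=dtype)


wrapped = ExtensionArrayAdapter(pd.Categorical(["a", "b", "a"]))
print(np.asarray(wrapped))  # prints ['a' 'b' 'a'] (dtype=object)
```

Without the extension-dtype branch, the wrapper's own `CategoricalDtype` would be passed straight back to NumPy on every internal conversion, which is the situation the check guards against.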

) -> np.ndarray:
    if dtype is None:
        dtype = self.dtype
    if pd.api.types.is_extension_array_dtype(dtype):
Contributor

Same here: why is this needed?

@@ -6875,7 +6875,7 @@ def groupby(
[[nan, nan, nan],
[ 3., 4., 5.]]])
Coordinates:
- * x_bins (x_bins) object 16B (5, 15] (15, 25]
+ * x_bins (x_bins) interval[int64, right] 16B (5, 15] (15, 25]
Contributor

dcherian, Jan 30, 2025

This is amazing: it enables IntervalIndex indexing now.

cc @benbovy
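A pandas-only sketch of why keeping the `interval[int64, right]` dtype matters (it does not go through xarray's own code path): an `IntervalIndex` can locate the bin that contains a scalar, whereas an object-dtype index holding the same `Interval` objects only matches exact labels. The bins mirror the `(5, 15]`, `(15, 25]` coordinate in the docstring diff.

```python
import numpy as np
import pandas as pd

# Same bins as the x_bins coordinate in the docstring: (5, 15] and (15, 25].
bins = pd.IntervalIndex.from_breaks([5, 15, 25])
print(bins.dtype)        # interval[int64, right]

# An IntervalIndex finds the interval that contains a scalar.
print(bins.get_loc(10))  # 0
print(bins.get_loc(20))  # 1

# An object-dtype index of the same Interval objects only supports exact labels.
as_object = pd.Index(np.asarray(bins), dtype=object)
try:
    as_object.get_loc(10)
except KeyError:
    print("object-dtype index: 10 is not a label")
print(as_object.get_loc(pd.Interval(5, 15)))  # 0
```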

@dcherian
Contributor

@Illviljan or @headtr1ck, could you take a look at the typing failure, please?

@@ -17,6 +17,7 @@
)

import numpy as np
+import pandas as pd
Contributor

NamedArray is not supposed to depend on pandas.
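One possible way to respect that constraint while still recognising pandas extension arrays is to look pandas up lazily in `sys.modules` rather than importing it at module level. This is a hypothetical sketch, not what the PR does; `is_pandas_extension_array` is an illustrative name, not an existing xarray helper. It relies on the fact that an object cannot be a pandas `ExtensionArray` unless pandas has already been imported somewhere in the process.

```python
import sys


def is_pandas_extension_array(obj) -> bool:
    """Hypothetical helper: detect a pandas ExtensionArray without importing pandas."""
    pd = sys.modules.get("pandas")
    if pd is None:
        # pandas was never imported, so obj cannot be one of its arrays.
        return False
    return isinstance(obj, pd.api.extensions.ExtensionArray)


print(is_pandas_extension_array([1, 2, 3]))  # False
```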

Comment on lines +838 to +840
if pd.api.types.is_extension_array_dtype(data_old.dtype):
    # One of PandasExtensionArray or PandasIndexingAdapter?
    ndata = data_old.array.to_numpy()
Contributor

Illviljan, Jan 30, 2025

`pd.api.types.is_extension_array_dtype(data_old.dtype)` does not imply that `data_old` is an extension array.
You should probably use some kind of `isinstance` check before relying on `.array`.

I haven't used extension arrays much myself; why can't a simple `np.asarray(data_old)` be used?
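A sketch of the distinction raised above, using plain pandas: a bare `Categorical`, a `Series`, and a `CategoricalIndex` all pass the dtype check, but the `.array` accessor is documented on `Series` and `Index`, not on the extension array itself, so an `isinstance` check is what makes `.array` safe to call. `extension_to_numpy` is a hypothetical helper, not code from the PR. For these inputs it agrees with plain `np.asarray`, the alternative suggested in the comment; whether the two paths diverge for other extension dtypes is not established here.

```python
import numpy as np
import pandas as pd


def extension_to_numpy(data) -> np.ndarray:
    """Hypothetical helper: convert data with a pandas extension dtype to NumPy."""
    if isinstance(data, pd.api.extensions.ExtensionArray):
        # A bare extension array: convert it directly.
        return data.to_numpy()
    if isinstance(data, (pd.Series, pd.Index)):
        # pandas containers expose their backing ExtensionArray via `.array`.
        return data.array.to_numpy()
    # Anything else, including wrappers that implement __array__.
    return np.asarray(data)


cat = pd.Categorical(["a", "b", "a"])
for obj in (cat, pd.Series(cat), pd.Index(cat)):
    # The dtype check is True for all three kinds of object...
    assert pd.api.types.is_extension_array_dtype(obj.dtype)
    # ...and here np.asarray matches the isinstance-based path.
    assert extension_to_numpy(obj).tolist() == np.asarray(obj).tolist()
```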


7 participants