groupby_bins fails on time series data #10217
Comments
Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
Okay, I did a bit more checking. This probably isn't technically a bug, since time data is not numeric data. I still hold that this is surprising and inconsistent behavior. I've done some more experiments: if you apply the groupby operation directly to the time variable, it works. In other words, continuing from the example in my original post, `print(ds.time.groupby_bins('trial', 5).mean())` prints what I'd expect:
It seems inconsistent that applying
Closes pydata#5897 Closes pydata#6995 Closes pydata#10217
I worked on this a while ago and never opened a PR. See #10227. Are you able to contribute some extra tests? For example, we'd need at least one for Dataset and one for DataArray.
What happened?
I'm not sure if this is a bug, or just surprising behavior.
When I have a dataset with time series variables and I do a `groupby_bins` operation followed by a `mean()` operation, the time series data is silently dropped from the dataset instead of being aggregated.
What did you expect to happen?
I expect the `groupby_bins` operation to be applied to time series data whenever it is applicable. For example, in the example code below, the `mean()` operation should have returned the average time in each bin.
Some aggregation operations might not be well defined for time (arguably `sum()`, for example). In such cases I'd expect it to return NaNs or raise an error.
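For readers who want to try this themselves, the behavior can be sketched roughly as follows. This is not the reporter's original example: the dataset, the variable names `value` and `time`, and the dates are made up for illustration, and which variables survive the `Dataset` aggregation depends on the installed xarray version.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical data (not from the original report): 10 trials, each with
# a numeric measurement and a timestamp.
ds = xr.Dataset(
    {
        "value": ("trial", np.arange(10, dtype=float)),
        "time": (
            "trial",
            pd.date_range("2025-01-01", periods=10, freq="D").values,
        ),
    },
    coords={"trial": np.arange(10)},
)

# Aggregating the whole Dataset: on the version in this report the
# datetime variable is reported to be missing from the result.
binned = ds.groupby_bins("trial", 5).mean()
print(list(binned.data_vars))

# Aggregating the datetime variable on its own, as described in the
# follow-up comment, is reported to give the mean time per bin.
time_mean = ds["time"].groupby_bins("trial", 5).mean()
print(time_mean.values)
```

Comparing the two printed results on a given installation should show whether `time` is dropped at the `Dataset` level while still being aggregated at the `DataArray` level.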
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.10.16 | packaged by conda-forge | (main, Dec 5 2024, 14:16:10) [GCC 13.3.0]
python-bits: 64
OS: Linux
OS-release: 6.8.0-52-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2025.3.1
pandas: 2.2.3
numpy: 2.1.3
scipy: 1.15.2
netCDF4: 1.7.2
pydap: 3.5.4
h5netcdf: 1.6.1
h5py: 3.13.0
zarr: 2.18.3
cftime: 1.6.4
nc_time_axis: 1.4.1
iris: 3.11.0
bottleneck: 1.4.2
dask: 2025.3.0
distributed: 2025.3.0
matplotlib: 3.10.1
cartopy: 0.24.0
seaborn: 0.13.2
numbagg: 0.9.0
fsspec: 2025.3.2
cupy: None
pint: 0.24.4
sparse: 0.16.0
flox: None
numpy_groupies: None
setuptools: 75.8.0
pip: 25.0
conda: None
pytest: None
mypy: None
IPython: 8.32.0
sphinx: None