Description
What is your issue?
Opening a dataset with `use_cftime=True` changes the dtype of the time coordinate from datetime64 to object. This means that `chunks='auto'` will fail, since dask can't estimate the size in bytes of variables with object dtype. However, the error is a bit confusing, since it comes from the underlying dask call and doesn't tell the user what actually caused it.
import xarray as xr

# fn: path to a netCDF file with a CF-encoded time coordinate
# Generally succeeds: time is decoded to datetime64, which dask can size
xr.open_dataset(fn, chunks='auto')
# Definitely fails: time is decoded to cftime objects, i.e. dtype object
xr.open_dataset(fn, chunks='auto', use_cftime=True)
The error is:
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
Cell In[46], line 1
----> 1 xr.open_dataset(fn,use_cftime=True,chunks='auto')
File ~/opt/anaconda3/envs/hle_iv/lib/python3.12/site-packages/xarray/backends/api.py:617, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
610 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
611 backend_ds = backend.open_dataset(
612 filename_or_obj,
613 drop_variables=drop_variables,
614 **decoders,
615 **kwargs,
616 )
--> 617 ds = _dataset_from_backend_dataset(
618 backend_ds,
619 filename_or_obj,
620 engine,
621 chunks,
622 cache,
623 overwrite_encoded_chunks,
624 inline_array,
625 chunked_array_type,
626 from_array_kwargs,
627 drop_variables=drop_variables,
628 **decoders,
629 **kwargs,
630 )
631 return ds
File ~/opt/anaconda3/envs/hle_iv/lib/python3.12/site-packages/xarray/backends/api.py:393, in _dataset_from_backend_dataset(backend_ds, filename_or_obj, engine, chunks, cache, overwrite_encoded_chunks, inline_array, chunked_array_type, from_array_kwargs, **extra_tokens)
391 ds = backend_ds
392 else:
--> 393 ds = _chunk_ds(
394 backend_ds,
395 filename_or_obj,
396 engine,
397 chunks,
398 overwrite_encoded_chunks,
399 inline_array,
400 chunked_array_type,
401 from_array_kwargs,
402 **extra_tokens,
403 )
405 ds.set_close(backend_ds._close)
407 # Ensure source filename always stored in dataset object
File ~/opt/anaconda3/envs/hle_iv/lib/python3.12/site-packages/xarray/backends/api.py:357, in _chunk_ds(backend_ds, filename_or_obj, engine, chunks, overwrite_encoded_chunks, inline_array, chunked_array_type, from_array_kwargs, **extra_tokens)
355 variables = {}
356 for name, var in backend_ds.variables.items():
--> 357 var_chunks = _get_chunk(var, chunks, chunkmanager)
358 variables[name] = _maybe_chunk(
359 name,
360 var,
(...)
367 from_array_kwargs=from_array_kwargs.copy(),
368 )
369 return backend_ds._replace(variables)
File ~/opt/anaconda3/envs/hle_iv/lib/python3.12/site-packages/xarray/core/dataset.py:255, in _get_chunk(var, chunks, chunkmanager)
249 chunks = dict.fromkeys(dims, chunks)
250 chunk_shape = tuple(
251 chunks.get(dim, None) or preferred_chunk_sizes
252 for dim, preferred_chunk_sizes in zip(dims, preferred_chunk_shape, strict=True)
253 )
--> 255 chunk_shape = chunkmanager.normalize_chunks(
256 chunk_shape, shape=shape, dtype=var.dtype, previous_chunks=preferred_chunk_shape
257 )
259 # Warn where requested chunks break preferred chunks, provided that the variable
260 # contains data.
261 if var.size:
File ~/opt/anaconda3/envs/hle_iv/lib/python3.12/site-packages/xarray/namedarray/daskmanager.py:58, in DaskManager.normalize_chunks(self, chunks, shape, limit, dtype, previous_chunks)
55 """Called by open_dataset"""
56 from dask.array.core import normalize_chunks
---> 58 return normalize_chunks(
59 chunks,
60 shape=shape,
61 limit=limit,
62 dtype=dtype,
63 previous_chunks=previous_chunks,
64 )
File ~/opt/anaconda3/envs/hle_iv/lib/python3.12/site-packages/dask/array/core.py:3132, in normalize_chunks(chunks, shape, limit, dtype, previous_chunks)
3129 chunks = tuple("auto" if isinstance(c, str) and c != "auto" else c for c in chunks)
3131 if any(c == "auto" for c in chunks):
-> 3132 chunks = auto_chunks(chunks, shape, limit, dtype, previous_chunks)
3134 if shape is not None:
3135 chunks = tuple(c if c not in {None, -1} else s for c, s in zip(chunks, shape))
File ~/opt/anaconda3/envs/hle_iv/lib/python3.12/site-packages/dask/array/core.py:3237, in auto_chunks(chunks, shape, limit, dtype, previous_chunks)
3234 raise TypeError("dtype must be known for auto-chunking")
3236 if dtype.hasobject:
-> 3237 raise NotImplementedError(
3238 "Can not use auto rechunking with object dtype. "
3239 "We are unable to estimate the size in bytes of object data"
3240 )
3242 for x in tuple(chunks) + tuple(shape):
3243 if (
3244 isinstance(x, Number)
3245 and np.isnan(x)
3246 or isinstance(x, tuple)
3247 and np.isnan(x).any()
3248 ):
NotImplementedError: Can not use auto rechunking with object dtype. We are unable to estimate the size in bytes of object data
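For reference, the root cause can be reproduced at the dask level without xarray or cftime involved at all; any object-dtype array hits the same check (a minimal sketch, with arbitrary array contents):

```python
import numpy as np
import dask.array as da

# dask cannot estimate the size in bytes of object elements, so
# chunks='auto' has nothing to base its chunk-size calculation on.
arr = np.array(["a", "bc", "def"], dtype=object)
da.from_array(arr, chunks="auto")  # raises the NotImplementedError above
```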
Suggestion for now: raise an explicit exception when `chunks='auto'` and `use_cftime=True` are passed at the same time. I think this should be implementable in `backends.open_dataset()` (rather than in any specific engine's `open_dataset`), since it's likely common to any opening procedure, regardless of backend?
Something like:
if chunks == 'auto' and use_cftime:
    raise NotImplementedError(
        "`use_cftime=True` changes the dtype of time variables to object, "
        "but dask cannot yet auto-chunk variables of object dtype. Manually "
        "specifying chunks (instead of using `chunks='auto'`) will avoid this error."
    )
Suggestion for later: if it's possible to estimate chunk sizes when the time coordinate holds datetime objects, it should be possible to estimate them with cftime objects as well, since whether the coordinate is stored as one or the other is unlikely to make a difference in how the other variables should be chunked. Is there maybe a way to get `conventions.decode_cf_variable()` to also return the original datetime values, to be used for chunking in place of the converted cftime objects? Or could chunking simply apply to an object-dtype 1D coordinate the same chunks it chooses for that coordinate's dimension in the non-object-dtype variables of the same dataset? (I guess this could theoretically be unstable if the object coordinate for some reason takes up a lot more space than it would if it were numeric, etc.)
(I'm working on putting together a PR for at least the exception - please let me know if there's anything I should keep in mind, especially where the exception would be most appropriate to live, whether this is a bad idea, etc.)