Summary: Reading Zarr to a cupy-backed Dataset #16

Open
TomAugspurger opened this issue Feb 27, 2025 · 0 comments


cupy-xarray had an in-progress PR to enable going from Zarr to a CuPy-backed xarray Dataset. That PR was somewhat complicated to implement and restricted users to the kvikio GDSStore. GDSStore is great, especially if you have GPUDirect Storage enabled on your system, but it's limited to local file systems, so you wouldn't be able to use other Zarr storage providers like obstore, fsspec, or icechunk.

Since that PR was started, zarr-python 3.x was released with native support for reading data to device (GPU) memory. With two small changes to xarray (lazy indexing for cupy arrays, and reading coordinates to host memory) we're able to support reading from Zarr to a CuPy-backed xarray Dataset, in a way that should feel very natural to users:

>>> import xarray as xr, zarr
>>> zarr.config.enable_gpu()
>>> ds = xr.open_dataset("dataset.zarr", engine="zarr")
>>> print(type(ds.air.data))
<class 'cupy.ndarray'>
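
To check which variables actually end up on the GPU, something like the following could work. This is a minimal sketch, not part of either library: `backing_type`, `open_zarr_on_gpu`, and `report_backing` are hypothetical helper names, and it assumes cupy, a CUDA device, zarr-python 3.x, and an xarray build that includes the two changes described above.

```python
def backing_type(array) -> str:
    """Return e.g. 'numpy.ndarray' or 'cupy.ndarray' for an array object."""
    cls = type(array)
    top_level_module = cls.__module__.split(".")[0]
    return f"{top_level_module}.{cls.__name__}"


def open_zarr_on_gpu(path):
    """Open a Zarr store with zarr-python's GPU buffers enabled (hypothetical helper)."""
    # Imported lazily so that backing_type() works without these packages.
    import xarray as xr
    import zarr

    zarr.config.enable_gpu()
    return xr.open_dataset(path, engine="zarr")


def report_backing(ds) -> dict:
    """Map each variable name to the type of array that backs it."""
    # Note: touching .data loads lazily indexed variables into memory.
    return {name: backing_type(var.data) for name, var in ds.variables.items()}
```

With the dataset from the example, `report_backing(open_zarr_on_gpu("dataset.zarr"))` should, if the changes behave as described, report `cupy.ndarray` for `air` and a host-memory type for the coordinates.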