Provide better support for the rechunking use case #101

@forman

Is your feature request related to a problem? Please describe.

zappend is often used to rechunk a potentially large data cube. In this use case, the slices originate from an existing Zarr cube, whereas in other use cases the slices are individual datasets, e.g., from NetCDF files or GeoTIFFs.

For new users, it is not obvious how to configure zappend for the rechunking use case.

Describe the solution you'd like

  • Describe rechunking in the user guide and in the "how do I" section.
  • Provide a helper generator class or function that splits a source dataset into slices, so that the result can be passed as the first argument to zappend.
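
To illustrate what a user-guide entry on rechunking might cover, here is a minimal sketch of a configuration that sets the target chunking per variable. It assumes zappend's `variables` / `encoding` / `chunks` configuration settings; the variable name `chl`, the chunk sizes, and the target path are placeholders and are not taken from this issue.

```python
# Hypothetical rechunking configuration for zappend.
# All concrete values below are placeholders chosen for illustration.
rechunk_config = {
    "target_dir": "target.zarr",  # placeholder target path
    "append_dim": "time",         # dimension along which slices are appended
    "variables": {
        "chl": {                  # hypothetical variable name
            "encoding": {
                # Desired chunk sizes in the target cube, one per dimension
                # (assumed dimension order: time, y, x).
                "chunks": [1, 512, 512],
            },
        },
    },
}
```

Such a dictionary would presumably be passed via the `config` argument, e.g. `zappend(slices, config=rechunk_config)`, where `slices` is an iterable of slices taken from the source cube.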

Helper class example:

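import xarray as xr


class DatasetSlices:
    """Single-use iterator yielding one-step slices of a dataset along "time"."""

    def __init__(self, ds: xr.Dataset, time_index: int = 0):
        self.ds = ds
        self.time_size = ds.time.size
        # Index of the next slice to yield; a non-zero start skips earlier steps.
        self.time_index = time_index

    def __next__(self) -> xr.Dataset:
        if self.time_index >= self.time_size:
            raise StopIteration
        time_index = self.time_index
        # Select with a slice rather than an integer so that the
        # "time" dimension is kept in the result.
        slice_ds = self.ds.isel(time=slice(time_index, time_index + 1))
        self.time_index += 1
        return slice_ds

    def __iter__(self):
        return self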

Example usage:

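from zappend.api import zappend

# source_path and target_path are placeholders for the source and target cubes.
source_ds = xr.open_zarr(source_path)
zappend(DatasetSlices(source_ds), target_dir=target_path, ...)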
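
The request above also mentions a helper function as an alternative to a class. A minimal sketch of a generator-based variant follows. The names `slice_bounds` and `iter_dataset_slices` and the `dim` and `step` parameters are hypothetical and not part of zappend; the sketch relies only on `Dataset.sizes` and `Dataset.isel` from xarray.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Iterator

if TYPE_CHECKING:  # xarray is only needed for the type hints in this sketch
    import xarray as xr


def slice_bounds(size: int, step: int = 1) -> Iterator[tuple[int, int]]:
    """Yield (start, stop) index pairs covering range(size) in blocks of `step`."""
    if step < 1:
        raise ValueError("step must be >= 1")
    for start in range(0, size, step):
        # The last block may be shorter than `step`.
        yield start, min(start + step, size)


def iter_dataset_slices(
    ds: xr.Dataset, dim: str = "time", step: int = 1
) -> Iterator[xr.Dataset]:
    """Hypothetical helper: lazily yield consecutive slices of `ds` along `dim`."""
    for start, stop in slice_bounds(ds.sizes[dim], step):
        # Select with a slice object so that `dim` is kept in each result.
        yield ds.isel({dim: slice(start, stop)})
```

The generator could then be used in place of the class, e.g. `zappend(iter_dataset_slices(source_ds), target_dir=target_path, ...)`. A `step` greater than 1 yields larger slices and therefore fewer append operations, at the cost of more memory per slice.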

Labels: enhancement
