Human written by @kylebarron
Background
Zarr is the pre-eminent data format for storing N-dimensional data. Zarr-Python is the primary Python library for reading/writing this data; Zarrs is the primary Rust library for reading/writing Zarr, and is potentially much faster than Zarr-Python.
@d-v-b recently prototyped vibe-coded Zarrs bindings for Zarr-Python in zarr-developers/zarr-python#4064. After some discussion, I decided to start prototyping a standalone Zarrs binding for Python, called Zarrista, with the goal of providing another option for zarr-developers/zarr-python#4064. That is, Zarrista should be a low-level binding that Zarr-Python could potentially wrap in its higher-level APIs in the future.
@d-v-b found that zarr-developers/zarr-python#4064 had vastly improved performance:
> I'm seeing ~15x throughput improvement, looks good.
This result is strong motivation for building a Python Zarr library on top of Zarrs.
Goals
Zarrista should be both directly usable by intermediate-to-advanced users and built so that Zarr-Python could theoretically build on top of it in the future.
Zarrista should not add logic beyond what already exists in Zarrs. Similar to Obstore, the scope should be limited to only what is already implemented upstream. This keeps maintainability high.
Zarrista should expose as many APIs as possible from Zarrs. Array and Group will be medium-to-high-level APIs, but ideally all lower-level APIs (if stable) should also be exposed, so that downstream libraries can choose the most performant route for them.
- Create a minimal but complete Python binding of Zarrs
- Various store support
- Zero copy data exchange between Rust and Python
- Primitive, fixed width types:
- Variable width types:
- Masked types:
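To make the "zero copy" goal concrete, here is a minimal sketch of what zero-copy exchange means on the Python side, using only the standard library's buffer protocol. It does not use any Zarrista or Zarrs API: `chunk` is a hypothetical stand-in for a buffer that the Rust side would own and export.

```python
import array

# Hypothetical stand-in for a decoded chunk owned by the Rust side.
# This is NOT a Zarrista API; it only illustrates the buffer protocol.
chunk = array.array("d", [1.0, 2.0, 3.0, 4.0])

# memoryview goes through the buffer protocol, so no bytes are copied.
view = memoryview(chunk)

# Reinterpret the same memory as a 2x2 grid, still without copying.
# A shaped cast has to pass through a byte format first.
grid = view.cast("B").cast("d", shape=[2, 2])

# An explicit copy, for contrast: it snapshots the current contents.
snapshot = array.array("d", bytes(view))

# A write through the original buffer is visible through the view,
# because both refer to the same memory. The copy is unaffected.
chunk[0] = 10.0
```

After the write, `grid[0, 0]` reads `10.0` while `snapshot[0]` is still `1.0`. That is the distinction that matters for large arrays: a view costs nothing per byte, a copy scales with the data.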
Partner and Stakeholders
Partner with @d-v-b as needed for general project direction, and to ensure that the public-facing API is usable.
Keeping the scope limited to what is already implemented upstream keeps this project focused on my (@kylebarron's) strengths. Given my experience from obstore, async-tiff, and similar projects, I'm really good at creating high-performance, Pythonic APIs from Rust libraries. I have considerably less experience with Zarr itself. This means that I'm ill-equipped to design a higher-level Zarr API myself; that task should be left to @d-v-b and others in higher-level wrapper libraries (Zarr-Python or other).
Milestones
cc @developmentseed/cng-island