bug with setitem with oindex and sharding #2834

Open
dcherian opened this issue Feb 14, 2025 · 2 comments
Labels
bug Potential issues with the zarr-python library

Comments

@dcherian
Contributor

Zarr version

v3.0.2

Numcodecs version

?

Python Version

3.12

Operating System

any

Installation

any

Description

Discovered in #2825

Steps to reproduce

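import zarr
import numpy as np

# In-memory store, so the reproduction needs no files on disk.
store = zarr.storage.MemoryStore()
group = zarr.group(store)
# One shard holding one chunk; the error only appears when shards is set.
array = group.create_array(
    name="zoo",
    shape=(1,2,1),
    chunks=(1,2,1),
    shards=(1,2,1),
    dtype=np.int32,
)
# Orthogonal indexer with a repeated index along axis 1.
zindexer = (np.array([0]), np.array([0, 0]), np.array([0]))
# Build a value whose shape matches the orthogonal read, (1, 2, 1).
new_data = np.full(array.oindex[zindexer].shape, fill_value=1)
# Writing through oindex is the step that raises.
array.oindex[zindexer] = new_data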

raises the error:

ValueError: shape mismatch: value array of shape (1,2,1) could not be broadcast to indexing result of shape (2,)

It succeeds with shards=None.

Additional output

No response

@dcherian dcherian added the bug Potential issues with the zarr-python library label Feb 14, 2025
@brokkoli71
Member

I think the problem is that the information that the selection is an OrthogonalSelection gets lost on the way.

In ShardingCodec._encode_partial_single a new indexer is created to index the inner chunks:

indexer = list(
    get_indexer(
        selection, shape=shard_shape, chunk_grid=RegularChunkGrid(chunk_shape=chunk_shape)
    )
)

There, get_indexer guesses which kind of indexing to perform from the selection tuple alone, and in this case it picks CoordinateIndexer instead of OrthogonalIndexer.
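
To illustrate why that guess matters, here is a minimal NumPy-only sketch (it does not call zarr at all). It assumes that plain NumPy fancy indexing stands in for the coordinate interpretation and that `np.ix_` stands in for the orthogonal one. With the indexer from the report, the two interpretations give different result shapes, and assigning the `(1, 2, 1)` value under the coordinate interpretation fails with a shape mismatch.

```python
import numpy as np

# Same shapes and indexer as in the report; plain NumPy, no zarr involved.
data = np.zeros((1, 2, 1), dtype=np.int32)
zindexer = (np.array([0]), np.array([0, 0]), np.array([0]))
value = np.ones((1, 2, 1), dtype=np.int32)

# Orthogonal interpretation (Cartesian product of the per-axis indices).
orthogonal_shape = data[np.ix_(*zindexer)].shape

# Coordinate interpretation (index arrays are broadcast and zipped).
coordinate_shape = data[zindexer].shape

# The orthogonal write accepts the (1, 2, 1) value.
data[np.ix_(*zindexer)] = value

# The coordinate write cannot broadcast the same value to the zipped result.
try:
    data[zindexer] = value
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(orthogonal_shape, coordinate_shape)
print(error_message)
```

If the inner indexer is built as a coordinate indexer while the value was shaped for an orthogonal selection, this is the kind of mismatch one would expect to see.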

@moradology
Contributor

moradology commented Mar 4, 2025

Looks like there are some problematic ambiguities that will plague get_indexer unless it is given more information. I don't think the selection tuple alone carries enough information, in principle, to decide what to do here.

Pure Fancy Indexing:
Selection using only scalars and integer/boolean array–likes (no slices).
Tests in is_pure_fancy_indexing:

  1. True if selection is a boolean array
  2. True if ndim == 1 and selection is an integer list, integer array, or boolean list
  3. True if selection is a tuple of length == ndim, contains no slices or Ellipsis, and at least one element is an integer list or array
  4. else, False

Orthogonal Indexing:
A subset of pure fancy indexing where each axis is indexed independently (Cartesian product), typically expecting 1D arrays.
Tests in is_pure_orthogonal_indexing:

  1. False if ndim==0
  2. True if len(selection) == ndim and every element is an int/bool iterable
  3. True if len(selection) <= ndim, at most one element is an int/bool iterable, and all other elements are int or slice
  4. else, False

Example 1:

Selection: (np.array([0, 1]), np.array([2, 3])) for a 2D array.

Ambiguity:
Coordinate interpretation: Zips the arrays to select coordinates (0,2) and (1,3).
Orthogonal interpretation: Forms a Cartesian product yielding (0,2), (0,3), (1,2), (1,3).

ipdb> is_pure_fancy_indexing((np.array([0, 1]), np.array([2, 3])), 2)
True
ipdb> is_pure_orthogonal_indexing((np.array([0, 1]), np.array([2, 3])), 2)
True
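
For concreteness, a NumPy-only sketch of the two readings of that selection, assuming a 4x4 array so that indices 2 and 3 are in bounds. Plain fancy indexing stands in for the coordinate interpretation and `np.ix_` for the orthogonal one.

```python
import numpy as np

a = np.arange(16).reshape(4, 4)
rows, cols = np.array([0, 1]), np.array([2, 3])

# Coordinate interpretation: zip the arrays, picking (0, 2) and (1, 3).
coordinate = a[rows, cols]

# Orthogonal interpretation: Cartesian product of rows and cols.
orthogonal = a[np.ix_(rows, cols)]

print(coordinate)  # [2 7]
print(orthogonal)  # [[2 3]
                   #  [6 7]]
```

The same tuple yields a 1D result of two elements in one reading and a 2x2 block in the other, which is why the caller's intent has to be passed along explicitly.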

Example 2:

Selection: (np.array([0]), 1) for a 2D array.

Ambiguity:
One axis uses an iterable while the other is a scalar.
Could be interpreted as a coordinate selection by “zipping” (i.e., pairing the single value with 1) or forced into orthogonal mode, treating the iterable independently.

ipdb> is_pure_fancy_indexing((np.array([0]), 1), 2)
True
ipdb> is_pure_orthogonal_indexing((np.array([0]), 1), 2)
True

Example 3:

Selection: (np.array([0]), np.array([[1]])) where the second index isn’t 1D.

Ambiguity:
The extra dimension prevents clear classification.
Once flattened (i.e., via .ravel()), it becomes a 1D array and may be treated as orthogonal. Without normalization, it’s unclear whether coordinate or orthogonal behavior is intended.

ipdb> is_pure_fancy_indexing((np.array([0]), np.array([[1]])), 2)
True
ipdb> is_pure_orthogonal_indexing((np.array([0]), np.array([[1]])), 2)
True


3 participants