I'd like to open a discussion about typing for multi-dimensional arrays in general, and more specifically for NumPy. We have already been discussing this over in the NumPy issue tracker (numpy/numpy#7370) and recently opened a new repository to start writing type stubs (https://github.com/numpy/numpy_stubs).
To help guide discussion, I wrote a document outlining ideas for array shape typing.
To summarize:
- We would like to be able to type-check both data types (e.g., `float64`) and shapes (e.g., a 3x4 array) for multi-dimensional arrays.
- There are many use cases where support for checks using dimension identity would be valuable, e.g., to indicate that a function transforms an array with shape `(N, M)` to shape `(N,)` for arbitrary integers `N` and `M`. These dimension variables look very similar to `TypeVar`, if `TypeVar` supported integers as types.
- A notion of "zero or more additional dimensions" would also be quite valuable, and is a core part of the type for many NumPy operations (generalized ufuncs). This might be naturally written with Ellipsis, e.g., `(..., N)` for an array with a last dimension of length `N` and any number of preceding dimensions. There are particular rules (broadcasting) that should be enforced for matching multiple arguments with variable numbers of dimensions.
This will likely require some new typing features (as well as type-checker support). Notably:
- Support for literal values (#478: "Check that literal strings/int/float belong to/is excluded from a set/range of values"), so we can type check operations like `array.sum(axis=0)`.
- Variadic generics (#193: "Allow variadic generics"), so we can write types like `NDArray[N]` and `NDArray[N, M]`.
- Some sort of support for dimension identity in shapes (e.g., integer types, or `DimensionVar` as described in my doc).
- Standard syntax for writing array dtype/shape annotations: what should these look like?
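As a rough illustration of the kind of annotation this is aiming for, here is a sketch that uses only ordinary `Generic` classes. Everything in it is an assumption made for the example: `Array1D`, `Array2D` and `sum_last_axis` are hypothetical names, the fixed-arity classes stand in for a variadic `NDArray[...]`, and plain `TypeVar`s stand in for integer-valued dimension variables. `Literal` is the literal-type form from PEP 586 (available in `typing` since Python 3.8), which is one possible answer to the literal-values item above.

```python
from typing import Generic, Literal, TypeVar

DType = TypeVar("DType")
N = TypeVar("N")  # stand-in for a dimension variable
M = TypeVar("M")  # stand-in for a dimension variable


class Array1D(Generic[DType, N]):
    """Hypothetical 1-D array annotated with a dtype and one dimension."""

    def __init__(self, values):
        self.values = list(values)


class Array2D(Generic[DType, N, M]):
    """Hypothetical 2-D array annotated with a dtype and two dimensions."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]


def sum_last_axis(a: Array2D[float, N, M]) -> Array1D[float, N]:
    """Reduce shape (N, M) to shape (N,): N appears in both annotations."""
    return Array1D(sum(row) for row in a.rows)


# A concrete 3x4 float array type, with sizes written as literal types.
Float3x4 = Array2D[float, Literal[3], Literal[4]]
```

The annotations are not enforced at runtime; the sketch only shows how dimension identity (the shared `N`) and literal sizes might read in a signature, not what the final syntax should be.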