
integration of marray into xarray #10574

@keewis

What is your issue?

During this year's SciPy sprints I took a quick look at which parts of xarray already work with marray through xarray's array API support. Below is what I have tried so far.

Preamble for the examples below:

import xarray as xr
import marray
import numpy as np

xr.set_options(display_expand_data=False)

rng = np.random.default_rng()

What works already:

  • creating xarray objects containing masked arrays:
mnp = marray.masked_namespace(np)

data = rng.normal(size=(2000, 1000)) * 2 - 1
masked = mnp.asarray(data, mask=np.abs(data) < 0.5)
arr = xr.DataArray(
    masked,
    dims=("time", "x"),
    coords={"time": xr.date_range("2025-07-07 08:00:00", freq="6h", periods=2000), "x": np.arange(1000)},
)
  • aggregation with skipna=False: arr.mean(dim="x", skipna=False) (marray skips masked values by default, but it doesn't implement __array_function__ for the nan* functions, so xarray's own nan-skipping is not available)
  • subsetting: arr.sel(time="2025-07-09")
  • reindexing: arr.reindex(x=np.arange(-5, 15), fill_value=mnp.asarray(0, mask=True)) (this wraps the data with marray, so it won't work with dask; it does allow converting non-marray data to marray data, though)
  • where: arr.where(xr.ufuncs.abs(arr) > 2.4, mnp.asarray(0, mask=True))
  • groupby aggregations: arr.groupby("time.day").mean(skipna=False)
  • stack: arr.stack(z=("time", "x"))
  • roll: arr.roll({"time": 3})
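To illustrate why skipna=False is enough for the aggregations above: a masked reduction already ignores masked elements, so no nan* function is involved. The following is a plain-NumPy sketch of that idea on a separate data/mask pair. It is not marray's implementation, and masked_mean is a hypothetical helper; a real masked array would return a masked result instead of NaN for fully masked rows.

```python
import numpy as np


def masked_mean(data, mask, axis=None):
    """Mean over unmasked elements only (hypothetical helper).

    Positions where every element is masked come back as NaN here.
    """
    valid = ~mask
    # Masked elements contribute 0 to the sum and are not counted.
    total = np.where(valid, data, 0.0).sum(axis=axis)
    count = valid.sum(axis=axis)
    with np.errstate(invalid="ignore", divide="ignore"):
        return total / count


data = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
mask = np.array([[False, True, False], [True, True, True]])
row_means = masked_mean(data, mask, axis=1)
```

The first row averages only its two unmasked values; the second row has no valid data at all.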

What does not work yet:

  • sortby: arr.sortby("time") (fails with a "cannot pickle module object" error)
  • pad / shift: arr.pad({"x": 2}, constant_values=mnp.asarray(0, mask=True)) (pad is not part of the array API yet)
  • isnull: arr.isnull() (it's not quite clear whether the name refers to nan / nat / None, or to truly missing values)
  • na-filling methods like ffill / bfill / fillna and interpolate_na (we might need a masked accessor for that if we don't want to special-case marray)
  • string methods (these only work on numpy arrays? But since the vlen string dtype in numpy 2 supports missing values, they might not be needed?)
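As a rough idea of what a mask-based ffill would have to compute, here is a plain-NumPy sketch that works on a separate data/mask pair. It does not use marray or xarray internals, and ffill_masked is a hypothetical name, not an existing API in either library.

```python
import numpy as np


def ffill_masked(data, mask):
    """Forward-fill masked entries of a 1-D array (hypothetical helper).

    Returns (filled_data, new_mask). Leading masked entries have nothing
    to fill from, so they stay masked.
    """
    # For each position, the index of the most recent unmasked element
    # (or 0 if there is none yet).
    idx = np.where(mask, 0, np.arange(data.shape[0]))
    idx = np.maximum.accumulate(idx)
    return data[idx], mask[idx]


data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
mask = np.array([True, False, True, True, False])
filled, new_mask = ffill_masked(data, mask)
```

Working on the mask rather than on NaN keeps the operation valid for integer and other non-float dtypes, which is one argument for a dedicated masked accessor.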

To make many of these calls less verbose, we could either make the default placeholder for missing values (NA) aware of marray, so that it represents a masked 0d marray, or add a missing global to marray's namespace, so that we could pass e.g. mnp.missing wherever fill_value appears.

cc @dcherian, @mdhaber

Labels: topic-arrays (related to flexible array support)
