
Commit ad51404

Change keep_attrs default to True (#10726)
* feat: Preserve attributes by default in all operations

  BREAKING CHANGE: Change the keep_attrs default from False to True. This changes
  the default behavior of xarray operations to preserve attributes, which better
  aligns with user expectations and scientific workflows where metadata
  preservation is critical.

  Migration guide:
  - To restore the previous behavior globally: xr.set_options(keep_attrs=False)
  - To restore it for specific operations: pass the keep_attrs=False parameter
  - Alternative: use the .drop_attrs() method after operations

  Closes #3891, #4510, #9920

* Fix Dataset.map to properly handle coordinate attrs when keep_attrs=False

  The merge incorrectly preserved coordinate attributes even when keep_attrs=False.
  Coordinates now have their attrs cleared when keep_attrs=False, consistent with
  data variables.

* Optimize Dataset.map coordinate attribute handling

  - When keep_attrs=True: restore attrs from the original coords (func may have dropped them)
  - When keep_attrs=False: clear all attrs
  - More efficient than the previous implementation

* Simplify Dataset.map attribute handling code

  Group attribute operations by keep_attrs value for cleaner, more readable code
  with identical functionality.

* Remove temporal 'now' references from comments

  Per Stefan's review, remove 'now' from comments that describe behavior changes,
  as these become stale over time. Replace them with timeless descriptions that
  simply state the current behavior.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  For more information, see https://pre-commit.ci

* Address remaining review comments from Stefan

  - Remove the PR-specific comment from variable.py that only makes sense in context
  - Remove a redundant comment from test_computation.py
  - Clarify the comment in test_dataarray.py about argmin preserving attrs

* Use drop_conflicts for binary operations attribute handling

  Instead of only preserving the left operand's attributes, binary operations now
  combine attributes from both operands using the drop_conflicts strategy:
  - Matching attributes (same key, same value) are kept
  - Conflicting attributes (same key, different values) are dropped
  - Non-conflicting attributes from both operands are preserved

  This provides more intuitive behavior when combining data with partially
  overlapping metadata.

* Fix binary ops attrs: only merge when both operands have attrs

  For backward compatibility, when one operand has no attributes (None), keep the
  left operand's attributes instead of merging. This maintains the existing test
  expectations while still providing drop_conflicts behavior when both operands
  have attributes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  For more information, see https://pre-commit.ci

* Fix binary ops attrs handling when operands have no attrs

  When one operand has no attributes (either None or an empty dict), the result
  should have no attributes. This maintains backward compatibility and fixes the
  test_1d_math failures. The issue was compounded by a side effect in the attrs
  property getter, which mutates _attrs from None to {} when accessed.

* Simplify binary ops attrs handling

  Use the attrs property directly instead of checking _attrs, since the property
  normalizes None to {}. This simplifies the logic while maintaining the same
  behavior.

* Clarify comments about attrs handling differences

  Variable uses None for no attrs; Dataset uses {} for no attrs. Updated the
  comments to make this distinction clear.

* Implement true drop_conflicts behavior for binary operations

  Previously we were dropping all attrs if either operand had no attrs. Now we
  properly merge attrs and only drop conflicting ones, which is what
  drop_conflicts should do:
  - If one operand has {"a": 1} and the other has {}, the result is {"a": 1}
  - If one operand has {"a": 1} and the other has {"a": 2}, the result is {}
  - If one operand has {"a": 1} and the other has {"b": 2}, the result is {"a": 1, "b": 2}

  Updated the tests to reflect this correct behavior.

* Remove unnecessary conversion of {} to None

  The Variable constructor already normalizes an empty dict to None internally,
  so the explicit conversion is redundant.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
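
For illustration only (not part of the commit), here is a minimal Python sketch of the three drop_conflicts cases listed above, assuming an xarray build that includes this change; the arrays and attribute names are made up:

    import xarray as xr

    x = xr.DataArray([1.0], attrs={"a": 1})

    # Non-conflicting attributes from both operands are preserved.
    print((x + xr.DataArray([2.0], attrs={"b": 2})).attrs)  # expected: {'a': 1, 'b': 2}

    # Conflicting values for the same key are dropped.
    print((x + xr.DataArray([2.0], attrs={"a": 2})).attrs)  # expected: {}

    # An operand with empty attrs leaves the other operand's attrs intact.
    print((x + xr.DataArray([2.0], attrs={})).attrs)  # expected: {'a': 1}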
1 parent 05adfa6 · 20 files changed (+450, -139 lines)

doc/whats-new.rst
Lines changed: 86 additions & 0 deletions

@@ -93,6 +93,92 @@ New Features
 Breaking changes
 ~~~~~~~~~~~~~~~~

+- **All xarray operations now preserve attributes by default** (:issue:`3891`, :issue:`2582`).
+  Previously, operations would drop attributes unless explicitly told to preserve them via ``keep_attrs=True``.
+  Additionally, when attributes are preserved in binary operations, they now combine attributes from both
+  operands using ``drop_conflicts`` (keeping matching attributes, dropping conflicts), instead of keeping
+  only the left operand's attributes.
+
+  **What changed:**
+
+  .. code-block:: python
+
+      # Before (xarray <2025.09.1):
+      data = xr.DataArray([1, 2, 3], attrs={"units": "meters", "long_name": "height"})
+      result = data.mean()
+      result.attrs  # {} - Attributes lost!
+
+      # After (xarray ≥2025.09.1):
+      data = xr.DataArray([1, 2, 3], attrs={"units": "meters", "long_name": "height"})
+      result = data.mean()
+      result.attrs  # {"units": "meters", "long_name": "height"} - Attributes preserved!
+
+  **Affected operations include:**
+
+  *Computational operations:*
+
+  - Reductions: ``mean()``, ``sum()``, ``std()``, ``var()``, ``min()``, ``max()``, ``median()``, ``quantile()``, etc.
+  - Rolling windows: ``rolling().mean()``, ``rolling().sum()``, etc.
+  - Groupby: ``groupby().mean()``, ``groupby().sum()``, etc.
+  - Resampling: ``resample().mean()``, etc.
+  - Weighted: ``weighted().mean()``, ``weighted().sum()``, etc.
+  - ``apply_ufunc()`` and NumPy universal functions
+
+  *Binary operations:*
+
+  - Arithmetic: ``+``, ``-``, ``*``, ``/``, ``**``, ``//``, ``%`` (combines attributes using ``drop_conflicts``)
+  - Comparisons: ``<``, ``>``, ``==``, ``!=``, ``<=``, ``>=`` (combines attributes using ``drop_conflicts``)
+  - With scalars: ``data * 2``, ``10 - data`` (preserves data's attributes)
+
+  *Data manipulation:*
+
+  - Missing data: ``fillna()``, ``dropna()``, ``interpolate_na()``, ``ffill()``, ``bfill()``
+  - Indexing/selection: ``isel()``, ``sel()``, ``where()``, ``clip()``
+  - Alignment: ``interp()``, ``reindex()``, ``align()``
+  - Transformations: ``map()``, ``pipe()``, ``assign()``, ``assign_coords()``
+  - Shape operations: ``expand_dims()``, ``squeeze()``, ``transpose()``, ``stack()``, ``unstack()``
+
+  **Binary operations - combines attributes with ``drop_conflicts``:**
+
+  .. code-block:: python
+
+      a = xr.DataArray([1, 2], attrs={"units": "m", "source": "sensor_a"})
+      b = xr.DataArray([3, 4], attrs={"units": "m", "source": "sensor_b"})
+      (a + b).attrs  # {"units": "m"} - Matching values kept, conflicts dropped
+      (b + a).attrs  # {"units": "m"} - Order doesn't matter for drop_conflicts
+
+  **How to restore previous behavior:**
+
+  1. **Globally for your entire script:**
+
+     .. code-block:: python
+
+         import xarray as xr
+
+         xr.set_options(keep_attrs=False)  # Affects all subsequent operations
+
+  2. **For specific operations:**
+
+     .. code-block:: python
+
+         result = data.mean(dim="time", keep_attrs=False)
+
+  3. **For code blocks:**
+
+     .. code-block:: python
+
+         with xr.set_options(keep_attrs=False):
+             # All operations in this block drop attrs
+             result = data1 + data2
+
+  4. **Remove attributes after operations:**
+
+     .. code-block:: python
+
+         result = data.mean().drop_attrs()
+
+  By `Maximilian Roos <https://github.com/max-sixty>`_.
+
 - :py:meth:`Dataset.update` now returns ``None``, instead of the updated dataset. This
   completes the deprecation cycle started in version 0.17. The method still updates the
   dataset in-place. (:issue:`10167`)

xarray/computation/apply_ufunc.py
Lines changed: 1 addition & 1 deletion

@@ -1214,7 +1214,7 @@ def apply_ufunc(
         func = functools.partial(func, **kwargs)

     if keep_attrs is None:
-        keep_attrs = _get_keep_attrs(default=False)
+        keep_attrs = _get_keep_attrs(default=True)

     if isinstance(keep_attrs, bool):
         keep_attrs = "override" if keep_attrs else "drop"

xarray/computation/computation.py
Lines changed: 2 additions & 2 deletions

@@ -701,7 +701,7 @@ def where(cond, x, y, keep_attrs=None):
       * lon      (lon) int64 24B 10 11 12

     >>> xr.where(y.lat < 1, y, -1)
-    <xarray.DataArray (lat: 3, lon: 3)> Size: 72B
+    <xarray.DataArray 'lat' (lat: 3, lon: 3)> Size: 72B
     array([[ 0. ,  0.1,  0.2],
            [-1. , -1. , -1. ],
            [-1. , -1. , -1. ]])
@@ -726,7 +726,7 @@ def where(cond, x, y, keep_attrs=None):
     from xarray.core.dataset import Dataset

     if keep_attrs is None:
-        keep_attrs = _get_keep_attrs(default=False)
+        keep_attrs = _get_keep_attrs(default=True)

     # alignment for three arguments is complicated, so don't support it yet
     from xarray.computation.apply_ufunc import apply_ufunc
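
A small sketch (not from the commit) of how the xr.where default plays out, assuming the new default and that keep_attrs=True keeps the attrs of the second argument x; the example data is invented:

    import xarray as xr

    da = xr.DataArray([1.0, -2.0, 3.0], dims="x", attrs={"units": "m"})

    # Attributes of `x` (here `da`) survive by default.
    print(xr.where(da > 0, da, 0.0).attrs)  # expected: {'units': 'm'}

    # Explicit keep_attrs=False drops them, as before.
    print(xr.where(da > 0, da, 0.0, keep_attrs=False).attrs)  # expected: {}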

xarray/computation/weighted.py
Lines changed: 0 additions & 2 deletions

@@ -448,7 +448,6 @@ def _weighted_quantile_1d(

         result = result.transpose("quantile", ...)
         result = result.assign_coords(quantile=q).squeeze()
-
         return result

     def _implementation(self, func, dim, **kwargs):
@@ -551,7 +550,6 @@ def _implementation(self, func, dim, **kwargs) -> DataArray:
 class DatasetWeighted(Weighted["Dataset"]):
     def _implementation(self, func, dim, **kwargs) -> Dataset:
         self._check_dim(dim)
-
         return self.obj.map(func, dim=dim, **kwargs)

xarray/core/common.py
Lines changed: 2 additions & 2 deletions

@@ -1314,7 +1314,7 @@ def isnull(self, keep_attrs: bool | None = None) -> Self:
         from xarray.computation.apply_ufunc import apply_ufunc

         if keep_attrs is None:
-            keep_attrs = _get_keep_attrs(default=False)
+            keep_attrs = _get_keep_attrs(default=True)

         return apply_ufunc(
             duck_array_ops.isnull,
@@ -1357,7 +1357,7 @@ def notnull(self, keep_attrs: bool | None = None) -> Self:
         from xarray.computation.apply_ufunc import apply_ufunc

         if keep_attrs is None:
-            keep_attrs = _get_keep_attrs(default=False)
+            keep_attrs = _get_keep_attrs(default=True)

         return apply_ufunc(
             duck_array_ops.notnull,
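
For context, a minimal sketch (not part of the diff) of the isnull()/notnull() behavior this default implies, assuming a build with the change; the data and attrs are invented:

    import numpy as np
    import xarray as xr

    da = xr.DataArray([1.0, np.nan, 3.0], dims="x", attrs={"units": "m"})

    # Both predicates keep attrs unless keep_attrs=False is passed explicitly.
    print(da.isnull().attrs)                   # expected: {'units': 'm'}
    print(da.notnull(keep_attrs=False).attrs)  # expected: {}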

xarray/core/dataarray.py
Lines changed: 2 additions & 2 deletions

@@ -3889,8 +3889,8 @@ def reduce(
             supplied, then the reduction is calculated over the flattened array
             (by calling `f(x)` without an axis argument).
         keep_attrs : bool or None, optional
-            If True, the variable's attributes (`attrs`) will be copied from
-            the original object to the new one. If False (default), the new
+            If True (default), the variable's attributes (`attrs`) will be copied from
+            the original object to the new one. If False, the new
             object will be returned without attributes.
         keepdims : bool, default: False
             If True, the dimensions which are reduced are left in the result
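
The updated docstring corresponds to behavior like the following sketch (illustrative, not from the commit), assuming the new default; the array and attrs are invented:

    import xarray as xr

    da = xr.DataArray([1.0, 2.0, 3.0], dims="x", attrs={"long_name": "height"})

    # Reductions copy attrs from the original object by default.
    print(da.mean().attrs)                  # expected: {'long_name': 'height'}
    print(da.mean(keep_attrs=False).attrs)  # expected: {}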

xarray/core/dataset.py
Lines changed: 20 additions & 12 deletions

@@ -6780,8 +6780,8 @@ def reduce(
             Dimension(s) over which to apply `func`. By default `func` is
             applied over all dimensions.
         keep_attrs : bool or None, optional
-            If True, the dataset's attributes (`attrs`) will be copied from
-            the original object to the new one. If False (default), the new
+            If True (default), the dataset's attributes (`attrs`) will be copied from
+            the original object to the new one. If False, the new
             object will be returned without attributes.
         keepdims : bool, default: False
             If True, the dimensions which are reduced are left in the result
@@ -6839,7 +6839,7 @@ def reduce(
         dims = parse_dims_as_set(dim, set(self._dims.keys()))

         if keep_attrs is None:
-            keep_attrs = _get_keep_attrs(default=False)
+            keep_attrs = _get_keep_attrs(default=True)

         variables: dict[Hashable, Variable] = {}
         for name, var in self._variables.items():
@@ -6930,7 +6930,7 @@ def map(
         bar      (x) float64 16B 1.0 2.0
         """
         if keep_attrs is None:
-            keep_attrs = _get_keep_attrs(default=False)
+            keep_attrs = _get_keep_attrs(default=True)
         variables = {
             k: maybe_wrap_array(v, func(v, *args, **kwargs))
             for k, v in self.data_vars.items()
@@ -6943,11 +6943,14 @@
         if keep_attrs:
             for k, v in variables.items():
                 v._copy_attrs_from(self.data_vars[k])
-
             for k, v in coords.items():
-                if k not in self.coords:
-                    continue
-                v._copy_attrs_from(self.coords[k])
+                if k in self.coords:
+                    v._copy_attrs_from(self.coords[k])
+        else:
+            for v in variables.values():
+                v.attrs = {}
+            for v in coords.values():
+                v.attrs = {}

         attrs = self.attrs if keep_attrs else None
         return type(self)(variables, coords=coords, attrs=attrs)
@@ -7678,9 +7681,14 @@ def _binary_op(self, other, f, reflexive=False, join=None) -> Dataset:
         self, other = align(self, other, join=align_type, copy=False)
         g = f if not reflexive else lambda x, y: f(y, x)
         ds = self._calculate_binary_op(g, other, join=align_type)
-        keep_attrs = _get_keep_attrs(default=False)
+        keep_attrs = _get_keep_attrs(default=True)
         if keep_attrs:
-            ds.attrs = self.attrs
+            # Combine attributes from both operands, dropping conflicts
+            from xarray.structure.merge import merge_attrs
+
+            self_attrs = self.attrs
+            other_attrs = getattr(other, "attrs", {})
+            ds.attrs = merge_attrs([self_attrs, other_attrs], "drop_conflicts")
         return ds

     def _inplace_binary_op(self, other, f) -> Self:
@@ -8274,7 +8282,7 @@ def quantile(
         coord_names = {k for k in self.coords if k in variables}
         indexes = {k: v for k, v in self._indexes.items() if k in variables}
         if keep_attrs is None:
-            keep_attrs = _get_keep_attrs(default=False)
+            keep_attrs = _get_keep_attrs(default=True)
         attrs = self.attrs if keep_attrs else None
         new = self._replace_with_new_dims(
             variables, coord_names=coord_names, attrs=attrs, indexes=indexes
@@ -8336,7 +8344,7 @@ def rank(

         coord_names = set(self.coords)
         if keep_attrs is None:
-            keep_attrs = _get_keep_attrs(default=False)
+            keep_attrs = _get_keep_attrs(default=True)
         attrs = self.attrs if keep_attrs else None
         return self._replace(variables, coord_names, attrs=attrs)
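
To make the Dataset-level changes concrete, a hedged sketch (not part of the diff) covering Dataset.map and the drop_conflicts merge in binary ops, assuming a build with these changes; all names and attrs are invented:

    import xarray as xr

    ds = xr.Dataset(
        {"foo": ("x", [1.0, -2.0], {"units": "m"})},
        attrs={"title": "demo"},
    )

    # Dataset.map restores dataset and variable attrs when keep_attrs is True
    # (the default) and clears them when keep_attrs=False.
    print(ds.map(abs).attrs, ds.map(abs)["foo"].attrs)
    # expected: {'title': 'demo'} {'units': 'm'}
    print(ds.map(abs, keep_attrs=False).attrs)  # expected: {}

    # Binary ops merge attrs with drop_conflicts: matching keys survive,
    # conflicting values are dropped, keys unique to one operand are kept.
    other = xr.Dataset({"foo": ("x", [1.0, 1.0])}, attrs={"title": "other", "history": "run b"})
    print((ds + other).attrs)  # expected: {'history': 'run b'}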

xarray/core/datatree.py
Lines changed: 1 addition & 1 deletion

@@ -431,7 +431,7 @@ def map(  # type: ignore[override]
         # Copied from xarray.Dataset so as not to call type(self), which causes problems (see https://github.com/xarray-contrib/datatree/issues/188).
         # TODO Refactor xarray upstream to avoid needing to overwrite this.
         if keep_attrs is None:
-            keep_attrs = _get_keep_attrs(default=False)
+            keep_attrs = _get_keep_attrs(default=True)
         variables = {
             k: maybe_wrap_array(v, func(v, *args, **kwargs))
             for k, v in self.data_vars.items()

xarray/core/variable.py
Lines changed: 18 additions & 9 deletions

@@ -1741,8 +1741,8 @@ def reduce(  # type: ignore[override]
             the reduction is calculated over the flattened array (by calling
             `func(x)` without an axis argument).
         keep_attrs : bool, optional
-            If True, the variable's attributes (`attrs`) will be copied from
-            the original object to the new one. If False (default), the new
+            If True (default), the variable's attributes (`attrs`) will be copied from
+            the original object to the new one. If False, the new
             object will be returned without attributes.
         keepdims : bool, default: False
             If True, the dimensions which are reduced are left in the result
@@ -1757,7 +1757,7 @@ def reduce(  # type: ignore[override]
             removed.
         """
         keep_attrs_ = (
-            _get_keep_attrs(default=False) if keep_attrs is None else keep_attrs
+            _get_keep_attrs(default=True) if keep_attrs is None else keep_attrs
         )

         # Note that the call order for Variable.mean is
@@ -2009,7 +2009,7 @@ def quantile(
         _quantile_func = duck_array_ops.quantile

         if keep_attrs is None:
-            keep_attrs = _get_keep_attrs(default=False)
+            keep_attrs = _get_keep_attrs(default=True)

         scalar = utils.is_scalar(q)
         q = np.atleast_1d(np.asarray(q, dtype=np.float64))
@@ -2350,7 +2350,7 @@ def isnull(self, keep_attrs: bool | None = None):
         from xarray.computation.apply_ufunc import apply_ufunc

         if keep_attrs is None:
-            keep_attrs = _get_keep_attrs(default=False)
+            keep_attrs = _get_keep_attrs(default=True)

         return apply_ufunc(
             duck_array_ops.isnull,
@@ -2384,7 +2384,7 @@ def notnull(self, keep_attrs: bool | None = None):
         from xarray.computation.apply_ufunc import apply_ufunc

         if keep_attrs is None:
-            keep_attrs = _get_keep_attrs(default=False)
+            keep_attrs = _get_keep_attrs(default=True)

         return apply_ufunc(
             duck_array_ops.notnull,
@@ -2435,8 +2435,17 @@ def _binary_op(self, other, f, reflexive=False):
             other_data, self_data, dims = _broadcast_compat_data(other, self)
         else:
             self_data, other_data, dims = _broadcast_compat_data(self, other)
-        keep_attrs = _get_keep_attrs(default=False)
-        attrs = self._attrs if keep_attrs else None
+        keep_attrs = _get_keep_attrs(default=True)
+        if keep_attrs:
+            # Combine attributes from both operands, dropping conflicts
+            from xarray.structure.merge import merge_attrs
+
+            # Access attrs property to normalize None to {} due to property side effect
+            self_attrs = self.attrs
+            other_attrs = getattr(other, "attrs", {})
+            attrs = merge_attrs([self_attrs, other_attrs], "drop_conflicts")
+        else:
+            attrs = None
         with np.errstate(all="ignore"):
             new_data = (
                 f(self_data, other_data) if not reflexive else f(other_data, self_data)
@@ -2526,7 +2535,7 @@ def _unravel_argminmax(
         }

         if keep_attrs is None:
-            keep_attrs = _get_keep_attrs(default=False)
+            keep_attrs = _get_keep_attrs(default=True)
         if keep_attrs:
             for v in result.values():
                 v.attrs = self.attrs
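
A brief sketch (not from the diff) of the Variable-level merge this hunk implements, assuming a build with the change; the variables and attrs are invented:

    import xarray as xr

    v1 = xr.Variable("x", [1.0, 2.0], attrs={"units": "m", "source": "a"})
    v2 = xr.Variable("x", [3.0, 4.0], attrs={"units": "m", "source": "b"})

    # drop_conflicts: the matching 'units' survives, the conflicting 'source' is dropped.
    print((v1 + v2).attrs)  # expected: {'units': 'm'}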

xarray/tests/test_coarsen.py
Lines changed: 2 additions & 2 deletions

@@ -100,7 +100,7 @@ def test_coarsen_keep_attrs(funcname, argument) -> None:
         attrs=global_attrs,
     )

-    # attrs are now kept per default
+    # attrs are kept by default
     func = getattr(ds.coarsen(dim={"coord": 5}), funcname)
     result = func(*argument)
     assert result.attrs == global_attrs
@@ -199,7 +199,7 @@ def test_coarsen_da_keep_attrs(funcname, argument) -> None:
         name="name",
     )

-    # attrs are now kept per default
+    # attrs are kept by default
     func = getattr(da.coarsen(dim={"coord": 5}), funcname)
     result = func(*argument)
     assert result.attrs == attrs_da
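
The updated tests correspond to behavior like this sketch (illustrative, not from the commit), assuming the new default; the array and attrs are invented:

    import numpy as np
    import xarray as xr

    da = xr.DataArray(np.arange(10.0), dims="coord", attrs={"units": "m"})

    # Coarsen reductions keep attrs by default; keep_attrs=False still drops them.
    print(da.coarsen(coord=5).mean().attrs)                  # expected: {'units': 'm'}
    print(da.coarsen(coord=5).mean(keep_attrs=False).attrs)  # expected: {}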
