Skip to content

New defaults for concat, merge, combine_* #10062

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 58 commits into from
Aug 1, 2025

Conversation

jsignell
Copy link
Contributor

@jsignell jsignell commented Feb 19, 2025

Replaces #10051

This PR attempts to throw warnings if and only if the output would change with the new kwarg defaults. To exercise the new default I am toggling the option back and forth and running the existing test suite.

With new kwarg values

  • Run all the tests with use_new_combine_kwarg_defaults=True -> 78 failed
  • Change to use_new_combine_kwarg_defaults=False and run the last failed -> 76 failed, 2 passed

Those 2 are missed alarms. In these two cases the behavior will be different when xarray switched to using new default kwarg values and there is no warning. But I am not totally sure whether we need to handle them because they are tests for conditions that used to raise an error and with the new defaults they do not.

FAILED xarray/tests/test_concat.py::TestConcatDataset::test_concat_do_not_promote - Failed: DID NOT RAISE <class 'ValueError'>
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_error - Failed: DID NOT RAISE <class 'xarray.core.merge.MergeError'>

With old kwarg values

  • Run all the tests with use_new_combine_kwarg_defaults=False -> 119 failed
  • Change to use_new_combine_kwarg_defaults=True and run the last failed -> 76 failed, 43 passed

Those 44 are potentially false alarms - they could also be tests that just don't assert that the result exactly matches some fixed expectation - but mostly they are false alarms.

Here is a list of them
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_does_same_as_concat[outer-minimal-nested-t] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_does_same_as_concat[outer-minimal-by_coords-None] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_does_same_as_concat[inner-minimal-nested-t] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_does_same_as_concat[inner-minimal-by_coords-None] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_does_same_as_concat[left-minimal-nested-t] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_does_same_as_concat[left-minimal-by_coords-None] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_does_same_as_concat[right-minimal-nested-t] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_does_same_as_concat[right-minimal-by_coords-None] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_dataset_combine_attrs[drop] - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_dataset_combine_attrs[override] - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_dataset_combine_attrs[no_conflicts] - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_dataset_combine_attrs[identical] - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_dataset_combine_attrs[drop_conflicts] - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_dataset_attr_by_coords - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_dataarray_attr_by_coords - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_common_coord_when_datavars_minimal - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_invalid_data_vars_value_should_fail - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_backends.py::TestDask::test_open_mfdataset_concat_dim_none - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::TestDask::test_open_mfdataset_concat_dim_default_none - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::test_h5netcdf_storage_options - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_combine.py::TestNestedCombine::test_auto_combine_2d - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_combine.py::TestNestedCombine::test_auto_combine_2d_combine_attrs_kwarg - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_combine.py::TestCombineDatasetsbyCoords::test_infer_order_from_coords - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_combine.py::TestCombineDatasetsbyCoords::test_combine_by_coords_no_concat - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_concat.py::TestConcatDataset::test_concat_coords_kwarg[dim1-minimal] - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_concat.py::TestConcatDataset::test_concat_coords_kwarg[dim1-all] - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_concat.py::TestConcatDataset::test_concat_errors - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_concat.py::TestConcatDataArray::test_concat_avoids_index_auto_creation - FutureWarning: In a future version of xarray the default value for coords will change from coords='different' to coords='minimal'. This is likely to lead to different results when multiple datasets have m...
FAILED xarray/tests/test_dask.py::test_map_blocks_da_ds_with_template[obj1] - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_dask.py::test_map_blocks_errors_bad_template[obj1] - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_dataset.py::TestDataset::test_dataset_math_auto_align - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_datasets - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[no_conflicts-attrs10-attrs20-expected_attrs0-False] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[no_conflicts-attrs11-attrs21-expected_attrs1-False] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[no_conflicts-attrs12-attrs22-expected_attrs2-False] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[no_conflicts-attrs13-attrs23-expected_attrs3-True] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[drop-attrs14-attrs24-expected_attrs4-False] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[identical-attrs15-attrs25-expected_attrs5-False] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[identical-attrs16-attrs26-expected_attrs6-True] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[override-attrs17-attrs27-expected_attrs7-False] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[drop_conflicts-attrs18-attrs28-expected_attrs8-False] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[<lambda>-attrs19-attrs29-expected_attrs9-False] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_no_conflicts_preserve_attrs - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...

About half of them are triggered by my attempt to catch cases where different datasets have matching overlapping variables and with compat='no_conflicts' you might get different results than with compat='override'. There might be a better way, but we want to be careful to avoid calling compute.

Updating tests

After investigating the impact on existing tests, I updated the tests to make sure they still test what they were meant to test by passing in any required kwargs and I added new tests that explicitly ensure that for a bunch of different inputs, using old defaults throws a warning OR output is the same with new and old defaults.

Notes

  • I used sentinel values to indicate a kwarg value that the user has not explicitly set. The benefit of using this approach is that it makes it more obvious that people should not be setting these kwargs to None in their own code. Also with this approach it is possible to encode both the old and the new default value in the function signature.
  • When users have opted in to the new default values there is logic that tries to catch any errors that might be related to the new default kwargs and adds that context to the error message. In case people don't pay attention to the warnings this will at least give them some indication that things have changed and explain new errors that crop up. This only applies when there is already an error, so they don't catch places where the code succeeds but the result is different than before.

coords="different",
compat="equals",
join="outer",
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hard-coded these to the old defaults since there is no way for the user to set them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this approach. These options result in confusing groupby behaviour (#2145) but we can tackle that later

@jsignell jsignell force-pushed the concat_default_kwargs branch from 0e65034 to 5461a9f Compare February 24, 2025 20:07
@jsignell
Copy link
Contributor Author

The last test file that I need to work on is test_concat.py

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on!

I'd like to avoid adding the extra option. Can you remind us of what that would be useful please?

coords="different",
compat="equals",
join="outer",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this approach. These options result in confusing groupby behaviour (#2145) but we can tackle that later

assert_identical(ds2, actual)

actual = ds2.merge(ds1)
actual = ds2.merge(ds1, compat="override")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok this is dangerous, the dimensionality of 'x' in the output depends on the order of merging.

I intended compat="override" to be used when "x" is the same in all datasets, not merely 'compatible' as is here. It seems like we would want to at minimum, assert that ndim is same for all x in all datasets.

Opened #10094

@jsignell
Copy link
Contributor Author

jsignell commented Mar 4, 2025

I'd like to avoid adding the extra option. Can you remind us of what that would be useful please?

The option is part of the deprecation process. Basically how it works is in this PR the default values for these kwargs do not change. BUT you get FutureWarnings if the new default values would lead to a different result or an error. The use_new_combine_kwarg_defaults (this absolutely does not have to be the final name) allows you to opt into the new defaults prematurely. Setting that option to True lets you encounter any issues that might arise and it turns off the FutureWarnings.

This is how I'm thinking about the plan after this PR lands:

  1. after a sufficiently long deprecation cycle there should be another PR that:
    • removes the option (but (probably) doesn't throw if people are still setting it)
    • removes all the FutureWarnings
  2. after even more time one last PR that:
    • encodes the new defaults right into the function signature
    • removes the extra context about new defaults that is included in some error messages

I guess you could get rid of the option the tradeoff is that to get rid of the FutureWarnings people may have to hardcode to the new defaults. But there aren't too many false alarms (I need to update the PR description UPDATED) so most of the time when there is a FutureWarning it does indicate that hardcoding the kwarg value is a good idea.

@jsignell jsignell marked this pull request as ready for review March 4, 2025 21:01
@jsignell
Copy link
Contributor Author

jsignell commented Mar 4, 2025

This is ready for review! The failing test looks like it is also failing on main.

@jsignell
Copy link
Contributor Author

jsignell commented Mar 7, 2025

Wellll maybe the unit tests will pass now. I'll fix mypy and the doctests next week.

@jsignell
Copy link
Contributor Author

The mypy failures now match those in other PRs - I opened #10110 to track

@dcherian dcherian force-pushed the concat_default_kwargs branch from a45ff5f to f0eab2e Compare July 27, 2025 04:21
- ``coords``: "minimal"
- ``compat``: "override"
- ``join``: "exact"

By `Julia Signell <https://github.com/jsignell>`_.
(:issue:`8778`, :issue:`1385`, :pull:`10062`). By `Julia Signell <https://github.com/jsignell>`_.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably add yourself as an author on this one 😅 @dcherian

Copy link
Contributor Author

@jsignell jsignell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might just be a few cleanups left with data_vars=None as the default. I can push if that is helpful, just don't want to step on your toes if you are still going :)

@dcherian
Copy link
Contributor

I can push if that is helpful,

Absolutely, please go ahead.

I wasn't sure that you were still interested in pushing this over line given how long this PR has been open :) . I can review again when you're done

@jsignell
Copy link
Contributor Author

Hahah yeah still interested! I just didn't know what to do to make it palatable :) so I'm stoked to have your review!

@jsignell
Copy link
Contributor Author

Ok! I pushed the changes I was talking about and merged in main. I think the flaky tests are just being flaky 🤷🏻

@dcherian
Copy link
Contributor

I just didn't know what to do to make it palatable :)

Hehe, it's just a hard complex change. I had to check it out and use a debugger on a few tests to fully grok things, which then led to the fixes I pushed.

@jsignell
Copy link
Contributor Author

I just didn't know what to do to make it palatable :)

Hehe, it's just a hard complex change. I had to check it out and use a debugger on a few tests to fully grok things, which then led to the fixes I pushed.

Oh totally! People should take their time to feel comfortable with it.

@dcherian
Copy link
Contributor

Thanks @jsignell We've agreed to move forward here.

One last thing to do is to error if data_vars="minimal", coords="minimal", dim="new_dim". When we are stacking over a new dimension, these "minmal" options will just pick the first object. This is almost certainly unwanted behaviour.

Can you add an error message and test case for that please?

@jsignell
Copy link
Contributor Author

I just did an experiment where I flipped the switch on use_new_combine_kwarg_defaults to make sure that tests act as expected. These are the results:

========================================================================== short test summary info ==========================================================================
FAILED xarray/tests/test_combine.py::TestNestedCombine::test_nested_merge_with_overlapping_values - Failed: DID NOT WARN. No warnings of type (<class 'FutureWarning'>,) were emitted.
FAILED xarray/tests/test_combine.py::TestNestedCombine::test_nested_merge_with_nan_no_conflicts - Failed: DID NOT WARN. No warnings of type (<class 'FutureWarning'>,) were emitted.
FAILED xarray/tests/test_concat.py::TestConcatDataset::test_concat_do_not_promote - Failed: DID NOT RAISE <class 'ValueError'>
FAILED xarray/tests/test_dask.py::TestDataArrayAndDataset::test_concat_loads_variables - assert 6 == 12
FAILED xarray/tests/test_dataarray.py::TestDataArray::test_concat_with_default_coords_warns - Failed: DID NOT WARN. No warnings of type (<class 'FutureWarning'>,) were emitted.
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[no_conflicts-attrs13-attrs23-expected_attrs3-True] - Failed: DID NOT WARN. No warnings of type (<class 'FutureWarning'>,) were emitted.
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[identical-attrs16-attrs26-expected_attrs6-True] - Failed: DID NOT WARN. No warnings of type (<class 'FutureWarning'>,) were emitted.
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_error - Failed: DID NOT RAISE <class 'xarray.structure.merge.MergeError'>
FAILED xarray/tests/test_merge.py::TestMergeMethod::test_merge - Failed: DID NOT RAISE <class 'ValueError'>
FAILED xarray/tests/test_merge.py::TestMergeMethod::test_merge_drop_attrs - KeyError: 'foo'
==================================== 10 failed, 19212 passed, 1041 skipped, 195 xfailed, 22 xpassed, 1595 warnings in 381.25s (0:06:21) =====================================

We don't need to worry about the ones that don't trigger warnings anymore because that is expected and similarly I think the ones that don't raise anymore are likely fine as well. So that just leaves this one:

FAILED xarray/tests/test_dask.py::TestDataArrayAndDataset::test_concat_loads_variables - assert 6 == 12

Which is counting the number of function calls, so I am not concerned. And:

FAILED xarray/tests/test_merge.py::TestMergeMethod::test_merge_drop_attrs - KeyError: 'foo'

Which is alarming! Here is the test:

def test_merge_drop_attrs(self):
    data = create_test_data()
    ds1 = data[["var1"]]
    ds2 = data[["var3"]]
    ds1.coords["dim2"].attrs["keep me"] = "example"
    ds2.coords["numbers"].attrs["foo"] = "bar"
    actual = ds1.merge(ds2, combine_attrs="drop")
    assert actual.coords["dim2"].attrs == {}
    assert actual.coords["numbers"].attrs == {}
    assert ds1.coords["dim2"].attrs["keep me"] == "example"
    assert ds2.coords["numbers"].attrs["foo"] == "bar"

To me that looks like the attrs in the original ds2 are being altered by the merge 😬

@dcherian
Copy link
Contributor

dcherian commented Aug 1, 2025

Which is counting the number of function calls, so I am not concerned.

Yes, that's good I think. We are moving away from loading data implicitly.

To me that looks like the attrs in the original ds2 are being altered by the merge

Yikes, yes. Let's fix that in a different PR.

@dcherian
Copy link
Contributor

dcherian commented Aug 1, 2025

Amazing work, Julia. ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants