New defaults for concat, merge, combine_* #10062
coords="different",
compat="equals",
join="outer",
)
I hard-coded these to the old defaults since there is no way for the user to set them.
I agree with this approach. These options result in confusing groupby behaviour (#2145) but we can tackle that later
0e65034 to 5461a9f
The last test file that I need to work on is test_concat.py
Thanks for taking this on!
I'd like to avoid adding the extra option. Can you remind us what it would be useful for, please?
xarray/tests/test_merge.py
Outdated
assert_identical(ds2, actual)

actual = ds2.merge(ds1)
actual = ds2.merge(ds1, compat="override")
OK, this is dangerous: the dimensionality of 'x' in the output depends on the order of merging. I intended compat="override" to be used when "x" is the same in all datasets, not merely 'compatible' as it is here. It seems like we would want to, at minimum, assert that ndim is the same for all x in all datasets.
Opened #10094
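To make the concern above concrete, here is a toy, stdlib-only model of a first-wins "override" merge. It is not xarray's implementation: `Var`, `merge_override`, and the `check_ndim` flag are hypothetical names used only for illustration, and the guard is just one way the suggested ndim assertion might look.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for a variable: dimension names plus values."""

    dims: tuple
    values: object

    @property
    def ndim(self):
        return len(self.dims)


def merge_override(datasets, check_ndim=False):
    """Toy first-wins merge: keep each variable from the first dataset that has it.

    With check_ndim=True, raise if same-named variables differ in ndim.
    """
    merged = {}
    for ds in datasets:
        for name, var in ds.items():
            if name not in merged:
                merged[name] = var
            elif check_ndim and merged[name].ndim != var.ndim:
                raise ValueError(
                    f"variable {name!r} has ndim {merged[name].ndim} and {var.ndim}"
                )
    return merged


ds1 = {"x": Var((), 0)}            # scalar x
ds2 = {"x": Var(("y",), [0, 0])}   # x broadcast along y

# Without a guard, the dimensionality of x depends on the merge order.
assert merge_override([ds1, ds2])["x"].ndim == 0
assert merge_override([ds2, ds1])["x"].ndim == 1
```

Passing `check_ndim=True` in this sketch turns the silent order dependence into a `ValueError`.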
The option is part of the deprecation process. Basically how it works is in this PR the default values for these kwargs do not change. BUT you get This is how I'm thinking about the plan after this PR lands:
I guess you could get rid of the option; the tradeoff is that to get rid of the
This is ready for review! The failing test looks like it is also failing on main.
Wellll maybe the unit tests will pass now. I'll fix mypy and the doctests next week.
The mypy failures now match those in other PRs - I opened #10110 to track
a45ff5f to f0eab2e
- ``coords``: "minimal"
- ``compat``: "override"
- ``join``: "exact"

By `Julia Signell <https://github.com/jsignell>`_.
(:issue:`8778`, :issue:`1385`, :pull:`10062`). By `Julia Signell <https://github.com/jsignell>`_.
You should probably add yourself as an author on this one 😅 @dcherian
There might just be a few cleanups left with data_vars=None as the default. I can push if that is helpful, just don't want to step on your toes if you are still going :)
Absolutely, please go ahead. I wasn't sure that you were still interested in pushing this over the line given how long this PR has been open :). I can review again when you're done.
Hahah yeah still interested! I just didn't know what to do to make it palatable :) so I'm stoked to have your review!
Ok! I pushed the changes I was talking about and merged in main. I think the flaky tests are just being flaky 🤷🏻
Hehe, it's just a hard, complex change. I had to check it out and use a debugger on a few tests to fully grok things, which then led to the fixes I pushed.
Oh totally! People should take their time to feel comfortable with it.
Thanks @jsignell! We've agreed to move forward here. One last thing to do is to error if Can you add an error message and test case for that please?
I just did an experiment where I flipped the switch on
We don't need to worry about the ones that don't trigger warnings anymore because that is expected and similarly I think the ones that don't raise anymore are likely fine as well. So that just leaves this one:
Which is counting the number of function calls, so I am not concerned. And:
Which is alarming! Here is the test:

def test_merge_drop_attrs(self):
    data = create_test_data()
    ds1 = data[["var1"]]
    ds2 = data[["var3"]]
    ds1.coords["dim2"].attrs["keep me"] = "example"
    ds2.coords["numbers"].attrs["foo"] = "bar"
    actual = ds1.merge(ds2, combine_attrs="drop")
    assert actual.coords["dim2"].attrs == {}
    assert actual.coords["numbers"].attrs == {}
    assert ds1.coords["dim2"].attrs["keep me"] == "example"
    assert ds2.coords["numbers"].attrs["foo"] == "bar"

To me that looks like the attrs in the original ds2 are being altered by the merge 😬
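As a generic illustration of how a merge result can alter its inputs, the sketch below shows attrs leaking through a shallow copy. This is only an assumption about the kind of aliasing that could be involved, not a confirmed diagnosis of the failing test; the nested-dict layout and both helper names are hypothetical and do not reflect xarray internals.

```python
import copy


def drop_attrs_shallow(coords):
    """Shallow copy: the outer dict is new, but the inner attrs dicts are shared."""
    result = copy.copy(coords)
    for entry in result.values():
        entry["attrs"].clear()  # also clears the caller's attrs
    return result


def drop_attrs_safe(coords):
    """Deep copy first, so clearing attrs cannot reach the caller's data."""
    result = copy.deepcopy(coords)
    for entry in result.values():
        entry["attrs"].clear()
    return result


original = {"numbers": {"attrs": {"foo": "bar"}}}
drop_attrs_safe(original)
assert original["numbers"]["attrs"] == {"foo": "bar"}  # input untouched

drop_attrs_shallow(original)
assert original["numbers"]["attrs"] == {}  # input was mutated
```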
Yes, that's good I think. We are moving away from loading data implicitly.
Yikes, yes. Let's fix that in a different PR.
Amazing work, Julia. ❤️
For reference see: pydata/xarray#10062
Replaces #10051
FutureWarnings
whats-new.rst
New functions/methods are listed in api.rst
This PR attempts to throw warnings if and only if the output would change with the new kwarg defaults. To exercise the new default I am toggling the option back and forth and running the existing test suite.
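A minimal, stdlib-only sketch of the "warn if and only if the output would change" idea follows. It is a toy and not the code in this PR: `_UNSET`, `OPTIONS`, and `toy_merge` are hypothetical names, plain dicts stand in for datasets, and `None` stands in for missing data. Only the option name and the two `compat` values are taken from the discussion.

```python
import warnings

# Hypothetical sentinel meaning "the caller did not pass this kwarg".
_UNSET = object()

# Hypothetical stand-in for a library-wide options registry.
OPTIONS = {"use_new_combine_kwarg_defaults": False}


def toy_merge(a, b, compat=_UNSET):
    """Toy merge of two dicts in which None marks missing data."""
    explicit = compat is not _UNSET
    if not explicit:
        use_new = OPTIONS["use_new_combine_kwarg_defaults"]
        compat = "override" if use_new else "no_conflicts"

    merged = {**b, **a}  # first argument wins: the "override" result
    if compat == "override":
        return merged

    # Toy "no_conflicts": fill gaps in `a` from `b`, reject real conflicts.
    filled = False
    for key in a.keys() & b.keys():
        if a[key] is None and b[key] is not None:
            merged[key] = b[key]
            filled = True
        elif b[key] is not None and a[key] != b[key]:
            raise ValueError(f"conflicting values for {key!r}")

    # Warn only when the old default was used implicitly AND it changed the result.
    if filled and not explicit:
        warnings.warn(
            "the default for compat will change; pass compat explicitly",
            FutureWarning,
            stacklevel=2,
        )
    return merged


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = toy_merge({"x": None}, {"x": 1})

assert result == {"x": 1}
assert [w.category for w in caught] == [FutureWarning]
```

Passing `compat` explicitly, or merging inputs for which both settings agree, produces no warning in this sketch.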
With new kwarg values
use_new_combine_kwarg_defaults=True -> 78 failed
use_new_combine_kwarg_defaults=False and run the last failed -> 76 failed, 2 passed
Those 2 are missed alarms. In these two cases the behavior will be different when xarray switches to using the new default kwarg values, and there is no warning. But I am not totally sure whether we need to handle them, because they are tests for conditions that used to raise an error and with the new defaults they do not.
With old kwarg values
use_new_combine_kwarg_defaults=False -> 119 failed
use_new_combine_kwarg_defaults=True and run the last failed -> 76 failed, 43 passed
Those 43 are potentially false alarms. They could also be tests that just don't assert that the result exactly matches some fixed expectation, but mostly they are false alarms.
Here is a list of them
About half of them are triggered by my attempt to catch cases where different datasets have matching overlapping variables and with compat='no_conflicts' you might get different results than with compat='override'. There might be a better way, but we want to be careful to avoid calling compute.
Updating tests
After investigating the impact on existing tests, I updated the tests to make sure they still test what they were meant to test by passing in any required kwargs and I added new tests that explicitly ensure that for a bunch of different inputs, using old defaults throws a warning OR output is the same with new and old defaults.
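The "warning OR same output" invariant described above can be sketched as a small check. This is an illustration under stated assumptions, not the tests added in this PR: `toy_merge` and `warns_or_same` are hypothetical, dicts stand in for datasets, and `None` stands in for missing data.

```python
import warnings


def toy_merge(a, b, use_new_defaults):
    """Hypothetical combine function with an old and a new default behavior."""
    override = {**b, **a}  # new behavior: first argument wins
    if use_new_defaults:
        return override
    # Old behavior: gaps (None) in the winning values are filled from `b`.
    filled = {k: (b.get(k) if v is None else v) for k, v in override.items()}
    if filled != override:
        warnings.warn("result will change with new defaults", FutureWarning, stacklevel=2)
    return filled


def warns_or_same(a, b):
    """True if the old defaults warn, or old and new defaults give equal output."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        old = toy_merge(a, b, use_new_defaults=False)
    warned = any(issubclass(w.category, FutureWarning) for w in caught)
    new = toy_merge(a, b, use_new_defaults=True)
    return warned or old == new


CASES = [
    ({"x": 1}, {"y": 2}),          # no overlap: same output, no warning
    ({"x": 1}, {"x": 1}),          # identical overlap: same output, no warning
    ({"x": None}, {"x": 1}),       # output differs: a warning is required
]

for a, b in CASES:
    assert warns_or_same(a, b)
```

Running the same check over many inputs is what guards against a missed alarm, where the output changes silently.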
Notes