
Conversation

pratiman-91
Contributor

Added a new argument to open_mfdataset to better handle invalid files.

errors : {'ignore', 'raise', 'warn'}, default 'raise'
        - If 'raise', an exception is raised for each invalid dataset.
        - If 'ignore', invalid datasets are skipped silently.
        - If 'warn', a warning is issued for each invalid dataset.
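The documented behavior can be modeled in plain Python (a minimal sketch, not the PR's actual code; `fake_open` and `open_mf` are hypothetical stand-ins for the backend open call and `open_mfdataset`):

```python
import warnings

def fake_open(path):
    # Hypothetical stand-in for a backend open; paths containing "bad"
    # simulate invalid files.
    if "bad" in path:
        raise OSError(f"{path}: not a valid dataset")
    return {"path": path}

def open_mf(paths, errors="raise"):
    # Model of the documented errors={'ignore', 'raise', 'warn'} semantics.
    datasets = []
    for p in paths:
        try:
            datasets.append(fake_open(p))
        except OSError as err:
            if errors == "raise":
                raise
            if errors == "warn":
                warnings.warn(str(err))
            # errors == "ignore": skip the file silently
    return datasets

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kept = open_mf(["a.nc", "bad.nc", "c.nc"], errors="warn")
print(len(kept), len(caught))  # prints: 2 1
```

With errors="warn", the invalid file is skipped and one warning is issued; with errors="ignore" it is skipped silently; with the default errors="raise" the whole call fails, as before the PR.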


welcome bot commented Jan 16, 2025

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

@max-sixty
Collaborator

I'm not the expert, but this looks reasonable! Any other thoughts?

Assuming no one thinks it's a bad idea, we would need tests.

Collaborator

@headtr1ck headtr1ck left a comment


I think it is a good idea.

But the way it is implemented here seems overly complicated and repetitive.
I would suggest inverting the logic: first build up the list wrapped in a single try, then handle the three cases in the except block.

Collaborator

@headtr1ck headtr1ck left a comment


Almost there.

Also, we should add tests for this.

@pratiman-91
Contributor Author

@headtr1ck Thanks for the suggestions. I have added two tests (ignore and warn). While testing, I found that the new argument broke combine="nested" due to invalid ids. I have now modified it to produce the correct ids, and it passes the tests. Please review the tests and the latest version.

@pratiman-91
Contributor Author

Hi @headtr1ck, I have been thinking about the handling of ids. The current version looks like patchwork (I am not happy with it). I think we can create the ids after removing all the invalid datasets from path1d within the combine="nested" block. Please let me know what you think.
Thanks!
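For illustration, the proposal above could look roughly like this (a sketch only, not the PR's actual code; `paths1d` and the validity check are hypothetical stand-ins for the flattened path list and for "this file opened successfully"):

```python
# Sketch: filter out invalid entries first, then derive the ids from what
# remains, so the ids always line up one-to-one with surviving datasets.
paths1d = ["t0.nc", "broken.nc", "t2.nc"]  # hypothetical flat path list

def is_valid(path):
    # Hypothetical stand-in for a successful open of the file.
    return "broken" not in path

valid_paths = [p for p in paths1d if is_valid(p)]
ids = list(range(len(valid_paths)))  # ids rebuilt after filtering

print(valid_paths, ids)  # prints: ['t0.nc', 't2.nc'] [0, 1]
```

Building the ids only after filtering means they can never refer to a dataset that was dropped.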

@pratiman-91
Contributor Author

@max-sixty Can you please review the PR? Thanks!

@max-sixty
Collaborator

I'm admittedly much less familiar with this section of the code, but nothing seems wrong!

I think we should bias towards merging, so if no one has concerns then I'd vote to merge.

Could we fix the errors in the docs?

@pratiman-91
Contributor Author

It seems one test failed: test_sparse_dask_dataset_repr (xarray.tests.test_sparse.TestSparseDataArrayAndDataset). It is not related to this PR.

@pratiman-91
Contributor Author

@headtr1ck

Some minor changes are still required.

I have made changes based on your suggestions.

Another question: what happens now if someone passes, e.g., a 2x2 list of files where one is broken?

Because as far as I can tell, with errors="ignore" that file will be silently removed, but then later the dataset cannot be constructed and will quite likely throw an error that confuses the user.

I agree, that would be the case. An important assumption is that removing the files does not affect the overall validity of the combined dataset. I think it should be up to the user whether to use that option.

@pratiman-91
Contributor Author

@headtr1ck Can you please review this PR?
Thanks!

@headtr1ck
Collaborator

You need to merge main and resolve the conflicts.

@kmuehlbauer
Contributor

Another question: what happens now if someone passes, e.g., a 2x2 list of files where one is broken?
Because as far as I can tell, with errors="ignore" that file will be silently removed, but then later the dataset cannot be constructed and will quite likely throw an error that confuses the user.

I agree, that would be the case. An important assumption is that removing the files does not affect the overall validity of the combined dataset. I think it should be up to the user whether to use that option.

Thanks @pratiman-91 for the explanation. For cases where unrelated files sneak into the file list for some reason, the enhancements in this PR really help: the user can just get open_mfdataset to work without having to examine the file list by hand. Thanks again!

@kmuehlbauer
Contributor

I'm inclined to merge this, but unsure about the CI failures. I'll restart CI, let's see if this was just intermittent.

@kmuehlbauer kmuehlbauer reopened this Aug 11, 2025
@kmuehlbauer kmuehlbauer enabled auto-merge (squash) August 11, 2025 05:59
@kmuehlbauer kmuehlbauer merged commit 54ac2fe into pydata:main Aug 11, 2025
69 of 72 checks passed

welcome bot commented Aug 11, 2025

Congratulations on completing your first pull request! Welcome to Xarray! We are proud of you, and hope to see you again!

@kmuehlbauer
Contributor

Thanks @pratiman-91 for sticking with us, and congrats on your first contribution!

@pratiman-91
Contributor Author

@max-sixty @kmuehlbauer @headtr1ck Thank you very much for your help. It was a nice experience and I learned a lot.

Successfully merging this pull request may close these issues.

better handling of invalid files in open_mfdataset