Warn on concurrent partial shard writes to prevent silent data corruption #3683
Summary
sharding.warn_on_partial_write (default: True)
Problem
When multiple tasks write to different regions of the same shard concurrently, each task reads the existing shard, modifies its own region, and writes the whole shard back.
This read-modify-write pattern causes race conditions where earlier writes get silently overwritten, resulting in 30-50% data loss in parallel write scenarios (e.g., dask to_zarr with misaligned chunks).
Related: pydata/xarray#10831, #3682
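To make the lost-update race concrete, here is a minimal, deterministic sketch in plain Python. It does not use zarr: the dict-backed `store`, the key `"shard0"`, and the helper names are all hypothetical stand-ins for a shard that can only be rewritten as a whole.

```python
# Hypothetical stand-in for a sharded store: one key holds a whole shard,
# and a shard can only be written back in full (no in-place chunk updates).

def read_shard(store, key):
    """Return a private copy of the shard, as a task would after fetching it."""
    return list(store[key])


def write_shard(store, key, shard):
    """Replace the whole shard with the task's modified copy."""
    store[key] = shard


def simulate_lost_update():
    store = {"shard0": [0, 0, 0, 0]}

    # Both tasks read the shard before either one writes it back.
    copy_a = read_shard(store, "shard0")
    copy_b = read_shard(store, "shard0")

    # Task A updates chunk 0 and writes the full shard.
    copy_a[0] = 1
    write_shard(store, "shard0", copy_a)

    # Task B updates chunk 3 using its stale copy and writes the full shard,
    # silently discarding task A's update to chunk 0.
    copy_b[3] = 2
    write_shard(store, "shard0", copy_b)

    return store["shard0"]


final = simulate_lost_update()
print(final)
```

Neither task touched the other's chunk, yet the last full-shard write wins, which is why writes aligned to whole shards avoid the problem.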
Solution
Track in-progress partial shard writes using a thread-safe counter per shard. When concurrent access is detected, emit a warning:
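One way such tracking could look, as a minimal sketch rather than the code in this PR: the class name `PartialWriteTracker`, the `track` method, the warning text, and the shard keys are all assumptions made for illustration.

```python
import threading
import warnings
from collections import defaultdict
from contextlib import contextmanager


class PartialWriteTracker:
    """Hypothetical helper: counts in-progress partial writes per shard key."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = defaultdict(int)

    @contextmanager
    def track(self, shard_key):
        # Increment under the lock so the check and the update are atomic.
        with self._lock:
            self._active[shard_key] += 1
            concurrent = self._active[shard_key] > 1
        if concurrent:
            # Illustrative message only; the PR's real wording is not shown here.
            warnings.warn(
                f"Concurrent partial writes detected for shard {shard_key!r}; "
                "earlier writes may be overwritten.",
                RuntimeWarning,
            )
        try:
            yield
        finally:
            # Always decrement, even if the write raised, and drop idle keys
            # so the counter table does not grow without bound.
            with self._lock:
                self._active[shard_key] -= 1
                if self._active[shard_key] == 0:
                    del self._active[shard_key]


tracker = PartialWriteTracker()
with tracker.track("c/0/0"):
    pass  # a single partial write to this shard: no warning
```

A counter like this only detects overlap within one process; writers in separate processes or on separate machines would not see each other's counts.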
Key design decisions
zarr.config.set({'sharding.warn_on_partial_write': False})
Test plan
🤖 Generated with Claude Code