fix: handle invalid byte ranges in calculate_range for single-line files #19607
+32 −1
Which issue does this PR close?
`calculate_range` creates invalid byte ranges for single-line JSON files #19605.
Rationale for this change
The `calculate_range` function creates invalid byte ranges (where `start > end`) when reading single-line CSV/JSON files that are split into multiple partitions. This causes an error at read time.
When `find_first_newline` doesn't find a newline (single-line file), it returns the remaining file length. This causes `start + start_delta` to exceed `end + end_delta`, creating an invalid range. The current check only handles `range.start == range.end`, not `range.start > range.end`.
What changes are included in this PR?
- Added an early termination check after computing `start_delta`: if the first newline after `start` is beyond the partition boundary (`start + start_delta > end`), return `TerminateEarly` since no complete records exist in this partition.
- Changed the final range validation from `==` to `>=` as a safety net for edge cases.
- Added a regression test that reproduces the bug with a single-line file split into partitions.
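To make the arithmetic concrete, here is a minimal in-memory sketch of the logic described in this PR. It is not the actual DataFusion code: the signatures are assumptions, `find_first_newline` here scans a byte slice instead of an object store, and `RangeCalculation` is simplified to hold a plain `Range<u64>`. Only the two checks (the early termination after `start_delta` and the `>=` validation) are taken from the description.

```rust
use std::ops::Range;

/// Simplified stand-in for the result type named in this PR.
#[derive(Debug, PartialEq)]
enum RangeCalculation {
    Range(Range<u64>),
    TerminateEarly,
}

/// Hypothetical in-memory helper: offset of the first newline in
/// `data[start..end]`, relative to `start`. If there is no newline,
/// it returns the full scanned length (the "remaining file length").
fn find_first_newline(data: &[u8], start: u64, end: u64) -> u64 {
    let window = &data[start as usize..end as usize];
    window
        .iter()
        .position(|&b| b == b'\n')
        .map(|p| p as u64)
        .unwrap_or(window.len() as u64)
}

/// Sketch of the range adjustment with both checks from this PR.
fn calculate_range(data: &[u8], start: u64, end: u64) -> RangeCalculation {
    let file_size = data.len() as u64;

    // Move `start` forward to the first byte after the next newline.
    let start_delta = if start != 0 {
        find_first_newline(data, start - 1, file_size)
    } else {
        0
    };

    // Early termination: the next record begins past this partition,
    // so the partition holds no complete record.
    if start + start_delta > end {
        return RangeCalculation::TerminateEarly;
    }

    // Extend `end` so the last record in the partition is complete.
    let end_delta = if end != file_size {
        find_first_newline(data, end - 1, file_size)
    } else {
        0
    };

    let range = start + start_delta..end + end_delta;
    // Safety net: `>=` also rejects inverted ranges, not only empty ones.
    if range.start >= range.end {
        return RangeCalculation::TerminateEarly;
    }
    RangeCalculation::Range(range)
}

#[test]
fn second_partition_of_single_line_file_terminates_early() {
    // 13 bytes, no newline anywhere.
    let data = br#"{"a":1,"b":2}"#;
    let len = data.len() as u64;
    let result = calculate_range(data, len / 2, len);
    assert_eq!(result, RangeCalculation::TerminateEarly);
}
```

In this sketch, the second partition of the 13-byte file starts at byte 6. The scan from byte 5 finds no newline and returns 8, so `start + start_delta` is 14 while `end` is 13. Without the early check, the resulting range would be `14..13`, which the old `==` comparison does not catch.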
Are these changes tested?
Yes, added `test_calculate_range_single_line_file`, which verifies that `calculate_range` returns `TerminateEarly` instead of an invalid range.
Are there any user-facing changes?
No