[BugFix] Fix pre-1970 Parquet timestamp load corrupting sub-second DATETIME (backport #75207) #75385
Merged
xiangguangyxg approved these changes on Jun 26, 2026
Why I'm doing:
When loading a Parquet `INT64` column annotated `TIMESTAMP(isAdjustedToUTC=false)` into a StarRocks `DATETIME`, a pre-1970 value with a nonzero sub-second part was decoded to a corrupt garbage value instead of the real wall-clock time. `Int64ToDateTimeConverter::convert` splits the signed epoch tick into whole seconds and a sub-second remainder using C++ truncating division, which rounds toward zero. For a negative tick whose sub-second remainder is nonzero, `nanoseconds` is therefore negative. `timestamp::of_epoch_second` then packs the result with a bitwise OR (`from_julian_and_time`), so the negative microsecond value corrupts the packed Julian-day field. For example, `1969-12-31 23:59:59.500` was loaded as a garbage value in year 41222. Whole-second negative values and all post-1970 values were unaffected.

What I'm doing:
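The failure and the floor split that replaces it can be reproduced with a small standalone sketch. Everything in it is an assumption for illustration, not the StarRocks source: the helper names (`split_truncating`, `split_floor`, `pack`, `julian_of`, `micros_of`) are hypothetical, and the packed layout (Julian day in the high bits, microsecond-of-day in the low 40 bits, joined by OR) is a guessed stand-in for `from_julian_and_time`. Under that assumed layout, the truncating split of `1969-12-31 23:59:59.500` produces a Julian field of 2^24 - 1, which falls in roughly year 41222. This agrees with the symptom reported above, but the agreement does not confirm the real layout.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch only: names, constants, and the packed layout are
// assumptions for this example, not StarRocks' actual definitions.
constexpr int64_t kNanosPerSec = 1'000'000'000;
constexpr int64_t kSecsPerDay = 86'400;
constexpr int64_t kUnixEpochJulianDay = 2'440'588;  // Julian day number of 1970-01-01
constexpr int kTimeBits = 40;                        // assumed width of the time-of-day field

struct EpochSplit {
    int64_t seconds;      // whole seconds relative to 1970-01-01 00:00:00
    int64_t nanoseconds;  // sub-second part in nanoseconds
};

// Pre-fix behavior: C++ `/` and `%` truncate toward zero, so a negative tick
// with a nonzero remainder yields a negative sub-second part.
EpochSplit split_truncating(int64_t tick, int64_t ticks_per_sec) {
    return {tick / ticks_per_sec, (tick % ticks_per_sec) * (kNanosPerSec / ticks_per_sec)};
}

// Fix: borrow one whole second when the remainder is negative, so that
// nanoseconds stays in [0, kNanosPerSec), like Java's Math.floorDiv/floorMod.
EpochSplit split_floor(int64_t tick, int64_t ticks_per_sec) {
    EpochSplit s = split_truncating(tick, ticks_per_sec);
    if (s.nanoseconds < 0) {
        s.seconds -= 1;
        s.nanoseconds += kNanosPerSec;
    }
    return s;
}

// Floor division and modulo, used to split whole seconds into (day, second-of-day).
int64_t floor_div(int64_t a, int64_t b) {
    int64_t q = a / b;
    return (a % b != 0 && (a < 0) != (b < 0)) ? q - 1 : q;
}
int64_t floor_mod(int64_t a, int64_t b) { return a - floor_div(a, b) * b; }

// Hypothetical packing: Julian day in the high bits, microsecond-of-day in the
// low kTimeBits bits, combined with a bitwise OR.
uint64_t pack(const EpochSplit& s) {
    int64_t julian = kUnixEpochJulianDay + floor_div(s.seconds, kSecsPerDay);
    int64_t micros = floor_mod(s.seconds, kSecsPerDay) * 1'000'000 + s.nanoseconds / 1000;
    // A negative `micros` converts to a uint64_t with every high bit set, so the
    // OR overwrites the Julian field instead of leaving it intact.
    return (static_cast<uint64_t>(julian) << kTimeBits) | static_cast<uint64_t>(micros);
}

int64_t julian_of(uint64_t packed) { return static_cast<int64_t>(packed >> kTimeBits); }
int64_t micros_of(uint64_t packed) {
    return static_cast<int64_t>(packed & ((uint64_t{1} << kTimeBits) - 1));
}
```

For `1969-12-31 23:59:59.500` stored as `MILLIS` (tick `-500`), `split_truncating(-500, 1000)` gives `{0, -500000000}`, and packing it sets every bit of the Julian field. `split_floor(-500, 1000)` gives `{-1, 500000000}`, which packs to Julian day 2440587 (1969-12-31) with 86,399,500,000 microseconds of day. The same holds for `MICROS` (tick `-500000`, `ticks_per_sec = 1000000`). Because the borrow only touches the `(seconds, nanoseconds)` pair, a later whole-second UTC offset can be added to `seconds` without disturbing the already non-negative sub-second part.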
Borrow a whole second when the sub-second remainder is negative, so that `nanoseconds` stays in `[0, NANOSECS_PER_SEC)`. This is the same floor split the FE boundary computation already uses (`Math.floorDiv`/`Math.floorMod`). `of_epoch_second` then receives a non-negative sub-second part and packs the correct value. The borrow is unit-agnostic (`MILLIS`/`MICROS`/`NANOS`) and runs before the UTC whole-second offset is applied, so it composes with the timezone-adjusted branch unchanged.

Added a regression test (`Int64PreEpochTimestampSubSecond`) that loads a Parquet file holding `1969-12-31 23:59:59.500000` in both `MILLIS` and `MICROS` units: it decoded to a garbage value before the fix and decodes to the correct wall-clock time after it. The existing post-1970 `Int64_2_Timestamp` test is unchanged, and all 14 `ColumnConverterTest` cases pass (14/14).

What type of PR is this:
Does this PR entail a change in behavior?
Checklist:
Bugfix cherry-pick branch check:
This is an automatic backport of pull request [BugFix] Fix pre-1970 Parquet timestamp load corrupting sub-second DATETIME #75207 done by Mergify.