[BugFix] Fix pre-1970 Parquet timestamp load corrupting sub-second DATETIME (backport #75207) #75385

Merged: wanpengfei-git merged 1 commit into branch-4.1 from mergify/bp/branch-4.1/pr-75207 on Jun 26, 2026

mergify Bot commented Jun 26, 2026

Why I'm doing:

When loading a Parquet INT64 column annotated TIMESTAMP (isAdjustedToUTC=false) into a StarRocks DATETIME, a pre-1970 value with a nonzero sub-second part was decoded to a garbage value instead of the real wall-clock time.

Int64ToDateTimeConverter::convert splits the signed epoch tick with C++ truncating division:

int64_t seconds     = src_data[i] / _second_mask;
int64_t nanoseconds = (src_data[i] % _second_mask) * _scale_to_nano_factor;

For a negative tick whose sub-second remainder is nonzero, nanoseconds is negative. timestamp::of_epoch_second then packs the result via a bitwise OR (from_julian_and_time), so the negative microsecond corrupts the packed Julian field. For example, 1969-12-31 23:59:59.500 was loaded as a garbage value in year 41222. Whole-second negatives and all post-1970 values were unaffected.
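The failure mode can be reproduced outside StarRocks. The sketch below is illustrative only: `SplitTick`, `truncating_split`, and `pack_with_or` are hypothetical names, and the bit layout (day number above a 40-bit time field) is an assumption chosen to show the effect, not the actual `from_julian_and_time` layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for illustration; not a StarRocks type.
struct SplitTick {
    int64_t seconds;
    int64_t nanoseconds;
};

// Truncating split, mirroring the two lines quoted above.
// ticks_per_second stands in for _second_mask and scale_to_nano
// for _scale_to_nano_factor.
inline SplitTick truncating_split(int64_t tick, int64_t ticks_per_second,
                                  int64_t scale_to_nano) {
    assert(ticks_per_second > 0);
    // C++ '/' rounds toward zero and '%' takes the sign of the dividend,
    // so a negative tick with a fractional part yields a negative remainder.
    return {tick / ticks_per_second, (tick % ticks_per_second) * scale_to_nano};
}

// Assumed packing: day number in the high bits, sub-day microseconds in the
// low 40 bits. The real layout may differ; only the OR behavior matters here.
inline uint64_t pack_with_or(uint64_t julian_day, int64_t micros) {
    // A negative value converts to an unsigned value with its high bits set,
    // and the OR smears those bits over the day field.
    return (julian_day << 40) | static_cast<uint64_t>(micros);
}
```

With a MILLIS tick of -500 (half a second before the epoch), `truncating_split(-500, 1000, 1000000)` gives `seconds == 0` and a negative `nanoseconds`, and OR-ing any negative sub-second value into the packed word overwrites the day bits.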

What I'm doing:

Borrow a whole second when the sub-second remainder is negative, so that nanoseconds stays in [0, NANOSECS_PER_SEC). This is the same floor split that the FE boundary computation already uses (Math.floorDiv/Math.floorMod). of_epoch_second then receives a non-negative sub-second and packs the correct value. The borrow is unit-agnostic (MILLIS/MICROS/NANOS) and runs before the UTC whole-second offset, so it composes with the timezone-adjusted branch unchanged.
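A minimal standalone sketch of the borrow, assuming the unit is described by its ticks per second. `EpochParts`, `floor_split`, and `kNanosPerSec` are hypothetical names for illustration; the actual change lives in Int64ToDateTimeConverter::convert and may be written differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for illustration; not a StarRocks type.
struct EpochParts {
    int64_t seconds;
    int64_t nanoseconds;
};

constexpr int64_t kNanosPerSec = 1000000000;

// Floor split of a signed epoch tick into whole seconds and a
// non-negative nanosecond remainder.
// ticks_per_second: 1000 for MILLIS, 1000000 for MICROS, 1000000000 for NANOS.
inline EpochParts floor_split(int64_t tick, int64_t ticks_per_second) {
    assert(ticks_per_second > 0 && kNanosPerSec % ticks_per_second == 0);
    const int64_t scale_to_nano = kNanosPerSec / ticks_per_second;

    // Start from the truncating split ...
    int64_t seconds = tick / ticks_per_second;
    int64_t nanoseconds = (tick % ticks_per_second) * scale_to_nano;

    // ... then borrow one second if the remainder went negative, which
    // brings nanoseconds back into [0, kNanosPerSec).
    if (nanoseconds < 0) {
        seconds -= 1;
        nanoseconds += kNanosPerSec;
    }
    return {seconds, nanoseconds};
}
```

For 1969-12-31 23:59:59.500, both `floor_split(-500, 1000)` and `floor_split(-500000, 1000000)` return `seconds == -1` and `nanoseconds == 500000000`, while whole-second negatives and non-negative ticks pass through unchanged.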

Added a regression test (Int64PreEpochTimestampSubSecond) that loads a Parquet file holding 1969-12-31 23:59:59.500000 in both MILLIS and MICROS units: it decoded to a garbage value before the fix and to the correct wall clock after. The existing post-1970 Int64_2_Timestamp test is unchanged (14/14 ColumnConverterTest pass).

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This pr needs auto generate documentation
  • This is a backport pr

Bugfix cherry-pick branch check:

…TETIME (#75207)

Signed-off-by: xiangguangyxg <xiangguangyxg@gmail.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 0f52fa0)
wanpengfei-git merged commit 88df51c into branch-4.1 on Jun 26, 2026
39 of 40 checks passed
wanpengfei-git deleted the mergify/bp/branch-4.1/pr-75207 branch on June 26, 2026 02:40