[BugFix] Preserve sub-second when loading pre-1970 ORC TIMESTAMP (backport #75432) #75506

Merged
wanpengfei-git merged 1 commit into branch-4.1 from mergify/bp/branch-4.1/pr-75432 on Jun 29, 2026


@mergify mergify Bot commented Jun 29, 2026


Why I'm doing:

OrcTimestampHelper::orc_ts_to_native_ts_before_unix_epoch (be/src/formats/orc/utils.h) hardcoded the microsecond argument to 0, so loading any pre-1970 ORC TIMESTAMP with a non-zero sub-second part dropped the fraction. For example, 1965-03-02 12:00:00.500000 was loaded as 1965-03-02 12:00:00.000000. The bug affects both the plain TIMESTAMP and the TIMESTAMP WITH LOCAL TIME ZONE (instant) read paths and is independent of tablet pre-split.

The after_unix_epoch path already carries the sub-second through (nanoseconds / 1000); only the before_unix_epoch branch threw it away. liborc hands the load path a clean, floored (seconds, nanoseconds) pair (nanoseconds in [0, 1e9), instant = data + nanoseconds/1e9): the writer's +1 (ColumnWriter.cc) and reader's -1 (ColumnReader.cc) for negative sub-second values cancel.
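The floored representation mentioned above can be illustrated with a small sketch. `FlooredTs` and `floor_split` are hypothetical names, not part of liborc or StarRocks; the code only shows why a floored pair keeps `nanoseconds` in [0, 1e9) for instants before the epoch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration type: not a liborc or StarRocks struct.
struct FlooredTs {
    int64_t seconds;      // floor of the instant in seconds, may be negative
    int64_t nanoseconds;  // always in [0, 1e9)
};

constexpr int64_t kNanosPerSec = 1'000'000'000;

// Split a signed nanosecond offset from the Unix epoch into a floored pair,
// so that instant == seconds + nanoseconds / 1e9 also holds before 1970.
constexpr FlooredTs floor_split(int64_t total_nanos) {
    int64_t seconds = total_nanos / kNanosPerSec;  // truncates toward zero
    int64_t nanos = total_nanos % kNanosPerSec;    // sign follows the dividend
    if (nanos < 0) {
        // Move one whole second down so the fraction becomes non-negative.
        seconds -= 1;
        nanos += kNanosPerSec;
    }
    return FlooredTs{seconds, nanos};
}

inline void check_floor_split() {
    // Half a second before the epoch is (-1 s, +500'000'000 ns).
    assert(floor_split(-500'000'000).seconds == -1);
    assert(floor_split(-500'000'000).nanoseconds == 500'000'000);
}
```

With a pair in this form, dividing `nanoseconds` by 1000 always yields a microsecond value in [0, 1e6), which is what makes the unguarded conversion safe on the load path.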

What I'm doing:

be/src/formats/orc/utils.h, in orc_ts_to_native_ts_before_unix_epoch: pass nanoseconds / NANOSECS_PER_USEC as the microsecond instead of 0, mirroring the unguarded after_unix_epoch sibling. The row load path always supplies a non-negative nanoseconds, so this simply restores the sub-second. The dispatcher and the after_unix_epoch path are untouched, so timezone composition, whole-second negatives, and post-1970 values are unchanged.
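As a rough sketch of the change described above, the two functions below contrast the old and new arithmetic on a plain microseconds-since-epoch value. They are hypothetical stand-ins, not the real `OrcTimestampHelper` code, whose native timestamp construction is not shown in this description; only the `nanoseconds / NANOSECS_PER_USEC` term is taken from the text. The epoch value for 1965-03-02 12:00:00 is computed by hand and assumes UTC with no timezone offset.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t USECS_PER_SEC = 1'000'000;
constexpr int64_t NANOSECS_PER_USEC = 1'000;  // name from the PR text; value assumed

// Old behavior (sketch): the microsecond argument was hardcoded to 0.
constexpr int64_t pre_epoch_micros_before_fix(int64_t seconds, int64_t /*nanoseconds*/) {
    return seconds * USECS_PER_SEC + 0;
}

// New behavior (sketch): carry the floored, non-negative fraction through.
constexpr int64_t pre_epoch_micros_after_fix(int64_t seconds, int64_t nanoseconds) {
    return seconds * USECS_PER_SEC + nanoseconds / NANOSECS_PER_USEC;
}

inline void check_pre_epoch_example() {
    // 1965-03-02 12:00:00 UTC is 1766 days minus 12 hours before the epoch.
    constexpr int64_t seconds = -1766 * 86'400LL + 12 * 3'600;  // -152'539'200
    constexpr int64_t nanoseconds = 500'000'000;                // .500000

    // Before the fix the value collapses to the whole second.
    assert(pre_epoch_micros_before_fix(seconds, nanoseconds) == -152'539'200'000'000LL);
    // After the fix the half second is kept.
    assert(pre_epoch_micros_after_fix(seconds, nanoseconds) == -152'539'199'500'000LL);
}
```

Because the fraction is added to a floored whole second, the result moves toward the epoch, matching the 12:00:00.500000 value from the example.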

This helper is also shared by the ORC stripe min/max statistics decoder (orc_min_max_decoder.cpp), whose pre-1970 sub-second handling has separate, pre-existing quirks (the ORC nanos field is stored with a +1 offset the decoder does not undo; its truncating-division remainder can be negative; the before-epoch instant branch ignores the reader offset). To keep this change scoped to the data load path and leave predicate-pushdown bounds byte-for-byte unchanged, the decoder now explicitly drops the sub-second for negative-epoch bounds (its prior behavior) rather than letting the now-sub-second-preserving helper expose those quirks. Correctly decoding pre-1970 stripe-stats sub-second is left as a follow-up.
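One of the quirks listed above, the negative remainder of truncating division, follows directly from C++ integer semantics. The sketch below is illustrative only: `SplitTs`, `split_truncating` and `guarded_micros` are hypothetical names, the millisecond input is an assumed unit, and the guard is a simplified reading of "drop the sub-second for negative-epoch bounds", not the actual code in orc_min_max_decoder.cpp.

```cpp
#include <cassert>
#include <cstdint>

struct SplitTs {  // hypothetical helper type for this sketch
    int64_t seconds;
    int64_t nanoseconds;
};

// C++ integer division truncates toward zero, so for a negative input the
// remainder is negative too and the pair is not in floored form.
constexpr SplitTs split_truncating(int64_t millis) {
    return SplitTs{millis / 1000, (millis % 1000) * 1'000'000};
}

// Simplified guard: for a pre-epoch bound, ignore the fraction so the decoded
// bound stays at a whole second instead of exposing the negative remainder.
constexpr int64_t guarded_micros(SplitTs ts) {
    const bool pre_epoch = ts.seconds < 0 || ts.nanoseconds < 0;
    const int64_t fraction_us = pre_epoch ? 0 : ts.nanoseconds / 1000;
    return ts.seconds * 1'000'000 + fraction_us;
}

inline void check_split_examples() {
    // -1500 ms splits into (-1 s, -500'000'000 ns), not (-2 s, +500'000'000 ns).
    assert(split_truncating(-1500).seconds == -1);
    assert(split_truncating(-1500).nanoseconds == -500'000'000);
    // The guard drops the fraction for the pre-epoch bound ...
    assert(guarded_micros(split_truncating(-1500)) == -1'000'000);
    // ... and keeps it for a post-epoch bound.
    assert(guarded_micros(split_truncating(1500)) == 1'500'000);
}
```

Feeding the unfloored pair straight into a helper that now honors the fraction would pass a negative microsecond value, which is the kind of exposure the explicit guard avoids.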

BE-only; FE and the pre-split pipeline are not touched.

Behavior change: No — values only, correcting a lossy result (the old output was a bug, not intended behavior); no SQL syntax / config / interface change. Same rationale as the companion Parquet fix #75207.

Tests

be/test/formats/orc/orc_chunk_reader_test.cpp: TestTimestampPreEpochSubSecond drives the real OrcChunkReader load path: pre-1970 sub-second on the plain (.500000, .123456) and instant (.500000) paths, plus whole-second-negative and post-1970 sub-second as no-regression guards. Verified RED→GREEN on a real build (before the fix the sub-second cases dropped to .000000 and the instant case dropped its fraction; after the fix all pass and the pre-existing TestTimestamp still passes). OrcMinMaxDecoderPreEpochTimestampDropsSubSecond covers the decoder guard: a pre-1970 stripe-stats bound decodes to its whole second (sub-second dropped, pruning unchanged). BE module-boundary check clean.

Companion to the merged Parquet fix #75207; together they cover pre-1970 sub-second temporal loads.

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This pr needs auto generate documentation
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.1
    • 4.0
    • 3.5

This is an automatic backport of pull request #75432 done by [Mergify](https://mergify.com).

Signed-off-by: xiangguangyxg <xiangguangyxg@gmail.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 29ef34a)
@wanpengfei-git wanpengfei-git merged commit a3c42dc into branch-4.1 Jun 29, 2026
39 of 40 checks passed
@wanpengfei-git wanpengfei-git deleted the mergify/bp/branch-4.1/pr-75432 branch June 29, 2026 06:06