[BugFix] Fix ORC min/max timestamp stats decode for pre-1970 and sub-second bounds (backport #75543) by mergify[bot] · Pull Request #75589 · StarRocks/starrocks

mergify · 2026-06-30T08:33:30Z

Why I'm doing:

The ORC stripe min/max statistics decoder produced TIMESTAMP pruning bounds that could exclude matching rows. For a negative-epoch (pre-1970) bound it dropped the sub-second part entirely; it never undid the ORC nanos +1 serialization offset (so an absent maximum-nanos field understated the upper bound by up to ~1 ms); it split milliseconds into seconds with truncating division (leaving a negative remainder for pre-1970 values); it ignored the reader timezone offset on the before-epoch instant branch; and it truncated nanoseconds to microseconds for both the minimum and the maximum. Any of these can make [min, max] narrower than the true value range, so predicate pushdown wrongly skips row groups or stripes and drops matching rows.
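The truncating-division defect is easy to reproduce in isolation. The sketch below contrasts C++'s round-toward-zero `/` and `%` with a flooring split on a pre-1970 value. `SecMillis`, `split_truncating`, and `split_floor` are illustrative names, not code from the PR.

```cpp
#include <cstdint>

// Illustration only: splitting a millisecond timestamp into (seconds, millis-of-second).
// -1500 ms is 1969-12-31 23:59:58.500 UTC, so the correct split is (-2 s, +500 ms).
struct SecMillis {
    int64_t seconds;
    int64_t millis;  // should be in [0, 1000)
};

// Truncating split: C++ `/` and `%` round toward zero, so a pre-1970 value
// yields a negative remainder: -1500 -> (-1, -500).
constexpr SecMillis split_truncating(int64_t ms) { return {ms / 1000, ms % 1000}; }

// Flooring split: step one second down whenever the remainder is negative,
// which keeps the remainder non-negative: -1500 -> (-2, 500).
constexpr SecMillis split_floor(int64_t ms) {
    int64_t s = ms / 1000;
    int64_t r = ms % 1000;
    if (r < 0) {
        s -= 1;
        r += 1000;
    }
    return {s, r};
}
```

Both splits agree for post-1970 values and whole-second negative values. They differ only when a pre-1970 value has a non-zero sub-second part.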

This is the stripe-stats (pruning) counterpart to the row-load sub-second fixes #75432 (ORC) and #75207 (Parquet). That ORC PR left a TODO placeholder in this decoder, which this PR addresses.

What I'm doing:

Decode each bound so [min, max] stays a superset of the true value range:

- undo the ORC nanos +1 offset, falling back to the conservative default when the field is absent or malformed (0 for the minimum, 999999 for the maximum), which fixes the understated max for millisecond-precision files;
- floor the millisecond-to-second split toward -inf so the remainder (and nanoseconds) stay non-negative for pre-1970 values;
- fold the instant timezone offset into the seconds and decode as plain UTC (fixes the before-epoch TIMESTAMP_INSTANT branch that dropped the offset);
- round the minimum down and the maximum up to microsecond precision (StarRocks is microsecond-precision, ORC is nanosecond-precision).
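The four steps above can be combined into one bound decoder. The following is a minimal sketch, not the StarRocks helper. It assumes that each ORC bound arrives as milliseconds since the epoch plus an optional sub-millisecond nanos field serialized as nanos + 1, and that "malformed" means a value outside [1, 1000000]. The names `decode_ts_bound_micros` and `floor_div`, the microseconds-since-epoch return value, and the sign convention of `offset_seconds` are hypothetical.

```cpp
#include <cstdint>
#include <optional>

// Floor division: the quotient rounds toward -infinity, so a - q * b is in [0, b) for b > 0.
constexpr int64_t floor_div(int64_t a, int64_t b) {
    int64_t q = a / b;
    if (a % b != 0 && ((a < 0) != (b < 0))) --q;
    return q;
}

// Hypothetical helper (not StarRocks code): decode one ORC stripe-stats timestamp bound
// into microseconds since the Unix epoch, widening so the decoded [min, max] is a superset.
//   millis          ORC minimum/maximum, milliseconds since the epoch
//   nanos_plus_one  sub-millisecond nanos field, serialized as nanos + 1; nullopt if absent
//   is_max          true when decoding the upper bound
//   offset_seconds  reader timezone offset to fold in (sign convention is an assumption)
constexpr int64_t decode_ts_bound_micros(int64_t millis, std::optional<int32_t> nanos_plus_one,
                                         bool is_max, int64_t offset_seconds) {
    // Step 1: undo the +1; if absent or out of [1, 1000000], use the conservative default.
    int64_t sub_ms_nanos = is_max ? 999999 : 0;
    if (nanos_plus_one && *nanos_plus_one >= 1 && *nanos_plus_one <= 1000000) {
        sub_ms_nanos = *nanos_plus_one - 1;
    }
    // Step 2: floor the ms -> s split so the sub-second part is non-negative before 1970.
    int64_t seconds = floor_div(millis, 1000);
    int64_t nanos_in_sec = (millis - seconds * 1000) * 1000000 + sub_ms_nanos;
    // Step 3: fold the timezone offset into the seconds; the rest is plain UTC arithmetic.
    seconds += offset_seconds;
    // Step 4: round the minimum down and the maximum up to microseconds.
    // nanos_in_sec >= 0 here, so integer division is already a floor.
    int64_t micros_in_sec = is_max ? (nanos_in_sec + 999) / 1000 : nanos_in_sec / 1000;
    // A rounded-up value of 1000000 simply carries into the next second.
    return seconds * 1000000 + micros_in_sec;
}
```

For example, a bound of -1500 ms with no nanos field decodes to -1500000 microseconds as a minimum and -1499000 microseconds as a maximum, so the pair brackets every instant in that millisecond. Sharing one function for both bounds, differing only in `is_max`, mirrors the PR's move from two inline blocks to one helper.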

The two inline blocks are replaced by one shared helper. The conversion helpers in utils.h and the row-load path are unchanged.

Pre-1970 TIMESTAMP_INSTANT bounds in a named zone whose historical offset differs from its epoch offset remain approximate under the existing scalar-offset model (the row-load path uses a per-instant cctz conversion); this is a pre-existing limitation, documented in the decoder, and strictly better than the prior behavior which dropped the offset entirely.
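To illustrate why a single scalar offset is only approximate for older instants, the toy below compares a per-instant offset lookup with a fixed offset taken at the Unix epoch. The zone is made up for illustration (it is not real tz data), and the names `offset_at`, `to_local_per_instant`, and `to_local_scalar` are hypothetical. This is a sketch of the two models, not of cctz or the StarRocks decoder.

```cpp
#include <cstdint>

// Made-up zone for illustration only (NOT real tz data): UTC+8:00 from
// 1950-01-01T00:00:00Z onward, UTC+7:30 before that.
constexpr int64_t kTransitionUtc = -631152000;  // 1950-01-01T00:00:00Z in epoch seconds

// Per-instant model: look up the offset in effect at each UTC instant,
// as a cctz-style conversion does.
constexpr int64_t offset_at(int64_t utc_seconds) {
    return utc_seconds >= kTransitionUtc ? 8 * 3600 : 7 * 3600 + 30 * 60;
}
constexpr int64_t to_local_per_instant(int64_t utc_seconds) {
    return utc_seconds + offset_at(utc_seconds);
}

// Scalar model: one offset (the one in effect at the Unix epoch) for every instant.
constexpr int64_t kScalarOffset = offset_at(0);
constexpr int64_t to_local_scalar(int64_t utc_seconds) { return utc_seconds + kScalarOffset; }
```

The two models agree from the zone's last transition onward. Before it, the scalar result is shifted by the difference between the historical offset and the epoch offset (30 minutes in this toy zone). A pruning bound decoded under the scalar model can be off by the same amount.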

Unit tests (RED to GREEN) in starrocks_test cover: pre-1970 sub-second preservation with min-floor/max-ceil, the +1 undo and absent/malformed defaults, the instant offset fold (including crossing the Unix epoch), and post-1970 / whole-second-negative no-regression.

What type of PR is this:

Does this PR entail a change in behavior?

Yes, this PR will result in a change in behavior.
No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

Interface/UI changes: syntax, type conversion, expression evaluation, display information
Parameter changes: default values, similar parameters but with different default values
Policy changes: use new policy to replace old one, functionality automatically enabled
Feature removed
Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

I have added test cases for my bug fix or my new feature
This pr needs user documentation (for new or modified features or behaviors)
- I have added documentation for my new feature or new function
- This pr needs auto generate documentation
This is a backport pr

Bugfix cherry-pick branch check:

I have checked the version labels which the pr will be auto-backported to the target branch
- 4.1
- 4.0
- 3.5

This is an automatic backport of pull request [BugFix] Fix ORC min/max timestamp stats decode for pre-1970 and sub-second bounds #75543 done by Mergify.


[BugFix] Fix ORC min/max timestamp stats decode for pre-1970 and sub-…

bc10b04

…second bounds (#75543) Signed-off-by: xiangguangyxg <xiangguangyxg@gmail.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> (cherry picked from commit 13e7fdd)

github-actions Bot assigned xiangguangyxg Jun 30, 2026

github-actions Bot added the automerge label Jun 30, 2026

wanpengfei-git enabled auto-merge (squash) June 30, 2026 08:34

mergify Bot mentioned this pull request Jun 30, 2026

[BugFix] Fix ORC min/max timestamp stats decode for pre-1970 and sub-second bounds #75543

Merged


xiangguangyxg approved these changes Jun 30, 2026


wanpengfei-git merged commit c37932d into branch-4.1 Jun 30, 2026
39 of 40 checks passed

wanpengfei-git deleted the mergify/bp/branch-4.1/pr-75543 branch June 30, 2026 09:14

github-actions Bot added the version:4.1.3 label Jun 30, 2026

wanpengfei-git merged 1 commit into branch-4.1 from mergify/bp/branch-4.1/pr-75543

mergify Bot commented Jun 30, 2026 • edited by wanpengfei-git
