fix: address ubsan findings #350

Open
zjw1111 wants to merge 6 commits into alibaba:main from zjw1111:codex/ubsan-fixes

@zjw1111 zjw1111 commented Jun 8, 2026

Collaborator

Purpose

Linked issue: N/A

This PR fixes several undefined-behavior findings exposed after enabling sanitizer flags on the object library target. It also makes Hive bucket hashing use unsigned wraparound semantics at the interface level, matching the Java-style overflow behavior without relying on C++ signed overflow.
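The unsigned-wraparound idea can be sketched as follows. This is a minimal illustration, not the code in this PR: the names `HashBytesUnsigned` and `ToJavaInt` are hypothetical, and the cast through `signed char` is an assumption made here so that each byte is sign-extended the way a Java `byte` is, regardless of whether `char` is signed on the platform.

```cpp
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical helper: 31-multiplier hash accumulated in uint32_t.
// Unsigned arithmetic wraps modulo 2^32 by definition, so there is no
// signed-overflow undefined behavior, and the bit pattern matches what
// Java's wrapping 32-bit int arithmetic would produce.
inline uint32_t HashBytesUnsigned(std::string_view bytes) {
    uint32_t result = 0;
    for (char c : bytes) {
        // Assumption: sign-extend each byte like a Java byte, then
        // reinterpret the value as unsigned before adding.
        const int32_t as_signed = static_cast<int32_t>(static_cast<signed char>(c));
        result = result * 31U + static_cast<uint32_t>(as_signed);
    }
    return result;
}

// Hypothetical helper: recover the Java-style signed value from the
// unsigned accumulator without any overflowing conversion.
inline int32_t ToJavaInt(uint32_t hash) {
    int32_t out;
    std::memcpy(&out, &hash, sizeof(out));
    return out;
}
```

Keeping the accumulator unsigned until the very end means the only signed value is produced by a bit copy, which is one way to stay clear of the UBSan signed-integer-overflow check.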

The changes include:

  • Apply sanitizer flags to paimon_objlib so implementation files are compiled with UBSan/ASan options.
  • Add sanitizer suppressions for known third-party Arrow timestamp boundary behavior and Lance/Tokio LSAN shutdown allocations.
  • Guard zero-length memory copies before pointer arithmetic.
  • Avoid undefined integer operations in decimal parsing, decimal tests, global index empty appends, and ORC timestamp null handling.
  • Switch Hive hashing APIs and accumulation to unsigned integer types.
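As an illustration of the zero-length copy guard mentioned in the list above, here is a minimal sketch. `AppendAt` is a hypothetical helper, not a function from this PR; it assumes the caller may pass null pointers when `length` is zero, which is the case UBSan typically flags (null passed to `memcpy`, or an offset applied to a null pointer).

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy `length` bytes from `src` into
// `dst_base + dst_offset`. The early return happens before any pointer
// arithmetic, so an empty copy with null pointers never evaluates
// `dst_base + dst_offset` and never calls memcpy with a null argument.
inline void AppendAt(char* dst_base, std::size_t dst_offset,
                     const char* src, std::size_t length) {
    if (length == 0) {
        return;  // nothing to copy; pointers may legitimately be null
    }
    std::memcpy(dst_base + dst_offset, src, length);
}
```

Placing the check ahead of the address computation matters: guarding only the `memcpy` call would still leave the pointer addition exposed.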

Tests

API and Format

No public API, storage format, or protocol changes. The Hive hasher internal interface now uses unsigned hash value types.

Documentation

No user-facing documentation changes.

Generative AI tooling

Generated-by: Codex

Copilot AI review requested due to automatic review settings June 8, 2026 15:54

Copilot AI left a comment

Contributor

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.

Comment on lines +42 to 46
static uint32_t HashBytes(const char* bytes, int32_t length) {
uint32_t result = 0;
for (int32_t i = 0; i < length; i++) {
-result = (result * 31) + static_cast<int32_t>(bytes[i]);
+result = result * 31U + static_cast<uint32_t>(static_cast<int32_t>(bytes[i]));
}
Comment thread cmake_modules/san-config.cmake Outdated
Comment on lines 28 to 31
target_compile_options(paimon_sanitizer_flags INTERFACE -fsanitize=undefined
-fno-sanitize=vptr
-fno-omit-frame-pointer)
target_link_options(paimon_sanitizer_flags INTERFACE -fsanitize=undefined)
Comment on lines +18 to +20
# Arrow's ISO8601 string-to-timestamp parser intentionally reaches the int64
# nanosecond boundary for values such as 1677-09-21 00:12:43.145224192.
signed-integer-overflow:std::chrono::__duration_cast_impl
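The boundary value quoted in the suppression comment can be checked by hand: it is the calendar time that corresponds to the smallest int64 count of nanoseconds since the Unix epoch. The sketch below is only an arithmetic check, not Arrow's parser. `CivilTime`, `FloorDivMod`, and `CivilFromEpochNanos` are hypothetical names, and the day-to-date conversion follows the widely used days-to-civil algorithm for the proleptic Gregorian calendar.

```cpp
#include <cstdint>
#include <limits>

struct CivilTime {
    int64_t year;
    int month, day, hour, minute, second;
    int64_t nanos;
};

struct DivMod {
    int64_t quot;
    int64_t rem;
};

// Floor division with a non-negative remainder, for b > 0. It uses / and %
// only, so it never multiplies the quotient back (which would overflow
// for INT64_MIN nanoseconds).
inline DivMod FloorDivMod(int64_t a, int64_t b) {
    int64_t q = a / b;
    int64_t r = a % b;
    if (r < 0) {
        q -= 1;
        r += b;
    }
    return {q, r};
}

// Hypothetical helper: split nanoseconds since 1970-01-01T00:00:00 into
// calendar fields.
inline CivilTime CivilFromEpochNanos(int64_t ns) {
    const DivMod sec = FloorDivMod(ns, 1000000000);
    const DivMod day = FloorDivMod(sec.quot, 86400);

    // Days since epoch -> year/month/day.
    const int64_t z = day.quot + 719468;
    const int64_t era = (z >= 0 ? z : z - 146096) / 146097;
    const int64_t doe = z - era * 146097;                                   // [0, 146096]
    const int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    const int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);            // [0, 365]
    const int64_t mp = (5 * doy + 2) / 153;                                 // [0, 11]
    const int d = static_cast<int>(doy - (153 * mp + 2) / 5 + 1);
    const int m = static_cast<int>(mp < 10 ? mp + 3 : mp - 9);
    const int64_t y = yoe + era * 400 + (m <= 2 ? 1 : 0);

    const int sod = static_cast<int>(day.rem);  // seconds of day
    return {y, m, d, sod / 3600, (sod % 3600) / 60, sod % 60, sec.rem};
}
```

Calling `CivilFromEpochNanos(std::numeric_limits<int64_t>::min())` gives 1677-09-21 00:12:43 with 145224192 nanoseconds, which agrees with the value in the comment.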
Comment on lines +24 to +26
leak:tokio::runtime::blocking::pool::spawn_blocking
leak:tokio::runtime::scheduler::multi_thread::worker::create
leak:std::thread::Builder::spawn_unchecked
for (size_t i = 0; i < group; ++i) {
multiple *= 10;
if (length == 0) {
return Decimal::int128_t{0};

Collaborator

Return a bad status when the input is an empty string ("").
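One way to read this suggestion: an empty input is reported as a failure instead of being parsed as zero. The sketch below is a simplified stand-in, not the project's decimal parser. `ParseUnsignedDigits` is a hypothetical name, it uses `std::optional<int64_t>` in place of the project's status type and `Decimal::int128_t`, and it checks for overflow before each multiply so that no signed overflow can occur.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>

// Hypothetical helper: parse a run of ASCII digits into int64_t.
// Returns std::nullopt (standing in for a "bad status") for an empty
// string, a non-digit character, or a value that does not fit.
inline std::optional<int64_t> ParseUnsignedDigits(std::string_view text) {
    if (text.empty()) {
        return std::nullopt;  // "" is rejected, not treated as 0
    }
    constexpr int64_t kMax = std::numeric_limits<int64_t>::max();
    int64_t value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') {
            return std::nullopt;
        }
        const int64_t digit = c - '0';
        // value * 10 + digit <= kMax  <=>  value <= (kMax - digit) / 10
        if (value > (kMax - digit) / 10) {
            return std::nullopt;
        }
        value = value * 10 + digit;
    }
    return value;
}
```

With this shape, `ParseUnsignedDigits("")` has no value while `ParseUnsignedDigits("0")` yields 0, so callers can tell a missing number apart from a literal zero.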

3 participants