fix: Like::TestString to align with Java LIKE semantics #320

Merged
lxy-9602 merged 6 commits into alibaba:main from lxy-9602:fix-like on Jun 1, 2026

Conversation

@lxy-9602
Collaborator

Purpose

No linked issue.

Description:

  • Validate LIKE escape sequences the same way Java Paimon does
  • Treat trailing \ and invalid escapes as errors
  • Make _ match one UTF-8 character instead of one byte
  • Keep % matching any sequence
  • Remove alloca and use std::vector<bool> instead
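
The escape rules in the list above can be illustrated with a minimal sketch. This is not the code from like.cpp: the names `TokenKind`, `Token`, and `ParseLikePattern` are hypothetical, and `std::nullopt` stands in for the `Status::Invalid` result that the PR reports for a bad pattern. The sketch works on bytes only and leaves the UTF-8 handling aside.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical token model for this sketch; not taken from like.cpp.
enum class TokenKind { kLiteral, kAnyOne, kAnySequence };

struct Token {
    TokenKind kind;
    char byte;  // only meaningful for kLiteral
};

// Returns std::nullopt for an invalid pattern (an assumption standing in
// for the Status::Invalid error described in the PR).
std::optional<std::vector<Token>> ParseLikePattern(std::string_view pattern) {
    std::vector<Token> tokens;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '\\') {
            // A trailing backslash has nothing to escape.
            if (i + 1 == pattern.size()) return std::nullopt;
            const char next = pattern[++i];
            // Only \_, \% and \\ are accepted as escapes.
            if (next != '_' && next != '%' && next != '\\') return std::nullopt;
            tokens.push_back({TokenKind::kLiteral, next});
        } else if (c == '%') {
            // Collapse a run of unescaped % into a single token.
            if (tokens.empty() || tokens.back().kind != TokenKind::kAnySequence) {
                tokens.push_back({TokenKind::kAnySequence, '\0'});
            }
        } else if (c == '_') {
            tokens.push_back({TokenKind::kAnyOne, '\0'});
        } else {
            tokens.push_back({TokenKind::kLiteral, c});
        }
    }
    return tokens;
}
```

With this sketch, `abc\` and `a\bc` are rejected, while `a%%%b` parses to three tokens (literal, any-sequence, literal).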

Tests

  • Add invalid escape sequence tests
  • Add escaped \ and % tests
  • Add UTF-8 multi-byte _ matching tests
  • Add line terminator semantics tests

API and Format

Documentation

Generative AI tooling

Aone Copilot (Claude)


Copilot AI left a comment

Pull request overview

Fixes Like::TestString to match Java Paimon's LIKE semantics: strict escape validation, UTF-8 code-point-aware _ wildcard, and Java regex line-terminator handling. Also replaces the alloca-based DP buffer with a std::vector<bool>.

Changes:

  • Parse pattern in two phases: validate escapes (only \_, \%, \\ permitted; trailing \ and other escapes raise Status::Invalid) and merge consecutive %.
  • Decompose both pattern and field into UTF-8 code points so _ matches one character (not one byte) and rejects Java line terminators (\n, \r, U+0085, U+2028, U+2029); % still matches anything. Min-length quick reject now counts _ as a required char.
  • Replace alloca / unique_ptr<bool[]> DP storage with std::vector<bool>; add tests covering invalid escapes, escaped \/%, multibyte _, and line-terminator semantics.
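
As a rough illustration of the code-point matching and `std::vector<bool>` DP described in the list above, here is a simplified sketch. It is not the implementation in like.cpp: `DecodeUtf8`, `IsLineTerminator`, and `MatchLikeNoEscape` are hypothetical names, escape handling is omitted for brevity, and the decoder assumes well-formed UTF-8.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Decodes UTF-8 into code points. Assumes well-formed input; a stray
// continuation byte is passed through as its own code point.
std::vector<char32_t> DecodeUtf8(std::string_view s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t len = 1;
        char32_t cp = b;
        if (b >= 0xF0) { len = 4; cp = b & 0x07; }
        else if (b >= 0xE0) { len = 3; cp = b & 0x0F; }
        else if (b >= 0xC0) { len = 2; cp = b & 0x1F; }
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// The line terminators listed in the review summary.
bool IsLineTerminator(char32_t cp) {
    return cp == U'\n' || cp == U'\r' || cp == 0x85 || cp == 0x2028 || cp == 0x2029;
}

// LIKE matching over code points, without escape support (a simplification).
bool MatchLikeNoEscape(std::string_view field, std::string_view pattern) {
    const std::vector<char32_t> f = DecodeUtf8(field);
    const std::vector<char32_t> p = DecodeUtf8(pattern);
    const std::size_t n = f.size();
    // prev[j] is true when the pattern prefix seen so far matches f[0..j).
    std::vector<bool> prev(n + 1, false);
    std::vector<bool> cur(n + 1, false);
    prev[0] = true;  // empty pattern matches empty field
    for (const char32_t pc : p) {
        cur[0] = (pc == U'%') && prev[0];
        for (std::size_t j = 1; j <= n; ++j) {
            if (pc == U'%') {
                // % matches zero code points (prev[j]) or one more (cur[j-1]).
                cur[j] = prev[j] || cur[j - 1];
            } else if (pc == U'_') {
                // _ consumes exactly one code point that is not a line terminator.
                cur[j] = prev[j - 1] && !IsLineTerminator(f[j - 1]);
            } else {
                cur[j] = prev[j - 1] && f[j - 1] == pc;
            }
        }
        prev.swap(cur);
    }
    return prev[n];
}
```

In this sketch a single three-byte character such as U+4E2D matches `_` but not `___`, and `a\nb` matches `a%b` but not `a_b`. Two rolling rows keep memory proportional to the field length, with no stack allocation.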

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/paimon/common/predicate/like.cpp | Rewrites Like::TestString for Java-compatible escape validation, UTF-8 _ semantics, line-terminator handling, and std::vector-based DP. |
| src/paimon/common/predicate/predicate_test.cpp | Adds four TEST_F cases for invalid escapes, escaped backslash/percent, UTF-8 multibyte _, and line-terminator semantics. |

Comment thread src/paimon/common/predicate/like.cpp Outdated
Collaborator

@zjw1111 left a comment

+1

@lxy-9602 lxy-9602 merged commit 3830531 into alibaba:main Jun 1, 2026
9 checks passed

3 participants