Match failure with character class containing multiple Unicode codepoints #833

@ranvis

Version: 10.48-DEV (2025-10-21) 3b91977
Built with cmake -DPCRE2_BUILD_PCRE2_32=ON on Windows

pcre2test -32

  re> /[\x{ff}\x{100}\x{8000}\x{8002}\x{8004}\x{8006}]/utf8_input
data> \x{100}
 0: \x{100}

  re> /[\x{ff}\x{100}\x{8000}\x{8002}\x{8004}\x{8006}\x{8008}]/utf8_input
data> \x{100}
No match

Expected behavior:
The second pattern should behave the same as the first, since the only difference
is the addition of another distinct codepoint \x{8008}.
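
As a sanity check on the expected semantics only, here is a minimal sketch using Python's built-in `re` module. This is a different regex engine, not PCRE2, so it does not reproduce the bug. It only illustrates that a character class is set membership, so adding `\x{8008}` should not affect whether `\x{100}` matches. The helper name `class_matches` is made up for this illustration.

```python
import re

# The two character classes from the report, written as Python escapes.
BASE = "\u00ff\u0100\u8000\u8002\u8004\u8006"
EXTENDED = BASE + "\u8008"  # same class plus one more distinct codepoint


def class_matches(members: str, subject: str) -> bool:
    """Return True if any character of `subject` is in the class [members].

    Assumes `members` contains no regex metacharacters (true for the
    codepoints used here), so no escaping is needed.
    """
    pattern = "[" + members + "]"
    return re.search(pattern, subject) is not None


subject = "\u0100"
print(class_matches(BASE, subject))      # True
print(class_matches(EXTENDED, subject))  # True
```

Both calls report a match here, which is the behavior the second pcre2test pattern is expected to show as well.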

Notes:
This issue is reproducible with PCRE2 10.47, 10.46 (called from the official PHP 8.5.0 RC3 Windows build), and 10.45,
while 10.44 (called from PHP 8.4) is unaffected.

pcre2test -C
PCRE2 version 10.48-DEV 2025-10-21
Compiled with
  8-bit support
  32-bit support
  UTF and UCP support (Unicode version 16.0.0)
  No just-in-time compiler support
  Default newline sequence is LF
  \R matches all Unicode newlines
  \C is supported
  Internal link size
    Requested = 2
    Effective = 2
  Parentheses nest limit = 250
  Default heap limit = 20000000 kibibytes
  Default match limit = 10000000
  Default depth limit = 10000000
  pcre2test has libreadline support

Labels: backport, bug