Implement proper surrogate pair handling in JavaWordFinder#2922

Draft
vogella wants to merge 1 commit into eclipse-jdt:master from vogella:fix-java-word-finder

@vogella vogella commented Apr 2, 2026

PR #2977 removed the stale ICU comments from JavaWordFinder and replaced them with a simple surrogate skip (!Character.isSurrogate(c)). That avoids the old placeholder comments but still does not correctly handle supplementary Unicode characters in Java identifiers.

This PR replaces that shortcut with proper code-point logic: when a surrogate char is encountered during a word scan, the adjacent char is read to form the full code point, which is then tested with Character.isJavaIdentifierPart(int). Identifiers containing supplementary Unicode characters are now included in the word region correctly, and unpaired or non-identifier surrogates still terminate the scan.
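
As a rough illustration of the code-point approach described above, here is a minimal sketch that scans a plain `CharSequence` rather than an `IDocument`. The class `SurrogateWordScan` and its `findWord` method are hypothetical names chosen for this example and are not the actual JavaWordFinder code. The sketch also assumes that an offset falling between the two halves of a surrogate pair should be treated as pointing at the start of the pair. It relies on `Character.codePointAt` and `Character.codePointBefore` to combine a surrogate with its adjacent char.

```java
/** Hypothetical helper for illustration only; not the JavaWordFinder implementation. */
final class SurrogateWordScan {

    /**
     * Returns {start, end} (end exclusive) of the Java identifier touching
     * the given offset, or null if there is none.
     */
    static int[] findWord(CharSequence text, int offset) {
        if (offset < 0 || offset > text.length()) {
            return null;
        }
        // Assumption: an offset between the halves of a pair snaps to the pair's start.
        if (offset > 0 && offset < text.length()
                && Character.isHighSurrogate(text.charAt(offset - 1))
                && Character.isLowSurrogate(text.charAt(offset))) {
            offset--;
        }

        // Scan backward one code point at a time.
        int start = offset;
        while (start > 0) {
            int cp = Character.codePointBefore(text, start);
            if (!Character.isJavaIdentifierPart(cp)) {
                break; // also reached for unpaired surrogates
            }
            start -= Character.charCount(cp);
        }

        // Scan forward one code point at a time.
        int end = offset;
        while (end < text.length()) {
            int cp = Character.codePointAt(text, end);
            if (!Character.isJavaIdentifierPart(cp)) {
                break; // also reached for unpaired surrogates
            }
            end += Character.charCount(cp);
        }

        return start == end ? null : new int[] { start, end };
    }
}
```

With this sketch, an identifier that contains a supplementary letter such as U+1D54F is returned as a single region, while an emoji or a lone surrogate ends the scan because `Character.isJavaIdentifierPart(int)` returns false for it.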

@vogella vogella force-pushed the fix-java-word-finder branch from 4eab49f to b6d5ad7 Compare May 14, 2026 04:14
@vogella vogella changed the title from "Replace ICU-related comments with standard Java code in JavaWordFinder" to "Implement proper surrogate pair handling in JavaWordFinder" May 14, 2026
Replace the simplistic surrogate skip (from eclipse-jdt#2977) with full code-point
checking: when a surrogate char is encountered, form the code point from
the pair and test it with Character.isJavaIdentifierPart(int), so
identifiers containing supplementary Unicode characters are correctly
included in the word region.
@vogella vogella force-pushed the fix-java-word-finder branch from b6d5ad7 to 143821b Compare May 14, 2026 04:18