Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix scanning issues related to Unicode escapes in identifiers #61042

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

graphemecluster
Copy link
Contributor

@graphemecluster graphemecluster commented Jan 24, 2025

This PR fixes several issues related to identifiers scanning:

For regular expression group name

For all identifiers

- Added the `scanIdentifierStart` method
- Refactored `scanIdentifierParts` & `scanIdentifier`

Implementing the above automatically fixes another issue that Unicode
escapes & extended Unicode escapes are not recognised at the beginning
of a RegExp group name.
@typescript-bot typescript-bot added the For Uncommitted Bug PR for untriaged, rejected, closed or missing bug label Jan 24, 2025
Copy link
Contributor Author

@graphemecluster graphemecluster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to leave out the last commit when reviewing for a better diff.

// "-" and ":" are valid in JSX Identifiers
(identifierVariant === LanguageVariant.JSX ? (ch === CharacterCodes.minus || ch === CharacterCodes.colon) : false) ||
// "-" is valid in JSX Identifiers. ":" is part of JSXNamespacedName but not JSXIdentifier.
identifierVariant === LanguageVariant.JSX && ch === CharacterCodes.minus ||
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing this line doesn't affect everything in the test suite, but I am not sure if it affects code completion.

ch = peekExtendedUnicodeEscape();
if (ch >= 0 && isIdentifierPart(ch, languageVersion)) {
if (ch >= 0 && isIdentifierPart(ch, languageVersion, identifierVariant)) {
result += text.substring(start, pos);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the missing line that causes #61043.

pos += charSize(ch);
return token; // Still `SyntaxKind.Unknown`
tokenFlags = TokenFlags.None;
return scanIdentifier(ScriptTarget.ESNext);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested and it does not affect anything in the test suite even if pos is not advanced.

return token = SyntaxKind.Identifier;
}

function scanIdentifier(languageVersion: ScriptTarget, identifierVariant?: LanguageVariant | "RegExpGroupName") {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, either an internal value can be added into LanguageVariant, or an enum called IdentifierVariant can be created, but for the latter case, I am not sure if isIdentifierPart and isIdentifierText should also be amended since it might be breaky, at least at type level.

Comment on lines +277 to +281
/(?<\u{D800}\u{DC00}>)\k<\u{D800}\u{DC00}>/;

!!! error TS1514: Expected a capturing group name.
~~~~~~~~
!!! error TS1538: Unicode escape sequences are only available when the Unicode (u) flag or the Unicode Sets (v) flag is set.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first escape sequence error is not reported as it starts at the same position as the TS1514 error. Is it really desirable? Personally, I would prefer to have an additional check that the last error is not of length zero, in which case it is not considered to be overlapping.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
For Uncommitted Bug PR for untriaged, rejected, closed or missing bug
Projects
Status: Not started
Development

Successfully merging this pull request may close these issues.

Text between Unicode escapes within an identifier is skipped
2 participants