Skip to content

Fix scanning issues related to Unicode escapes in identifiers#61042

Closed
graphemecluster wants to merge 4 commits intomicrosoft:mainfrom
graphemecluster:scan-identifiers-with-escapes
Closed

Fix scanning issues related to Unicode escapes in identifiers#61042
graphemecluster wants to merge 4 commits intomicrosoft:mainfrom
graphemecluster:scan-identifiers-with-escapes

Conversation

@graphemecluster
Copy link
Copy Markdown
Contributor

@graphemecluster graphemecluster commented Jan 24, 2025

This PR fixes several issues related to identifiers scanning:

For regular expression group name

For all identifiers

- Added the `scanIdentifierStart` method
- Refactored `scanIdentifierParts` & `scanIdentifier`

Implementing the above automatically fixes another issue that Unicode
escapes & extended Unicode escapes are not recognised at the beginning
of a RegExp group name.
@typescript-bot typescript-bot added the For Uncommitted Bug PR for untriaged, rejected, closed or missing bug label Jan 24, 2025
Copy link
Copy Markdown
Contributor Author

@graphemecluster graphemecluster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to leave out the last commit when reviewing for a better diff.

Comment thread src/compiler/scanner.ts
// "-" and ":" are valid in JSX Identifiers
(identifierVariant === LanguageVariant.JSX ? (ch === CharacterCodes.minus || ch === CharacterCodes.colon) : false) ||
// "-" is valid in JSX Identifiers. ":" is part of JSXNamespacedName but not JSXIdentifier.
identifierVariant === LanguageVariant.JSX && ch === CharacterCodes.minus ||
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing this line doesn't affect everything in the test suite, but I am not sure if it affects code completion.

Comment thread src/compiler/scanner.ts Outdated
ch = peekExtendedUnicodeEscape();
if (ch >= 0 && isIdentifierPart(ch, languageVersion)) {
if (ch >= 0 && isIdentifierPart(ch, languageVersion, identifierVariant)) {
result += text.substring(start, pos);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the missing line that causes #61043.

Comment thread src/compiler/scanner.ts
pos += charSize(ch);
return token; // Still `SyntaxKind.Unknown`
tokenFlags = TokenFlags.None;
return scanIdentifier(ScriptTarget.ESNext);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested and it does not affect anything in the test suite even if pos is not advanced.

Comment thread src/compiler/scanner.ts
return token = SyntaxKind.Identifier;
}

function scanIdentifier(languageVersion: ScriptTarget, identifierVariant?: LanguageVariant | "RegExpGroupName") {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, either an internal value can be added into LanguageVariant, or an enum called IdentifierVariant can be created, but for the latter case, I am not sure if isIdentifierPart and isIdentifierText should also be amended since it might be breaky, at least at type level.

Comment on lines +277 to +281
/(?<\u{D800}\u{DC00}>)\k<\u{D800}\u{DC00}>/;

!!! error TS1514: Expected a capturing group name.
~~~~~~~~
!!! error TS1538: Unicode escape sequences are only available when the Unicode (u) flag or the Unicode Sets (v) flag is set.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first escape sequence error is not reported as it starts at the same position as the TS1514 error. Is it really desirable? Personally, I would prefer to have an additional check that the last error is not of length zero, in which case it is not considered to be overlapping.

@graphemecluster
Copy link
Copy Markdown
Contributor Author

graphemecluster commented Nov 4, 2025

@RyanCavanaugh I understand that your Team is busy working on Corsa, but since you just pinged me regarding another regex issue, would it be possible for you to take some time to have a look at this pull request too?

Edit: Due to Ron’s unfortunate situation, please allow me to bring this to the attention of an extra person. @jakebailey?

@jakebailey
Copy link
Copy Markdown
Member

I intend to review this, yes, though I will note that whatever we add here is going to have to get ported to the new Go compiler so hopefully it's not too complicated or tied to JS strings

@jakebailey jakebailey requested review from jakebailey and removed request for rbuckton November 17, 2025 23:14
@typescript-bot typescript-bot added For Backlog Bug PRs that fix a backlog bug and removed For Uncommitted Bug PR for untriaged, rejected, closed or missing bug labels Nov 17, 2025
@github-project-automation github-project-automation bot moved this from Not started to Done in PR Backlog Mar 24, 2026
@typescript-bot
Copy link
Copy Markdown
Collaborator

With 6.0 out as the final release vehicle for this codebase, we're closing all PRs that don't fit the merge criteria for post-6.0 patches. If you think this was a mistake and this PR fits the post-6.0 patch criteria, please post to the 6.0 iteration issue with details (specifically, which PR and which patch criteria it satisfies).

Next steps for PRs:

  • For crash bugfixes or language service improvements, PRs are currently accepted at the typescript-go repo
  • Changes to type system behavior should wait until after 7.0, at which point mainline TypeScript development will resume in this repository with the Go codebase
  • Library file updates (lib.d.ts etc) continue to live in this repo or the DOM Generator repo as appropriate

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

For Backlog Bug PRs that fix a backlog bug

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Text between Unicode escapes within an identifier is skipped

4 participants