Editorial: nested surrogate pairs? #969
Comments
I agree the current paragraph is oddly phrased. I don't understand why it's written like that. Maybe @NorbertLindenberg or @allenwb remember? If we're gonna change this, your first suggested approach seems simplest IMHO:
Note that PR #1727 also addresses this issue.
I didn’t write this – it first showed up in the spec draft of 2015-04-03. The change was preceded by a discussion on “lonely surrogates and unicode regexps” starting here:
I wrote it because without it (or something like it) the grammar is ambiguous. The phrasing parallels the language used for the "dangling else" clause because, at the grammar engineering level, this is the same kind of ambiguity, and copying that language seemed like a safe choice given that this was happening in April 2015, immediately before the release of the final ES6 draft. There wasn't a lot of time to study alternatives. Note that the disambiguation really only concerns the first three alternatives of
because that is where the ambiguity exists. The disambiguation statement does not lead to recognizing "nested surrogate pairs" in situations such as:
So, no "nesting" is recognized. But I agree that the current wording of the disambiguation statement isn't ideal for this situation. My suggestion for a replacement (I excluded the grammar parameters, but the actual spec text should include them): For ambiguous inputs where either the rule:
There's another problem with that paragraph: it makes it sound like there can be any number of non-surrogate, and even non-
Attached is the email thread that led to the addition of the paragraph in question.
In the 'Patterns' clause, the Syntax section contains the paragraph:
This strikes me as very odd. The wording is exactly parallel to that for resolving the dangling else problem, suggesting that surrogate pairs nest somehow. I.e., given
the last TrailSurrogate should be 'associated' with the first LeadSurrogate. But what would that even mean? (Note that the relevant semantics make no mention of "associated" or "corresponding" Surrogates.)

Now, granted, the grammar is formally ambiguous here (Alternative[+U] derives \u LeadSurrogate \u TrailSurrogate in two distinct ways), and so requires some disambiguation. But rather than the quoted paragraph, I'd prefer one of these approaches:
- If the input matches \u LeadSurrogate \u TrailSurrogate, it must be parsed as a single Atom rather than two.
- [+U] u LeadSurrogate [lookahead != \u TrailSurrogate]
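As a rough illustration of what "a single Atom rather than two" means in practice, the sketch below checks how a quantifier binds after a \u LeadSurrogate \u TrailSurrogate sequence. The code point U+1F600 and all variable names are choices made for this example, not something taken from the spec text; the only assumption is an engine that implements the u flag.

```javascript
// Illustrative check, not spec text: U+1F600 written as two \u escapes.
const withU = /^\uD83D\uDE00{2}$/u; // {2} applies to the whole pair (one Atom)
const withoutU = /^\uD83D\uDE00{2}$/; // {2} applies to \uDE00 only (two Atoms)

const twoCodePoints = "\u{1F600}\u{1F600}"; // code units D83D DE00 D83D DE00
const doubledTrail = "\uD83D\uDE00\uDE00"; // code units D83D DE00 DE00

const results = {
  uTwoCodePoints: withU.test(twoCodePoints),
  uDoubledTrail: withU.test(doubledTrail),
  plainTwoCodePoints: withoutU.test(twoCodePoints),
  plainDoubledTrail: withoutU.test(doubledTrail),
};
console.log(results);
```

With the u flag the pattern accepts the repeated code point and rejects the doubled trail surrogate; without the flag it is the other way around, which is the observable difference between the one-Atom and two-Atom readings.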
(One objection to the last approach is that TrailSurrogate is not a terminal symbol, and so this is not a 'legal' lookahead sequence. However, this doesn't bother me much: we could enlarge the definition of lookahead sequences if we wanted.)
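To make the lookahead idea concrete, here is a minimal sketch of a scanner that treats a lead surrogate escape as lone only when the next escape is not a trail surrogate. The function name parseUnicodeEscapes and the four-escape input are hypothetical, introduced only for this example; the sketch handles nothing but \uXXXX escapes and is not how any engine or the spec is actually written.

```javascript
// Hypothetical helper (not spec text): split a string made only of
// \uXXXX escapes into atoms, pairing a lead surrogate with a trail
// surrogate only when the trail escape follows immediately.
function parseUnicodeEscapes(source) {
  const escapes = source.match(/\\u[0-9A-Fa-f]{4}/g) || [];
  const units = escapes.map((e) => parseInt(e.slice(2), 16));
  const isLead = (u) => u >= 0xd800 && u <= 0xdbff;
  const isTrail = (u) => u >= 0xdc00 && u <= 0xdfff;
  const atoms = [];
  for (let i = 0; i < units.length; i++) {
    const unit = units[i];
    if (isLead(unit) && i + 1 < units.length && isTrail(units[i + 1])) {
      // u LeadSurrogate \u TrailSurrogate: one atom, one code point.
      atoms.push((unit - 0xd800) * 0x400 + (units[i + 1] - 0xdc00) + 0x10000);
      i++; // the trail escape has been consumed as part of the pair
    } else {
      // Lone lead (lookahead was not a trail), lone trail, or non-surrogate.
      atoms.push(unit);
    }
  }
  return atoms.map((cp) => cp.toString(16).toUpperCase());
}

const singlePair = parseUnicodeEscapes(String.raw`\uD83D\uDE00`);
const leadLeadTrailTrail = parseUnicodeEscapes(String.raw`\uD83D\uD83D\uDE00\uDE00`);
console.log(singlePair, leadLeadTrailTrail);
```

Under these assumptions the lead, lead, trail, trail input comes out as three atoms (a lone lead, one paired code point, a lone trail), so the adjacent escapes pair up and nothing resembling nesting is needed.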