Allow more characters when creating various nodes #1079

domenic · 2022-05-05T19:53:39Z

Stop using XML grammar productions for validating element, attribute, and doctype names. These were overly-restrictive, as it is possible to create nodes with names that don't match those productions using the HTML parser. After this change, each construct has its own custom name validation algorithm.

The only remaining dependency on XML naming rules is for processing instructions, which are uncommon and cannot be created with the HTML parser anyway.

Closes #769. Closes #849. Closes #1373.

At least two implementers are interested (and none opposed):
- Gecko
- Chromium
Tests are written and can be reviewed and commented upon at:
- Allow more characters in element/attribute names and prefixes web-platform-tests/wpt#38503
- https://chromium-review.googlesource.com/c/chromium/src/+/6570951
Implementation bugs are filed:
- Chrome: https://bugs.chromium.org/p/chromium/issues/detail?id=1334640
- Firefox: https://bugzilla.mozilla.org/show_bug.cgi?id=1773312
- Safari: https://bugs.webkit.org/show_bug.cgi?id=241419

(See WHATWG Working Mode: Changes for more details.)

Original points for discussion, which were discussed and resolved in the comments below. These do not reflect the currently proposed change.

- I did not disallow = inside attribute local names. Both the parser and DOM APIs currently disallow them, except the parser allows it for the first character. I'm happy to change this if people prefer; I started with the simpler version.
- This does not disallow lone surrogates, the Unicode replacement character U+FFFD, single quotes, or < in any position, because the HTML parser already allows introducing those and it seems nicer to align.
- I did not change validation for createProcessingInstruction() or createDocumentType(). We could try to simplify those too, perhaps after investigating parser behavior. But they didn't seem to be causing any real web developer pain, unlike elements and local names, so I thought it'd be better to just leave them as-is.


annevk

Do we want to call out the = difference in a note? At least a comment would be good I think.

I'm not a big fan of adding "DOM API" to the naming. That makes more sense if this was defined outside of the DOM Standard itself. I think dropping it would still make everything work.


annevk

Thanks, this looks good to me, modulo a small oversight. Anyone else that should review this? @mfreed7 perhaps?


mfreed7

Looks good!


domenic · 2022-06-02T16:11:08Z

Discussed equals sign at HTML triage meeting. Conclusion: disallow it in attributes everywhere. (Even though the parser allows it in the first-character position.)
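A minimal sketch of what "disallow it in attributes everywhere" could look like, written in Python rather than spec prose. The forbidden set is an assumption: it takes the code points excluded by the negated character class quoted elsewhere in this thread and adds U+003D (=). The function and constant names are hypothetical.

```python
# Assumption: ASCII whitespace here means tab, LF, FF, CR, and space.
ASCII_WHITESPACE = "\t\n\f\r "

# Assumption: NULL, "/", and ">" come from the character class quoted in this
# thread; "=" is added per the triage conclusion above.
FORBIDDEN_IN_ATTRIBUTE_NAME = frozenset(ASCII_WHITESPACE + "\0/=>")


def is_valid_attribute_local_name(name: str) -> bool:
    """Reject empty names and names containing any forbidden code point."""
    if not name:
        return False
    return not any(ch in FORBIDDEN_IN_ATTRIBUTE_NAME for ch in name)


if __name__ == "__main__":
    for candidate in ["data-foo", "@click", "=foo", "a=b", "a b", ""]:
        print(repr(candidate), is_valid_attribute_local_name(candidate))
```

Under these assumptions, `=foo` is rejected by the DOM-side check even though the parser would accept `=` in the first position.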

domenic · 2022-06-06T21:29:07Z

I think this is ready for re-review.

Potential issue: XML's definition of Char seems nonsensical (it excludes various Unicode characters below U+0020). And its definition of the [^#x00#x09#x0A#x0C#x0D#x20/>] syntax depends on that definition. Hmm.
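To make the concern concrete, here is a small sketch of the Char production from XML 1.0 as a Python predicate. The ranges are the ones XML 1.0 gives for Char; the function name is hypothetical. It shows that most code points below U+0020 are not Char at all, so a negated character class in XML's EBNF can never match them, whatever the class lists.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if the code point matches XML 1.0's Char production."""
    return (
        cp in (0x09, 0x0A, 0x0D)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Code points below U+0020 that Char leaves out entirely.
excluded_controls = [cp for cp in range(0x20) if not is_xml_char(cp)]

if __name__ == "__main__":
    # U+0000 NULL and U+000C FORM FEED are listed in the negated set,
    # but neither is a Char to begin with.
    print(is_xml_char(0x00), is_xml_char(0x0C))
    print([f"U+{cp:04X}" for cp in excluded_controls])
```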

domenic · 2022-06-07T15:56:55Z

Refined to no longer use EBNF.

This follows whatwg/dom#1079.

annevk

This does not seem equivalent to the sorta-EBNF from before. In particular if the first code point is from BeyondHTMLParserName the second code point was more limited.


domenic · 2022-06-07T16:31:14Z

> In particular if the first code point is from BeyondHTMLParserName the second code point was more limited.

I'm not sure exactly what you mean. Recall that it's a union of both. The second+ code point is from HTMLParserCompatibleName, which had [^#x00#x09#x0A#x0C#x0D#x20/>]* for that position. Which is exactly what the current draft says, right?

annevk · 2022-06-07T17:00:15Z

I don't think the EBNF allows for the second code point to be U+0001 when the first is :, for instance. At least the intent was to prevent that. Does EBNF work completely differently from ABNF in that | doesn't signify OR but instead "union"?

(I didn't see "An equivalent EBNF is the following" initially and I don't think what it states is correct.)

domenic · 2022-06-07T17:05:13Z

I see, I did not capture that this was a branching scenario depending on the behavior of the first code point. And you addressed what harms names like that might hypothetically cause in #849 (comment) .

I'll revise.
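The branching described above can be sketched as follows. This is an illustration in Python, not the spec text. The first branch uses the negated character class quoted in this thread. The second branch's allowed code points are an assumption on my part (the thread names BeyondHTMLParserName but does not spell it out here), and the function and constant names are hypothetical.

```python
# Code points excluded by the character class quoted in the thread:
# tab, LF, FF, CR, space, NULL, "/", and ">".
PARSER_TERMINATORS = frozenset("\t\n\f\r \0/>")


def is_valid_element_local_name(name: str) -> bool:
    if not name:
        return False
    first, rest = name[0], name[1:]

    # Branch 1: names the HTML parser could produce. They start with an ASCII
    # letter, and later code points only need to avoid the terminators.
    if first.isascii() and first.isalpha():
        return not any(ch in PARSER_TERMINATORS for ch in rest)

    # Branch 2 (assumed sets): other starts are limited to ":", "_", or
    # non-ASCII, and the rest of the name is restricted much more tightly.
    if not (first in ":_" or ord(first) >= 0x80):
        return False
    return all(
        (ch.isascii() and ch.isalnum()) or ch in "-.:_" or ord(ch) >= 0x80
        for ch in rest
    )


if __name__ == "__main__":
    for candidate in ["a\x01", ":\x01", "_foo", "1a", "a b"]:
        print(repr(candidate), is_valid_element_local_name(candidate))
```

With these assumed sets, U+0001 is accepted after an ASCII letter but rejected after a leading colon, which is the case raised above.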

domenic · 2022-06-07T17:26:58Z

I think that is done. The other way I could write this is by looping over the characters individually, which is what a performant implementation would do (instead of using lots of O(n) "contains" operations). But I think this is relatively clear.

(Edit: well, a performant implementation would be looping over code units, since that's JS's native string format... which feels ickier to spec.)
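For comparison, here is a rough sketch of the styles mentioned: repeated "contains" checks, a single pass over code points, and a single pass over UTF-16 code units. It uses only the forbidden set from the character class quoted in this thread; all function names are hypothetical, and Python stands in for whatever language an implementation would really use.

```python
FORBIDDEN = "\t\n\f\r \0/>"


def valid_by_contains(name: str) -> bool:
    # One substring search per forbidden code point: several O(n) scans.
    return all(ch not in name for ch in FORBIDDEN)


def valid_by_single_pass(name: str) -> bool:
    # One scan over the code points, with a constant-time set lookup each.
    forbidden = frozenset(FORBIDDEN)
    for ch in name:
        if ch in forbidden:
            return False
    return True


def utf16_code_units(name: str) -> list[int]:
    # "surrogatepass" keeps lone surrogates instead of raising an error.
    data = name.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def valid_by_code_units(name: str) -> bool:
    # Every forbidden code point is ASCII, so it is a single code unit and
    # can never appear as half of a surrogate pair.
    forbidden_units = frozenset(ord(ch) for ch in FORBIDDEN)
    return not any(unit in forbidden_units for unit in utf16_code_units(name))


if __name__ == "__main__":
    for sample in ["abc", "a>b", "a\U0001F600b", "a\ud800b", "x y"]:
        print(repr(sample), valid_by_contains(sample),
              valid_by_single_pass(sample), valid_by_code_units(sample))
```

All three agree on these inputs. The code-unit version gives the same answers because no forbidden value falls in the surrogate range.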

annevk

Thanks, this looks accurate to me.


domenic · 2022-06-08T15:40:19Z

OK, this (and whatwg/html#7991) is just waiting on someone to write web platform tests. Then we can close a ~5 year old recurring pain point on the web platform!

For fun, these are all the references to this I can find:

I suspect there are more GitHub issues from earlier, because why would I have posted #449 if not because of some other issue someone filed? But I couldn't find them.

annevk · 2023-02-14T12:43:48Z

@josepharhar would you be interested in finishing this?

josepharhar · 2023-02-15T00:29:02Z

Yes, I have started a WPT here: web-platform-tests/wpt#38503

annevk · 2023-02-15T13:35:52Z

\o/ I suspect that once you implement this and do a try run you'll find a lot of existing WPT tests that can be adjusted. There's probably no need for a new file, but maybe.

cdumez · 2023-05-05T19:48:17Z

Any progress on this?

josepharhar · 2023-05-08T17:25:29Z

Not recently, I have unfortunately been focused on other stuff.

This patch significantly changes the parsing of element names, attribute names, and namespace prefixes for DOM APIs to allow more flexibility and better parity with the HTML parser.

I am planning on making an intent to ship for this behavior before enabling it by default. I am planning on making another WPT patch to change the existing tests to match the new parsing behavior once the spec is merged and the I2S is complete, and maybe also after the new behavior reaches stable with no issues.

Spec PR: whatwg/dom#1079
Bug: 40122442, 40228234
Change-Id: Ifbb5ac47a08a8f14489c694649ab5be1f59647ac
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4251683
Commit-Queue: Joey Arhar <[email protected]>
Reviewed-by: Mason Freed <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1468337}

josepharhar · 2025-06-02T23:34:27Z

Here is a new WPT: web-platform-tests/wpt#38503

And here is another patch in progress to update existing WPTs, which doesn't have a WPT PR yet because we stopped generating those until the code review gets approved: https://chromium-review.googlesource.com/c/chromium/src/+/6570951

This follows whatwg/dom#1079.

Spec PR: whatwg/dom#1079
Bug: 40122442, 40228234
Change-Id: I42c40c3f3acdfcfc4647c6d87ffcbfadc6de13be
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6570951
Reviewed-by: David Baron <[email protected]>
Commit-Queue: Joey Arhar <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1475652}

…te names and prefixes, a=testonly Automatic update from web-platform-tests Allow more characters in element/attribute names and prefixes This patch significantly changes the parsing of element names, attribute names, and namespace prefixes for DOM APIs to allow more flexibility and better parity with the HTML parser. I am planning on making an intent to ship for this behavior before enabling it by default. I am planning on making another WPT patch to change the existing tests to match the new parsing behavior once the spec is merged and the I2S is complete, and maybe also after the new behavior reaches stable with no issues. Spec PR: whatwg/dom#1079 Bug: 40122442, 40228234 Change-Id: Ifbb5ac47a08a8f14489c694649ab5be1f59647ac Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4251683 Commit-Queue: Joey Arhar <[email protected]> Reviewed-by: Mason Freed <[email protected]> Cr-Commit-Position: refs/heads/main@{#1468337} -- wpt-commits: 6b9a6fb929e5adeaccf3a4151447784df5d5f941 wpt-pr: 38503

…ute parsing, a=testonly Automatic update from web-platform-tests Update tests for relaxed DOM name/attribute parsing Spec PR: whatwg/dom#1079 Bug: 40122442, 40228234 Change-Id: I42c40c3f3acdfcfc4647c6d87ffcbfadc6de13be Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6570951 Reviewed-by: David Baron <[email protected]> Commit-Queue: Joey Arhar <[email protected]> Cr-Commit-Position: refs/heads/main@{#1475652} -- wpt-commits: 856c33e87e61f513ddc81e082f78b6d24027e83d wpt-pr: 53251

…te names and prefixes, a=testonly Automatic update from web-platform-tests Allow more characters in element/attribute names and prefixes This patch significantly changes the parsing of element names, attribute names, and namespace prefixes for DOM APIs to allow more flexibility and better parity with the HTML parser. I am planning on making an intent to ship for this behavior before enabling it by default. I am planning on making another WPT patch to change the existing tests to match the new parsing behavior once the spec is merged and the I2S is complete, and maybe also after the new behavior reaches stable with no issues. Spec PR: whatwg/dom#1079 Bug: 40122442, 40228234 Change-Id: Ifbb5ac47a08a8f14489c694649ab5be1f59647ac Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4251683 Commit-Queue: Joey Arhar <jarharchromium.org> Reviewed-by: Mason Freed <masonfchromium.org> Cr-Commit-Position: refs/heads/main{#1468337} -- wpt-commits: 6b9a6fb929e5adeaccf3a4151447784df5d5f941 wpt-pr: 38503 UltraBlame original commit: d481425a1a74ac2f7daeada04b500b398ca4d8ca

…ute parsing, a=testonly Automatic update from web-platform-tests Update tests for relaxed DOM name/attribute parsing Spec PR: whatwg/dom#1079 Bug: 40122442, 40228234 Change-Id: I42c40c3f3acdfcfc4647c6d87ffcbfadc6de13be Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6570951 Reviewed-by: David Baron <dbaronchromium.org> Commit-Queue: Joey Arhar <jarharchromium.org> Cr-Commit-Position: refs/heads/main{#1475652} -- wpt-commits: 856c33e87e61f513ddc81e082f78b6d24027e83d wpt-pr: 53251 UltraBlame original commit: 405174975ec2014a5185834e62a95ea5959be20f

…37747)

A recent update in the spec (whatwg/dom#1079) introduced new rules for name validation of attribute, element, and doctype. This PR implements the new name validation rules in `components/script/dom/bindings/domname.rs`. The old XML name validation rules are not fully removed because a few uses of them remain in `ProcessingInstructions` and `xpath`.

Testing: Covered by WPT tests
Fixes: #37746

---------

Signed-off-by: minghuaw <[email protected]>
Signed-off-by: Minghua Wu <[email protected]>
Co-authored-by: Xiaocheng Hu <[email protected]>

annevk reviewed May 10, 2022


annevk reviewed May 17, 2022


domenic requested a review from mfreed7 May 18, 2022 16:47

mfreed7 approved these changes May 26, 2022


past mentioned this pull request Jun 2, 2022

Upcoming HTML standard issue triage meeting on 6/2/2022 whatwg/html#7919

Closed

whatwg deleted a comment from Maufas Jun 6, 2022

domenic added a commit to whatwg/html that referenced this pull request Jun 7, 2022

Allow more characters in custom element names

60a89f0

This follows whatwg/dom#1079.

annevk reviewed Jun 7, 2022


domenic mentioned this pull request Jun 7, 2022

Allow more characters in custom element names whatwg/html#7991

Merged

3 tasks

annevk approved these changes Jun 8, 2022


mathiasbynens approved these changes Jun 8, 2022


zcorpan added the needs tests Moving the issue forward requires someone to write tests label Jun 8, 2022

annevk mentioned this pull request Jun 13, 2022

Whitespace versus colon as a namespace separator WICG/sanitizer-api#146

Closed

annevk mentioned this pull request Aug 23, 2022

Allow more characters in dataset? whatwg/html#8215

Open

annevk mentioned this pull request Oct 4, 2024

[css-syntax] Missing emoji in non-ascii identifier codepoints w3c/csswg-drafts#11005

Open

domenic mentioned this pull request May 14, 2025

Valid/Invalid characters in document.createElement() #849

Closed

josepharhar mentioned this pull request May 14, 2025

Should vertical tab character be included in ASCII whitespace? whatwg/infra#670

Open

josepharhar approved these changes Jun 2, 2025


domenic removed the needs tests Moving the issue forward requires someone to write tests label Jun 3, 2025

domenic merged commit e67d5fe into main Jun 9, 2025
2 checks passed

domenic deleted the createelement-chars branch June 9, 2025 04:47

domenic added a commit to whatwg/html that referenced this pull request Jun 9, 2025

Allow more characters in custom element names

78d2678

This follows whatwg/dom#1079.

chromium-wpt-export-bot mentioned this pull request Jun 18, 2025

Update tests for relaxed DOM name/attribute parsing web-platform-tests/wpt#53251

Merged

This was referenced Jun 27, 2025

Update validation of element, attribute, and doctype names servo/servo#37746

Closed

script: Update name validation for attribute, element, and doctype servo/servo#37747

Merged

domenic mentioned this pull request Jul 8, 2025

Remove XML name validation from dataset named property setter whatwg/html#11439

Closed

This was referenced Jul 15, 2025

Remove more XML-derived attribute name validation whatwg/html#11453

Merged

HTML element, attribute, and doctype name validation rules are changing mdn/content#40366

Open
