Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normative: make non-USV import strings a syntax error #3064

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

guybedford
Copy link

Just like export name strings must be valid well-formed unicode strings, this adds a SyntaxError for import specifier strings which are invalid unicode.

While this is a normative change, it is hard to imagine it being a breaking one. At the very least, opening for discussion - having module specifiers / module IDs as valid unicode seems a desirable property.

@bakkot
Copy link
Contributor

bakkot commented May 10, 2023

What's the motivation for this?

@guybedford
Copy link
Author

@bakkot Wasm modules only support USV string imports, so that in the ESM integration not all specifiers would be compatible at a strictly theoretical level.

@michaelficarra michaelficarra added normative change Affects behavior required to correctly evaluate some ECMAScript source text needs consensus This needs committee consensus before it can be eligible to be merged. labels May 10, 2023
@michaelficarra
Copy link
Member

I'm not against this, but that motivation doesn't make sense to me. @guybedford Can you explain in more detail or with examples?

@guybedford
Copy link
Author

@michaelficarra this is mostly a correctness issue over there being a tangible use case, although it's difficult to predict if there might be some tangible use case in future. Specifically, in WebAssembly modules, the import strings that are used to work the same like import 'mod' in the ESM integration are validated in WebAssembly modules to only be valid UTF strings. Thus, in theory Wasm imports are a subset of JS imports in this way. When @bmeck added string exports to get Wasm parity (export { a as "a" } kind of thing), this included a well-formed string validation step to match Wasm as well. So I was just investigating whether we should do the same for JS imports from a purity perspective.

@michaelficarra
Copy link
Member

michaelficarra commented May 11, 2023

@guybedford The string exports (and associated USV validation) makes sense because those two things (JS exports and wasm imports) interact. Wasm module specifiers and JS module specifiers don't interact. At least I don't understand how they do. So you're just trying to have their design match for purity's sake? That's pretty weak motivation.

Still, this artificial limitation doesn't seem like it would hurt anything, I just don't see how it might help anything.

@ljharb
Copy link
Member

ljharb commented May 11, 2023

Can you have a JS source code module that exports something under a non-well-formed name?

@bakkot
Copy link
Contributor

bakkot commented May 11, 2023

This isn't about the names of the exports - it's about module specifiers. At least if I understand the goal correctly.

(The answer to your question as stated is no, though.)

@bakkot
Copy link
Contributor

bakkot commented May 11, 2023

Also, if we do this we should also probably add a similar error for import().

Incidentally, right now (if I'm reading it right) HTML already requires that unpaired surrogates in module specifiers get replaced with U+FFFD (REPLACEMENT_CHARACTER) before hitting the network.

@guybedford
Copy link
Author

@bakkot it sounds like an alternative might also be to pass the string through toWellFormed then?

@bakkot
Copy link
Contributor

bakkot commented May 11, 2023

@guybedford Yup, though if we're going to change things I think I'd lean towards just making them illegal outright, as you propose; then we can update HTML so that it knows specifiers are already well-formed.

(HTML actually has a bug right now where it passes specifiers, which are not constrained to be well-formed, into an algorithm which is typed to only accept well-formed strings. Turns out I was not reading it right in my previous comment.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs consensus This needs committee consensus before it can be eligible to be merged. normative change Affects behavior required to correctly evaluate some ECMAScript source text
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants