Split core definitions out into separate index spaces #29

lukewagner · 2022-05-02T18:58:04Z

This PR addresses #11 as proposed in that issue. The split ended up touching much of the AST, text format and binary format, and even the basic terminology, so the PR is fairly extensive and took a few tries to get to something that feels right. Ultimately, though, all the changes here should be superficial; the essential concepts should be the same.

As part of this change, a few simplifications and better symmetries (between core and component layers) presented themselves which are included in the PR:

* Instead of enumerating all the different definitions in various places, the PR defines `sort` and `core:sort`, which enumerate them in one place so that they can be reused. A "sort" is what you get when you consider that `(func)` and `(import (func))` are literally different kinds of definitions but both go into the same index space. So "sort" is 1:1 with index space, and we can say that a definition has a "sort".
* "Interface types" and `intertype` increasingly seem like the wrong names for the type of "first class values used by components" when juxtaposed with `core:valtype`. Moreover, the "interface" of a component isn't defined by value types alone, but by the other types too (func, module, component, instance). Thus, this PR replaces `intertype` with `valtype`, drops the word "interface" when referring to values, value types, functions and function types, and instead qualifies these words with "core" and "component" when needed.
* `deftype` was changed to align with the GC proposal, which uses it with a slightly different meaning and context: previously, "deftype" meant "definition type" (i.e., the type of second-class definitions), but now it means "defined type" (i.e., things you can stick in a `(type ...)` definition, which is subtly different).
* Type imports/exports better align with the existing core type imports/exports proposal by wrapping the exact-type bounds in `(eq ...)`.
* The production names ending in `-def` (that appeared inside module, instance and component types) were renamed to end in `decl` ("declarator") because, in my experience, the hyphen is hard to use in communication. "Declarator" isn't exactly a perfect name, but at least it's a single word that we can give a clear meaning in this spec.
* The `kind` field of the binary format preamble was renamed to `layer` because "kind" seems overloaded.
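As an illustration of the "sort is 1:1 with index space" point above, here is a minimal sketch in Python. The `IndexSpaces` class, its method names and the sort strings `"func"` and `"core func"` are hypothetical and chosen only for this example; they are not taken from the spec or from any tool.

```python
from collections import defaultdict


class IndexSpaces:
    """Hypothetical model: one growing list of definitions per sort."""

    def __init__(self):
        self._spaces = defaultdict(list)

    def add(self, sort, origin):
        """Append a definition to the index space of `sort` and return its index."""
        space = self._spaces[sort]
        space.append(origin)
        return len(space) - 1

    def lookup(self, sort, index):
        """Return the origin recorded for `index` in the index space of `sort`."""
        return self._spaces[sort][index]


spaces = IndexSpaces()

# An import and a local definition are different kinds of definition,
# but both have sort "func", so they land in the same index space.
imported = spaces.add("func", "import")
defined = spaces.add("func", "definition")

# A core function has a different sort, so it gets its own index space
# and its numbering starts again from zero.
core_func = spaces.add("core func", "alias")

print(imported, defined, core_func)
```

The only point of the sketch is that the key of the table is the sort, not the syntactic form that introduced the definition.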

There were also a few bug fixes that I found when rereading everything. In particular, I see that I was previously a bit sloppy in abusing the syntactic sugar, so a few examples got an extra (func ...) or (type ...) wrapped around the identifier.

lukewagner · 2022-05-03T18:12:53Z

(For anyone who eagerly looked at the PR yesterday, the second commit fixes the binary encoding of valtype to match how it worked with intertype, and also refactors the type grammar a bit.)

design/mvp/Explainer.md

lukewagner · 2022-05-10T20:26:57Z

This most recent update removes the hand-waving in the type grammar, since @pl-semiotics showed it was ambiguous and simply wrong in a few cases. The type grammar now uses explicit type indices (calling out when they need (type <typeidx>)/(type $id) vs. when they're just <typeidx>/$id) and inline type expressions, which should more faithfully capture the actual proposed text format here. It also lines up more clearly with the binary format, which is nice. @pl-semiotics LMKWYT
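To make the "explicit type index or inline type expression" distinction concrete, here is a small Python sketch of how a consumer might resolve either form to the same function type. `FuncType`, `TypeIdx` and `resolve` are hypothetical names invented for this illustration, and the flat list standing in for the type index space is an assumption, not the interpreter's actual data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuncType:
    """Hypothetical inline function type: named params and a single result."""
    params: tuple  # tuple of (name, valtype) pairs
    result: str


@dataclass(frozen=True)
class TypeIdx:
    """Hypothetical explicit reference into the type index space."""
    index: int


def resolve(type_use, type_space):
    """Return the FuncType denoted by an index reference or an inline type."""
    if isinstance(type_use, TypeIdx):
        if not 0 <= type_use.index < len(type_space):
            raise IndexError(f"unknown type index {type_use.index}")
        return type_space[type_use.index]
    # Inline type expressions already are the type they denote.
    return type_use


# A type index space holding one previously defined function type.
types = [FuncType(params=(("s", "string"),), result="u32")]

by_index = resolve(TypeIdx(0), types)
inline = resolve(FuncType(params=(("s", "string"),), result="u32"), types)
print(by_index == inline)
```

Both spellings resolve to structurally equal types here, which is the property a grammar with two explicit forms needs to make unambiguous.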

design/mvp/Binary.md

rossberg

Great, I think this is cleaner and clearer than before!

design/mvp/Explainer.md

rossberg · 2022-05-11T11:12:57Z

design/mvp/Explainer.md

+The `func` type constructor describes a component-level function definition
+that takes and returns `valtype`. In contrast to [`core:functype`] which, as a
+low-level compiler target for a stack machine, returns zero or more results,
+`functype` always returns a single type, with `unit` being used for functions


I'm still a bit saddened by this asymmetry. If languages can bind to tuple types, shouldn't they likewise be able to bind to functions with multiple results?

lukewagner · 2022-05-11

I think the asymmetry is unsurprising when you consider that one is meant to be "source-level" and one is "assembly-level". Also, from a source-level perspective, multi-return opens up a number of questions that just aren't an issue with the current design. Can the multi-return values be named (for symmetry with params, where we really do want the names)? If so, do the language bindings special-case the unnamed-single-value case to just return the value, which is a bit irregular? And if some toolchains add names just for their documentation value, are the language bindings then forced to wrap the returned scalar at runtime, adding overhead (which isn't an issue for named params)?

rossberg · 2022-05-23

> I think the asymmetry is unsurprising when you consider that one is meant to be "source-level" and one is "assembly-level".

That's insinuating that all source languages have single return values, which of course isn't the case. One could argue that component types use a smallest common denominator approach, but that's not true either, since they feature things like variants etc.

I understand what you mean regarding bindings and names, but is that really an asymmetry? If you simply allowed the names, languages who have no use for named return values could simply ignore the names and handle them positionally. The same is actually true for parameters: there are languages where parameter names are relevant at call sites (e.g., named parameters can be used to reorder the argument list), but most languages just do calls positionally, and parameter names from the types are irrelevant beyond documentation value.

lukewagner · 2022-05-23

> One could argue that component types use a smallest common denominator approach, but that's not true either, since they feature things like variants etc.

I know, it's a fuzzy least common denominator, but in the particular case of return values, single-return (with good tuple integration) is overwhelmingly more common.

> If you simply allowed the names, languages who have no use for named return values could simply ignore the names and handle them positionally. The same is actually true for parameters:

I think the asymmetry arises from the producing nature of return types vs. the consuming nature of param types. The language binding for params can easily consume either named or positional parameters, but the language binding for results must pick one, which has consequences either way. In particular, all languages will have some way to represent named return values (e.g., as a dictionary), which will have some fixed overhead compared with the unnamed scalar binding, so the language bindings will have to just pick something, biasing one way or the other. With single-return, there would be a clear directive for language bindings to produce a name-bearing value for record results (and of course unnamed for scalar or tuple results), allowing interface authors to more reliably say what they want.
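The "clear directive" described above can be sketched roughly as follows in Python. The function `bind_result` and its tuple-based type descriptors are hypothetical and exist only for this illustration; real bindings generators will look different.

```python
def bind_result(result_type, values):
    """Shape a function's single result for the host language.

    `result_type` is a hypothetical descriptor, one of:
      ("record", [field names])  -> name-bearing value (a dict)
      ("tuple", arity)           -> unnamed positional value (a tuple)
      ("scalar",)                -> the bare value, no wrapper
    """
    kind = result_type[0]
    if kind == "record":
        names = result_type[1]
        if len(names) != len(values):
            raise ValueError("record arity mismatch")
        return dict(zip(names, values))
    if kind == "tuple":
        if result_type[1] != len(values):
            raise ValueError("tuple arity mismatch")
        return tuple(values)
    if kind == "scalar":
        if len(values) != 1:
            raise ValueError("scalar result needs exactly one value")
        return values[0]
    raise ValueError(f"unknown result kind {kind!r}")


named = bind_result(("record", ["quot", "rem"]), [3, 1])
positional = bind_result(("tuple", 2), [3, 1])
bare = bind_result(("scalar",), [7])
print(named, positional, bare)
```

In this sketch the interface author, not the bindings generator, decides whether callers see names: choosing a record result yields the dictionary, while a tuple or scalar result avoids the wrapper.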

rossberg · 2022-05-24

A similar choice is also sometimes necessary for parameters: there are a number of languages (especially functional ones) that only have single-parameter functions. Multiple parameters are then represented either with tuples or by currying. These are two mutually exclusive choices, and a mapping for such a language will have to pick one over the other.

Likewise, there are languages where named vs. positional parameter definitions are a mutually exclusive choice.

So I wouldn't count either of these as a qualitative difference.
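The tuple-versus-currying choice mentioned above can be shown with two encodings of the same two-argument function. This is only a Python sketch with made-up names (`add_tupled`, `add_curried`); Python itself has multi-parameter functions, so the single-parameter restriction is simulated here.

```python
def add_tupled(pair):
    """Single parameter carrying both arguments as a tuple."""
    a, b = pair
    return a + b


def add_curried(a):
    """Single parameter returning a function that takes the next argument."""
    def take_b(b):
        return a + b
    return take_b


# The two call shapes are not interchangeable, so a binding has to pick one.
tupled_sum = add_tupled((2, 3))
curried_sum = add_curried(2)(3)
print(tupled_sum, curried_sum)
```

Both encodings compute the same result, but a caller written against one shape cannot call the other, which is the sense in which the mapping must commit to a single choice.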

design/mvp/Explainer.md

pl-semiotics

I think this makes a lot more sense! I'll make the changes to the interpreter parser and make sure that the examples all still parse, but this version of the type grammar looks much cleaner to me :) (and the fact that it lines up much more nicely with the binary format/AST is very nice!)

lukewagner · 2022-05-24T14:52:10Z

Excellent feedback and suggestions so far; thanks all! Things seem to have quieted down, so I'll hold until the end of the week and merge if it's still quiet. (We can keep talking about the pre-existing (relative to this PR) single-vs-multi-return issue separately.)

pl-semiotics · 2022-05-24T17:19:44Z

design/mvp/Explainer.md

+Canonical definitions specify one of these two wrapping directions, the function
+to wrap and a list of configuration options:
+```
+canon    ::= (canon lift core-prefix(<core:funcidx>) <functype> <canonopt>* (func <id>?))


Should this <functype> also be allowed to be a <typeidx>?

lukewagner · 2022-05-24

Oh wow, good catch! So, in updating <functype> to follow the same explicit "inline type or (type )" scheme as elsewhere, I realized that we could make canon lift a lot more symmetric with imports by having the function type go with the sort and id. So this change moves the functype around (and allows type indices). If you look at the corresponding changes in the examples, it seems clear this is what it should've been all along.

lukewagner · 2022-05-27T19:37:34Z

Ok, merging this, and we can continue to iterate by filing new issues. Next up, future and stream :)

pl-semiotics reviewed May 3, 2022

pl-semiotics reviewed May 5, 2022

pl-semiotics reviewed May 5, 2022

fitzgen mentioned this pull request May 5, 2022

[meta] Support for the component model bytecodealliance/wasm-tools#450

Closed

14 tasks

peterhuene reviewed May 10, 2022

rossberg reviewed May 11, 2022

pl-semiotics reviewed May 11, 2022

This was referenced May 12, 2022

components: Text format does not disambiguate function alias binary encodings bytecodealliance/wasm-tools#588

Closed

components: Text format describes import/export types with information not present in binary bytecodealliance/wasm-tools#590

Closed

peterhuene reviewed May 19, 2022

peterhuene reviewed May 19, 2022

peterhuene reviewed May 19, 2022

peterhuene mentioned this pull request May 20, 2022

A few small wasm-smith index space refactorings bytecodealliance/wasm-tools#617

Merged

peterhuene reviewed May 20, 2022

pl-semiotics reviewed May 24, 2022

peterhuene reviewed May 24, 2022

lukewagner added 11 commits May 26, 2022 17:19

Split core definitions out into separate index spaces

85873a6

Restore value type binary encoding, refactor type grammar slightly

6e78729

Remove 'outer' option from core:alias

24975fc

Tweak <sort> grammar to be more regular

67608ed

Fix whitespace

14ae2d0

Fix bug in outer alias example

0d99c78

Fix bug in outer alias example (better)

1891708

Fix thinko in definition of 'sort'

9e51096

... and in Binary.md too

a6e40d1

Remove ambiguous hand-waving from type grammar

fce98d2

Add better validation notes in Binary.md, normalize on 'externdesc'

9d50001

lukewagner and others added 16 commits May 26, 2022 17:21

s/varu32/u32/ because to match actual core wasm spec

45f433b

Clamp down core (with ...) expressions to just the 'instance' sort

2e71676

Remove dangling <id>?

e84e499

Fix typo in core:exportdesc

e3e1a98

Improve explanation of type imports and fresh type index spaces

5934e70

s/Bottom type/Empty type/

43a6156

Tweak wording around type imports/exports rationale

88816fc

Don't use core-prefix in <sort>

1d86076

Fix bug in example

61314cf

Update example to match explicit sort in exportdesc

a0eb043

Revert previous; update inline alias syntax description to match all the examples

89ed435

Tweak validation wording

f3d60a2

Co-authored-by: Peter Huene <[email protected]>

Avoid EH conflicts in binary encoding of core:sort

33f8e37

Co-authored-by: Peter Huene <[email protected]>

Sync externdesc with preceding binary format opcode change

080f4c3

Co-authored-by: Peter Huene <[email protected]>

Sync core:instantiatearg with preceding binary format opcode change

7401e4c

Co-authored-by: Peter Huene <[email protected]>

Make <functype> in 'canon lift' symmetric to imports

49fb117

lukewagner force-pushed the core-split branch from e5b93c4 to 49fb117 Compare May 26, 2022 22:22

lukewagner merged commit bcc2002 into main May 27, 2022

lukewagner deleted the core-split branch May 27, 2022 19:37

lukewagner mentioned this pull request Jun 2, 2022

Index spaces feel inconsistent about what they might or might not contain #11

Closed

peterhuene mentioned this pull request Jun 6, 2022

Update wasm-tools for the latest component model proposal bytecodealliance/wasm-tools#621

Merged

lukewagner mentioned this pull request Jun 8, 2022

(Re-)Consider multi-return vs unit #41

Closed

alexcrichton mentioned this pull request Jun 8, 2022

Should aliases be allowed in module types? #43

Closed
