Split core definitions out into separate index spaces #29
(For anyone who eagerly looked at the PR yesterday, the second commit fixes the binary encoding of …
This most recent update removes the hand-waving in the type grammar, since @pl-semiotics showed it was ambiguous and simply wrong in a few cases. The type grammar now uses explicit type indices (calling out when they need …
Great, I think this is cleaner and clearer than before!
The `func` type constructor describes a component-level function definition that takes and returns `valtype`. In contrast to [`core:functype`] which, as a low-level compiler target for a stack machine, returns zero or more results, `functype` always returns a single type, with `unit` being used for functions
I'm still a bit saddened by this asymmetry. If languages can bind to tuple types, shouldn't they likewise be able to bind to functions with multiple results?
I think the asymmetry is unsurprising when you consider that one is meant to be "source-level" and the other "assembly-level". Also, from a source-level perspective, multi-return opens up a number of questions that just aren't an issue with the current design. Can the multiple return values be named (for symmetry with params, where we really do want the names)? If so, do the language bindings special-case the unnamed single-value case to just return the value, which is a bit irregular? And if some toolchains add names purely for their documentation value, the language bindings are then forced to wrap the returned scalar at runtime, adding overhead (which isn't an issue for named params).
> I think the asymmetry is unsurprising when you consider that one is meant to be "source-level" and one is "assembly-level".
That's insinuating that all source languages have single return values, which of course isn't the case. One could argue that component types use a smallest common denominator approach, but that's not true either, since they feature things like variants etc.
I understand what you mean regarding bindings and names, but is that really an asymmetry? If you simply allowed the names, languages that have no use for named return values could just ignore the names and handle the values positionally. The same is actually true for parameters: there are languages where parameter names are relevant at call sites (e.g., named parameters can be used to reorder the argument list), but most languages just call positionally, and parameter names from the types are irrelevant beyond their documentation value.
> One could argue that component types use a smallest common denominator approach, but that's not true either, since they feature things like variants etc.
I know, it's a fuzzy least common denominator, but in the particular case of return values, single-return (with good tuple integration) is overwhelmingly more common.
> If you simply allowed the names, languages who have no use for named return values could simply ignore the names and handle them positionally. The same is actually true for parameters:
I think the asymmetry arises from the producing nature of return types vs. the consuming nature of param types. The language binding for params can easily consume either named or positional parameters, but the language binding for results must pick one, which has consequences either way. In particular, all languages will have some way to represent named return values (e.g., as a dictionary), which will carry some fixed overhead vs. the unnamed scalar binding, so the language bindings will have to just pick something, biasing one way or the other. With single-return, there is a clear directive for language bindings to produce a name-bearing value for `record` results (and of course an unnamed value for scalar or `tuple` results), allowing interface authors to say more reliably what they want.
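To make the producing-vs-consuming point concrete, here is a minimal Python sketch of how a hypothetical binding generator might lift a single component-level result into a host value. The classes `ScalarType`, `TupleType`, `RecordType` and the function `lift_result` are illustrative assumptions, not part of the spec or any existing toolchain; the only idea taken from the comment above is that the result *type* alone decides whether the caller sees names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Sequence, Union


# Hypothetical, simplified model of component-level result types.
@dataclass
class ScalarType:
    name: str  # e.g. "u32" or "string"


@dataclass
class TupleType:
    elems: Sequence["ValType"]


@dataclass
class RecordType:
    fields: Sequence[tuple[str, "ValType"]]


ValType = Union[ScalarType, TupleType, RecordType]


def lift_result(ty: ValType, raw: Any) -> Any:
    """Convert a positional raw value into the host value a binding would return.

    A record yields a name-bearing dict; tuples and scalars stay unnamed,
    so the binding never has to guess which representation is wanted.
    """
    if isinstance(ty, ScalarType):
        return raw  # unnamed scalar: returned directly, no wrapper object
    if isinstance(ty, TupleType):
        return tuple(lift_result(t, v) for t, v in zip(ty.elems, raw))
    if isinstance(ty, RecordType):
        return {name: lift_result(t, v) for (name, t), v in zip(ty.fields, raw)}
    raise TypeError(f"unsupported result type: {ty!r}")


point = RecordType((("x", ScalarType("s32")), ("y", ScalarType("s32"))))
print(lift_result(point, (1, 2)))  # {'x': 1, 'y': 2}
print(lift_result(TupleType((ScalarType("s32"), ScalarType("s32"))), (1, 2)))  # (1, 2)
print(lift_result(ScalarType("u32"), 7))  # 7
```

With multiple named results, the same function would have to choose between the dict and the bare-value shapes without a type-level signal, which is the bias the comment describes.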
A similar choice is sometimes necessary for parameters too: there are a number of languages (especially functional ones) that only have single-parameter functions. Multiple parameters are then represented either with tuples or by currying. These are two mutually exclusive choices, and a mapping for such a language will have to pick one over the other.
Likewise, there are languages where named vs. positional parameter definitions are mutually exclusive.
So I wouldn't count either of these as a qualitative difference.
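The tupled-vs-curried choice described above can be sketched in Python as two adapters over the same two-parameter function. The helper names `as_tupled` and `as_curried` and the stand-in function `add` are hypothetical; the sketch only shows that a binding for a single-parameter language must commit to one calling shape, and the two shapes are not interchangeable at call sites.

```python
from typing import Any, Callable


def as_tupled(f: Callable[..., Any]) -> Callable[[tuple], Any]:
    """Expose a multi-parameter function as a function of one tuple argument."""
    def tupled(args: tuple) -> Any:
        return f(*args)
    return tupled


def as_curried(f: Callable[..., Any], arity: int) -> Callable[[Any], Any]:
    """Expose a multi-parameter function as a chain of one-argument functions.

    Assumes arity >= 1.
    """
    def step(collected: tuple) -> Callable[[Any], Any]:
        def take(x: Any) -> Any:
            args = collected + (x,)
            # Call through once all arguments are collected; otherwise keep currying.
            return f(*args) if len(args) == arity else step(args)
        return take
    return step(())


def add(a: int, b: int) -> int:  # stands in for a two-parameter component function
    return a + b


print(as_tupled(add)((2, 3)))    # 5
print(as_curried(add, 2)(2)(3))  # 5
```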
I think this makes a lot more sense! I'll make the changes to the interpreter parser and make sure that the examples all still parse, but this version of the type grammar looks much cleaner to me :) (and the fact that it lines up much more nicely with the binary format/AST is very nice!)
Excellent feedback and suggestions so far; thanks all! Things seem to have quieted down, so I'll hold until the end of the week and merge if it's still quiet. (We can keep talking about the pre-existing (relative to this PR) single-vs-multi-return issue separately.)
design/mvp/Explainer.md
Canonical definitions specify one of these two wrapping directions, the function to wrap and a list of configuration options:
```
canon ::= (canon lift core-prefix(<core:funcidx>) <functype> <canonopt>* (func <id>?))
```
Should this `<functype>` also be allowed to be a `<typeidx>`?
Oh wow, good catch! So, in updating `<functype>` to follow the same explicit "inline type or (type )" scheme as elsewhere, I realized that we could make `canon lift` a lot more symmetric with imports by having the function type go with the sort and id. So this change moves the functype around (and allows type indices). If you look at the corresponding changes in the examples, it seems clear this is what it should have been all along.
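The "inline type or type index" scheme discussed above can be modeled as a small resolver: wherever a function type is expected, either an inline type or a reference into the type index space is accepted, and both resolve to the same function type. This Python sketch is illustrative only; `FuncType`, `TypeRef` and `resolve_functype` are hypothetical names, and the single-result field simply mirrors the `functype` description quoted earlier in this thread.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class FuncType:
    params: tuple  # (name, valtype) pairs
    result: str    # exactly one result; "unit" when nothing is returned


@dataclass(frozen=True)
class TypeRef:
    idx: int  # index into the component's type index space


TypeUse = Union[FuncType, TypeRef]


def resolve_functype(use: TypeUse, types: Sequence[object]) -> FuncType:
    """Accept either an inline function type or a reference to a type definition."""
    if isinstance(use, FuncType):
        return use
    ty = types[use.idx]  # raises IndexError for an out-of-range index
    if not isinstance(ty, FuncType):
        raise TypeError(f"type {use.idx} is not a function type")
    return ty


types = [FuncType((("x", "u32"),), "string")]
inline = FuncType((("x", "u32"),), "string")
print(resolve_functype(TypeRef(0), types) == resolve_functype(inline, types))  # True
```

Because imports and `canon lift` would both go through the same resolution step, the two definition forms stay symmetric.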
Co-authored-by: Peter Huene <[email protected]>
Ok, merging this, and we can continue to iterate by filing new issues. Next up, …
This PR addresses #11 as proposed in that issue. The split ended up touching a bunch of the AST/text/binary and even the basic terminology used so the PR ended up being fairly extensive and requiring a few tries to get something that feels right. But ultimately, all the changes here should be superficial; the essential concepts should be the same.
As part of this change, a few simplifications and better symmetries (between core and component layers) presented themselves which are included in the PR:
- … `sort` and `core:sort`, which enumerate them in one place so that they can be reused. A "sort" is what you get when you consider that `(func)` and `(import (func))` are literally different kinds of definitions but both go into the same index space. So "sort" is 1:1 with index space, and we can say that a definition has a "sort".
- … `intertype` increasingly seem like the wrong names for the type of "first class values used by components" when juxtaposed with `core:valtype`. Moreover, the "interface" of a component isn't defined just by value types, but by the other types too (`func`, `module`, `component`, `instance`). Thus, this PR replaces `intertype` with `valtype`, drops the word "interface" when referring to values, value types, functions and function types, and instead qualifies these words with "core" and "component" when needed.
- `deftype` was changed to align with the GC proposal, which uses it with a slightly different meaning and context: previously, "deftype" meant "definition type" (i.e., the type of second-class definitions), but now it means "defined type" (i.e., things you can stick in a `(type ...)` definition, which is subtly different).
- … `(eq ...)`.
- … `-def` (that appeared inside module, instance and component types) were renamed to end in `decl` ("declarator") because, in my experience, the hyphen is hard to use in communication. "Declarator" isn't exactly a perfect name, but at least it's a single word that we can give a clear meaning in this spec.
- The `kind` field of the binary format `preamble` was renamed to `layer` because "kind" seems overloaded.

There were also a few bug fixes that I found when rereading everything. In particular, I see that previously I was a bit sloppy in abusing the syntactic sugar, so a few examples got an extra `(func ...)` or `(type ...)` wrapped around the identifier.
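The "sort is 1:1 with index space" idea, together with this PR's split of core definitions into their own index spaces, can be sketched as a table of per-sort lists. This is a minimal Python model under stated assumptions: the class `IndexSpaces` and the sort and label strings are hypothetical and illustrative, not taken from the spec's binary or text format. The labels loosely follow the quoted grammar, where `canon lift` produces a component `(func ...)`.

```python
from collections import defaultdict
from typing import DefaultDict, List


class IndexSpaces:
    """One index space per sort.

    Defining and importing something of a given sort both append to that
    sort's index space; core and component sorts never share a space.
    """

    def __init__(self) -> None:
        self._spaces: DefaultDict[str, List[str]] = defaultdict(list)

    def add(self, sort: str, label: str) -> int:
        space = self._spaces[sort]
        space.append(label)
        return len(space) - 1  # index of the new entry within its sort

    def get(self, sort: str, idx: int) -> str:
        return self._spaces[sort][idx]


spaces = IndexSpaces()
print(spaces.add("func", "imported func"))           # 0
print(spaces.add("func", "lifted func"))             # 1: same space as the import
print(spaces.add("core func", "lowered core func"))  # 0: separate core index space
```

An `(import (func))` and a `(func)` definition thus differ in kind but share the `func` index space, while a core function starts counting from 0 in its own space.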