Split core definitions out into separate index spaces #29

Merged: 27 commits, May 27, 2022

lukewagner
Member

This PR addresses #11 as proposed in that issue. The split ended up touching much of the AST, the text format, the binary format, and even the basic terminology, so the PR is fairly extensive and took a few tries to get something that feels right. Ultimately, though, all the changes here should be superficial; the essential concepts remain the same.

As part of this change, a few simplifications and better symmetries (between core and component layers) presented themselves which are included in the PR:

  • Instead of enumerating all the different definitions in various places, the PR defines sort and core:sort which enumerate them in one place so that they can be reused. A "sort" is what you get when you consider that (func) and (import (func)) are literally different kinds of definitions but both go into the same index space. So "sort" is 1:1 with index space, and we can say that a definition has a "sort".
  • "Interface types" and intertype increasingly seem like the wrong names for the type of "first class values used by components" when juxtaposed with core:valtype. Moreover, the "interface" of a component isn't defined just by value types, but also by the other types (func, module, component, instance). Thus, this PR replaces intertype with valtype, drops the word "interface" when referring to values, value types, functions and function types, and instead qualifies these words with "core" or "component" when needed.
  • deftype was changed to align with the GC proposal, where it has a slightly different meaning and context. Previously, "deftype" meant "definition type" (i.e., the type of second-class definitions); now it means "defined type" (i.e., the things you can put in a (type ...) definition, which is subtly different).
  • Type imports/exports better align with the existing core type imports/exports proposal by wrapping the exact-type bounds in (eq ...).
  • The production names ending in -def (that appeared inside module, instance and component types) were renamed to end in decl ("declarator") because, in my experience, the hyphen is hard to use in communication. "Declarator" isn't a perfect name, but at least it's a single word that we can give a clear meaning in this spec.
  • The kind field of the binary format preamble was renamed to layer because "kind" seems overloaded.
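
To make the "sort" idea concrete, here is a minimal Python sketch (not the spec's or any implementation's actual data model) in which every definition is filed under its sort, so an imported function and a locally defined function land in the same index space. The `Sort` enum lists only a hypothetical subset of sorts, and `IndexSpaces`/`add` are illustrative names.

```python
from dataclasses import dataclass, field
from enum import Enum


class Sort(Enum):
    """Hypothetical subset of component-level sorts; each sort names one index space."""
    FUNC = "func"
    VALUE = "value"
    TYPE = "type"
    COMPONENT = "component"
    INSTANCE = "instance"


@dataclass
class IndexSpaces:
    """One append-only list per sort: the sort, not the definition form, picks the space."""
    spaces: dict = field(default_factory=lambda: {s: [] for s in Sort})

    def add(self, sort: Sort, name: str) -> int:
        space = self.spaces[sort]
        space.append(name)
        return len(space) - 1  # the new definition's index within its sort's space


ctx = IndexSpaces()
# An imported func and a defined func are different kinds of definitions,
# but both have sort `func`, so they share one index space.
i0 = ctx.add(Sort.FUNC, "imported f")
i1 = ctx.add(Sort.FUNC, "defined g")
t0 = ctx.add(Sort.TYPE, "some type")  # separate space, so indexing restarts at 0
print(i0, i1, t0)  # 0 1 0
```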

There were also a few bug fixes that I found when rereading everything. In particular, I had previously been a bit sloppy in abusing the syntactic sugar, so a few examples got an extra (func ...) or (type ...) wrapped around the identifier.

@lukewagner
Member Author

(For anyone who eagerly looked at the PR yesterday, the second commit fixes the binary encoding of valtype to match how it worked with intertype, and also refactors the type grammar a bit.)

@lukewagner
Member Author

This most recent update removes the hand-waving in type grammar since @pl-semiotics showed it was ambiguous and simply wrong in a few cases. The type grammar now uses explicit type indices (calling out when they need (type <typeidx>)/(type $id) vs when they're just <typeidx>/$id) and inline type expressions that should more-faithfully capture the actual proposed text format here. It also lines up more clearly with the binary format, which is nice. @pl-semiotics LMKWYT
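
A rough Python sketch of the "inline type expression or explicit type index" distinction, assuming a hypothetical AST where a type use is either a `FuncType` written inline or a `TypeIdx` pointing into the type index space; `resolve` shows that both forms denote the same type once the index is looked up. None of these names come from the spec.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class FuncType:
    """Hypothetical inline function type: parameter type names and one result type name."""
    params: tuple
    result: str


@dataclass(frozen=True)
class TypeIdx:
    """Hypothetical explicit reference into the type index space, like `(type <typeidx>)`."""
    idx: int


TypeUse = Union[FuncType, TypeIdx]


def resolve(use: TypeUse, types: list) -> FuncType:
    """Return the function type a use denotes, looking up indices when needed."""
    if isinstance(use, TypeIdx):
        found = types[use.idx]
        if not isinstance(found, FuncType):
            raise TypeError(f"type index {use.idx} is not a function type")
        return found
    return use  # already an inline type expression


types = [FuncType(("string",), "u32")]
inline = FuncType(("string",), "u32")
print(resolve(TypeIdx(0), types) == resolve(inline, types))  # True
```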

rossberg (Member) left a comment

Great, I think this is cleaner and clearer than before!

The `func` type constructor describes a component-level function definition
that takes and returns `valtype`. In contrast to [`core:functype`] which, as a
low-level compiler target for a stack machine, returns zero or more results,
`functype` always returns a single type, with `unit` being used for functions
Member

I'm still a bit saddened by this asymmetry. If languages can bind to tuple types, shouldn't they likewise be able to bind to functions with multiple results?

Member Author

I think the asymmetry is unsurprising when you consider that one is meant to be "source-level" and one is "assembly-level". Also, from a source-level perspective, multi-return opens up a number of questions that just aren't an issue with the current design. Can the multiple return values be named (for symmetry with params, where we really do want the names)? If so, do language bindings special-case the unnamed single-value case to just return the value, which is a bit irregular? And if some toolchains add names purely for their documentation value, are language bindings then forced to wrap the returned scalar at runtime, adding overhead (which isn't an issue for named params)?

Member

> I think the asymmetry is unsurprising when you consider that one is meant to be "source-level" and one is "assembly-level".

That's insinuating that all source languages have single return values, which of course isn't the case. One could argue that component types use a smallest common denominator approach, but that's not true either, since they feature things like variants etc.

I understand what you mean regarding bindings and names, but is that really an asymmetry? If you simply allowed the names, languages who have no use for named return values could simply ignore the names and handle them positionally. The same is actually true for parameters: there are languages where parameter names are relevant at call sites (e.g., named parameters can be used to reorder the argument list), but most languages just do calls positionally and parameter names from the types are irrelevant beyond documentation value.

Member Author

> One could argue that component types use a smallest common denominator approach, but that's not true either, since they feature things like variants etc.

I know, it's a fuzzy least common denominator, but in the particular case of return values, single-return (with good tuple integration) is overwhelmingly more common.

> If you simply allowed the names, languages who have no use for named return values could simply ignore the names and handle them positionally. The same is actually true for parameters:

I think the asymmetry arises due to the producing nature of return types vs. the consuming nature of param types. The language binding for params can easily consume either named or positional parameters, but the language binding for results must pick one, which has consequences either way. In particular, all languages will have some way to represent named return values (e.g., as a dictionary) which will have some fixed overhead vs. the unnamed scalar binding, so the language bindings will have to just pick something, biasing one way or the other. With single-return, there would be a clear directive for language bindings to produce a name-bearing value for record results (and of course unnamed for scalar or tuple results), allowing interface authors to more-reliably say what they want.
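
The binding rule described above (name-bearing values for record results, unnamed values for scalar or tuple results) could look roughly like the following Python sketch. `RecordType`, `TupleType`, and `bind_result` are hypothetical, a plain `dict` stands in for whatever name-bearing value a real binding would choose, and the flat list of values is a simplification of how results actually cross the boundary.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class RecordType:
    """Hypothetical result type: a record with named fields."""
    field_names: tuple


@dataclass(frozen=True)
class TupleType:
    """Hypothetical result type: an unnamed tuple of a given arity."""
    arity: int


def bind_result(result_type: Any, values: List[Any]) -> Any:
    """Map a function's single result onto a host-language value.

    Records become name-bearing values (a dict here), tuples become unnamed
    tuples, and any other (scalar) type is returned as the bare value.
    """
    if isinstance(result_type, RecordType):
        if len(values) != len(result_type.field_names):
            raise ValueError("record arity mismatch")
        return dict(zip(result_type.field_names, values))
    if isinstance(result_type, TupleType):
        if len(values) != result_type.arity:
            raise ValueError("tuple arity mismatch")
        return tuple(values)
    (value,) = values  # a scalar result carries exactly one value
    return value


print(bind_result(RecordType(("lo", "hi")), [1, 2]))  # {'lo': 1, 'hi': 2}
print(bind_result(TupleType(2), [1, 2]))              # (1, 2)
print(bind_result("u32", [7]))                        # 7
```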

Member

A similar choice is also necessary for parameters sometimes: there are a number of languages (especially functional ones) that only have single parameter functions. Multiple parameters are then represented either with tuples or by currying. These are two mutually exclusive choices, and a mapping for such a language will have to pick one over the other.

Likewise, there are languages where named vs positional parameter definitions is a mutually exclusive choice.

So I wouldn't count either of this as a qualitative difference.
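
For a language with only single-parameter functions, the two mutually exclusive mappings mentioned above can be sketched in Python as follows: `tupled` packs both arguments into one tuple argument, while `curried` returns a chain of one-argument functions. The helper names are illustrative only.

```python
from typing import Callable


def add(x: int, y: int) -> int:
    """A two-parameter function as a component might export it."""
    return x + y


def tupled(f: Callable[[int, int], int]) -> Callable[[tuple], int]:
    """Expose a two-parameter function as taking a single tuple argument."""
    return lambda args: f(*args)


def curried(f: Callable[[int, int], int]) -> Callable[[int], Callable[[int], int]]:
    """Expose a two-parameter function as a chain of one-argument functions."""
    return lambda x: lambda y: f(x, y)


print(tupled(add)((2, 3)))  # 5
print(curried(add)(2)(3))   # 5
```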

pl-semiotics (Collaborator) left a comment

I think this makes a lot more sense! I'll make the changes to the interpreter parser and make sure that the examples all still parse, but this version of the type grammar looks much cleaner to me :) (and the fact that it lines up much more nicely with the binary format/AST is very nice!)

@lukewagner
Member Author

Excellent feedback and suggestions so far; thanks all! Things seem to have quieted down, so I'll hold until the end of the week and merge if it's still quiet. (We can keep talking about the pre-existing (relative to this PR) single-vs-multi-return issue separately.)

Canonical definitions specify one of these two wrapping directions, the function
to wrap and a list of configuration options:
```
canon ::= (canon lift core-prefix(<core:funcidx>) <functype> <canonopt>* (func <id>?))
```
Collaborator

Should this <functype> be also allowed to be a <typeidx>?

Member Author

Oh wow, good catch! So, in updating <functype> to follow the same explicit "inline type or (type )" scheme as elsewhere, I realized that we could make canon lift a lot more symmetric with imports by having the function type go with the sort and id. So this change moves around the functype (and allows type indices). If you look at the corresponding changes in the examples, it seems clear this is what it should've been all along.

@lukewagner
Member Author

Ok, merging this, and we can continue to iterate by filing new issues. Next up, future and stream :)

4 participants