Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Re-)Consider multi-return vs unit #41

Closed
lukewagner opened this issue Jun 8, 2022 · 6 comments
Closed

(Re-)Consider multi-return vs unit #41

lukewagner opened this issue Jun 8, 2022 · 6 comments

Comments

@lukewagner
Copy link
Member

In #29, we had an unresolved comment thread about whether we should make component-level function results symmetric with parameters. The question was independent (pre-existing) of #29, so that PR merged and this issue represents the continued discussion.

One interesting new technical argument just came up via @peterhuene's work: Component-level start sections calling functions that return "nothing" technically always return unit. If we treated unit uniformly (pushed a new unit value into the value index space), then the "linearity" requirement (each value must be consumed exactly once) would mean we have to consume that unit... which would then produce a new unit (ad infinitum). Thus, we have to special-case unit in the validation rules (so it doesn't push a value index) which is irregular and adds validation complexity. It's not a massive problem, but it does add a point to the multi-return column (which can return an empty list of values).

@lukewagner
Copy link
Member Author

#49 raises an interesting case where having an empty return (returning undefined to a JS caller) would produce a more-regular JS interface than a unit-return (returning null to a JS caller).

(I haven't forgotten about this issue; I just haven't had time to think and discuss more about it yet.)

alexcrichton added a commit to alexcrichton/wasm-tools that referenced this issue Jul 11, 2022
Recently I found myself trying to debug an invalid component but the way
that it was invalid meant that I couldn't print it with `wasmprinter`.
This commit removes all the type management in `wasmprinter` which is
currently needed for the `start` function of components (and other
since-removed items I believe historically) to allow printing more
components, namely those that even may be invalid. In general I
personally think it's best for `wasmprinter` to print as much as
possible, even if it's invalid, and leave validation primarily to the
parsing phase later.

This commit does effectively "break" the implementation of printing
modules with `start` functions, however. The `(result ..)` production is
never printed any more since the type information here is not tracked.
This is tracked at WebAssembly/component-model#41 for updates to the
`start` function.
@lukewagner
Copy link
Member Author

Here's a third argument in favor of multi-return that I recently realized:

To support runtime instantiation of components (going beyond the declarative all-up-front instantiation we currently have), I've been imagining that we'd define a new canonical built-in (canon instantiate $Component (func $new)) that defines a function $new that, when called, creates a new instance of $Component. If $Component defines any function exports, then the return type of $new would be a (handle $Component) (with $Component serving as a resource type). But if $Component exclusively exports values, then $new can simply be a normal function returning values (thereby realizing the WASI "command" pattern1). But now the asymmetry of parameters/results bites again: value import names can turn into parameter names but value export names must either be dropped or wrapped into a record.

@lukewagner
Copy link
Member Author

So if we were to switch to multi-return, then an empty result type list would be the way to express what is today (result unit) and the existing function subtyping rules that allow superfluous results to be ignored by the caller would take the place of the special "everything is a subtype of unit" rule. Rather than have these two different ways to express "nothing" (requiring API authors to make an ad hoc choice), I think we should remove unit as a specialization and instead allow <valtype>* as the payload of the variant, union and expected cases, which are, iiuc, the only other places where unit is useful to express "no value" today.

So if expected now takes lists of types, with both lists being empty as a common case, "expected" as an adjective doesn't look right (there's nothing to "expect"). Mulling over alternative names for a while, "result" just seems like the much better name for a return value because, in all cases, you're getting "a result" (which may be simple or compound). (This is just from first principles and inability to find a better name; I'm not even a Rust programmer.) Originally, result was thrown out because result is already used in the s-expression text format for function types and it'd look funny to write (func (result (result))). Fast-forward a year and now our expectation is that component interfaces aren't authored in s-expressions but in wit which lets you write func() -> result. Sure, the resulting <functype> in the resulting component would have (result (result)), but I don't think it's grammatically ambiguous and so I think the tradeoff for, imo, the ideal type name is worth it.

Lastly, if the result list is to be exactly symmetric with the param list, that implies that a result list can contain an arbitrary mix of named and unnamed values. For parameters, this seemed fine because parameter names can always be ignored, falling back to positional parameter rules. But with results, we'll often need to synthesize a source-language value (e.g., to return to a JS caller) which means defining how to interpret these mixed-name types (as opposed to all-named (record) or all-positional (tuple)). It's certainly possible to imagine a binding, of course, but it feels like an unnecessary burden for a corner case that noone needs. Two restrictions that make sense to me are:

  1. a param/result list is either all-named or all-unnamed
  2. a param list is either all-named or contains a single unnamed type

Option 1 seems simpler and more regular, but if we consider the components-as-functions use case in the preceding comment, components can't express multiple unnamed params/results because of the uniqueness constraints on import/export names. More-generally, parameter/result name uniqueness seems like it could be broadly useful and simplifying for the same reason as import/export name uniqueness. Thus, I actually lean toward option 2, which gives each param/result a unique name (with that name possibly being the empty string) while preserving the pragmatic performance goal that scalar return values are not needlessly and annoyingly object-wrapped in JS and all the other dynamic languages.

So that's what I'm thinking a coherent switch to multi-return entails. What do folks think? (Sorry for the churn and flip-flopping here!)

alexcrichton added a commit to bytecodealliance/wasm-tools that referenced this issue Jul 12, 2022
Recently I found myself trying to debug an invalid component but the way
that it was invalid meant that I couldn't print it with `wasmprinter`.
This commit removes all the type management in `wasmprinter` which is
currently needed for the `start` function of components (and other
since-removed items I believe historically) to allow printing more
components, namely those that even may be invalid. In general I
personally think it's best for `wasmprinter` to print as much as
possible, even if it's invalid, and leave validation primarily to the
parsing phase later.

This commit does effectively "break" the implementation of printing
modules with `start` functions, however. The `(result ..)` production is
never printed any more since the type information here is not tracked.
This is tracked at WebAssembly/component-model#41 for updates to the
`start` function.
@peterhuene
Copy link
Contributor

I think the above makes a lot of sense to me. I don't have any objections with the other proposed changes.

Regarding option 2 from above:

a param list is either all-named or contains a single unnamed type

Is my understanding of this, and the following paragraph you wrote, correct with the following summary?

a param/result list is either empty, a single unnamed param/result, or all uniquely-named.

the empty string is considered a valid name.

If so, I think that option gives the most flexibility too.

@lukewagner
Copy link
Member Author

Yep! Thanks for clarifying. Thus, you can write both (func (result "" u32)) and (func (result u32)) and they are (hopefully not too annoyingly in practice...) distinct types.

@lukewagner
Copy link
Member Author

Writing this up, I realized we probably don't want to have union payloads be list of types: it makes the syntax way more awkward (lists of lists of types) and the point of union was to specialize anyways. But other than that, everything described above is in #69.

code-terror pushed a commit to code-terror/wasm-tools that referenced this issue Aug 24, 2022
Recently I found myself trying to debug an invalid component but the way
that it was invalid meant that I couldn't print it with `wasmprinter`.
This commit removes all the type management in `wasmprinter` which is
currently needed for the `start` function of components (and other
since-removed items I believe historically) to allow printing more
components, namely those that even may be invalid. In general I
personally think it's best for `wasmprinter` to print as much as
possible, even if it's invalid, and leave validation primarily to the
parsing phase later.

This commit does effectively "break" the implementation of printing
modules with `start` functions, however. The `(result ..)` production is
never printed any more since the type information here is not tracked.
This is tracked at WebAssembly/component-model#41 for updates to the
`start` function.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants