Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update wasm-tools for the latest component model proposal #621

Merged
merged 33 commits into from
Jun 7, 2022

Conversation

peterhuene
Copy link
Member

@peterhuene peterhuene commented Jun 6, 2022

This PR updates all of tools in wasm-tools that currently support the component model for the latest proposal changes.

The component model proposal had a hefty change to split out core and component index spaces and, in the process, had a number of significant changes to both the text and binary formats.

An overview of the changes in this PR:

  • wasm-encoder - refactored encoding of core vs. component sections that enable writing some core sections into a component, aligning the API with the latest binary format.
  • wasmparser - refactored reading of core vs. component sections and updated validation to account for the additional index spaces.
  • wasm-mutate - updated for the wasm-encoder and wasm-parser changes.
  • wasm-smith - update what's currently supported for arbitrary component encoding to the latest wasm-encoder changes; added support for arbitrary core type sections in components.
  • wasmparser-dump - updated for wasmparser API changes.
  • wasmprinter - updated for wasmparser API changes, support for additional index spaces, and also to output the latest component model text format.
  • wast - overhauled the existing component model parsing support to conform to the latest component model text format and replaced component encoding to use wasm-encoder (we can fully switch over to wasm-encoder for core encoding once it supports encoding the GC types).

This PR also updates all of the component-model-related text format test files where appropriate and fixed a significant number of FIXMEs along the way.

I realize this is a huge PR, so almost all of the commits target an individual tool at a time, except for Add core types to component and instance types, which updates the previous implementations to fix a correction to the component model spec that we discovered later and some additional (bite sized) follow-up fix commits that fixed bugs that were discovered once round-trip testing was possible again.

peterhuene added 20 commits May 26, 2022 15:10
This commit updates and refactors `wasm-encoder` to include the latest
component model changes.

Most notably, the index spaces for core and component items in the component
model have been separated.

As a result, some of the component section encoders have been split in two,
with a "core" version going into the `core` module; for things like aliases and
instance sections, the core versions may one day be used in module encoding if
a standalone module linking proposal is put forth.
This commit updates `wasm-mutate` so that it builds with the `wasm-encoder` API
changes.
This commit updates `wasm-smith` for the `wasm-encoder` API changes.

It also updates `wasm-smith` for the component model spec changes, namely
encoding core and component types in discrete sections.
This commit refactors `wasmparser` to conform to the latest component model
spec.

The biggest change is that core items and component items have distinct index
spaces and sections. This had a large impact on the validator.

Various type names have been changed to better align with the terminology used
in the component model spec ("interface" is now gone in favor of component
"value types").
This commit updates `wasmparser-dump` to the latest `wasmparser` changes.
This commit updates `wasm-mutate` to the latest changes in `wasmparser`.
This commit updates `wasm-smith` for the latest `wasmparser` changes.
This commit updates `wasmprinter` for both the latest `wasmparser` changes and
to implement the separate index spaces for core and component items.
This commit updates the incremental-parse fuzz target to conform to the latest
`wasmparser` changes.
This commit updates the `wasm-tools objdump` command for the latest
`wasmparser` changes.
This commit updates `wasm-encoder`, `wasmparser`, `wasmprinter`, and
`wasm-smith` to support core types in component and instance type declarations.

This was missing from the recent spec updates and has now been fixed in the
spec.
This fixes a typo that resulted from refactoring the alias section into a core
and component alias section.
This commit fixes the missing increments of `num_added` for both core and
component aliases.
This commit fixes component validation to ensure the types referenced from
the `Realloc` and `PostReturn` options are core function types.
This commit fixes wasmprinter in places where it wasn't conforming properly to
the text format.

Additionally, it fixes a bug where instance/component types weren't registering
the exports of the type and subsequent lookups of the exports would fail.
This commit fixes the validation of the post-return option so that it
be present for lifting operations and that the parameters to the refer
function match the results of the function being lifted.
This commit fixes the printer output for start functions to match what's
expected in the text format.
This commit basically rewrites the component model support in wast for the
latest text format and binary format changes.

`wast` now depends on `wasm-encoder` for encoding components so that there is a
single implementation of the component binary format.

In the future, we can also replace the core encoding implementation with using
`wasm-encoder` instead (once it supports the GC proposal encoding).
This commit updates the component model wat tests to the current text format.

It also removes FIXMEs where appropriate and attempts to convert the remaining
"fixme" holdover test files from the module linking proposal.
Update doc comments relating to the `post-return` option to match the spec.
@peterhuene peterhuene force-pushed the update-component-model branch from e92754f to e6025b8 Compare June 6, 2022 01:43
Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for implementing all of this and updating everything @peterhuene! As the PR size is testament too this is no small change or effort necessary to do so.

Given the current state I think it's prudent to get this landed and iterate in-tree further. I would like to resolve the review points below about the modification of the core wasm AST in wasmparser first, however, given the impact on parsing core wasm modules.

Otherwise I think it's fine to use follow-up issues to track futher improvements instead of trying to handle everything here in this PR.

For the future though I think it would be best to try to slim down the size of PRs like this. I personally find it really hard to approach a many-thousand line PR and review all parts equally well. While large parts of this PR can be easily glossed over I personally try to pay particularly close attention to wasmparser changes due to that being the component here which most frequently takes untrusted input and needs to "not do bad things". The unbounded recursion issue I've noted below is something I missed in the original component model implementation and something closer review on my part would have caught.

I don't mean to say though that it would be best to have N 100 line prs instead of one 100N line PR. This is naturally a pretty big change to everything component-related and it's going to be a big PR no matter what. However there are a number of distractions to sift through here such as clippy suggestions, renamings like Type to ValType, using wasm-encoder in wast, etc. Splitting out as much as possible at least for me really helps with review because it's easier to focus on the changes at hand.

Ok(match self.read_u8()? {
0x60 => TypeDef::Func(self.read_func_type()?),
0x60 => Type::Func(self.read_func_type()?),
0x50 => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I do not think that we should be modifying the AST of core wasm modules at this time at all. I see that the component model is adding some bits here and there for future-facing proposals like module-linking-in-core-wasm, but I do not think we should add support for wasmparser. Otherwise we run the risk of exposing known bugs to wasmparser to everyone who's not enabling the component model. Specifically two issues here are:

  • This suffers the "double parse" problem where the section is parsed once for validation and then again for the embedder.
  • The read_module_type_decl can recursively call read_type and there's no bound on the recursion

I also think from an ergonomics and library-design point of view it's not the right time to add support for more types in the core wasm AST. While module linking may be a thing one day I've historically found it quite cumbersome to chase future proposals that don't actually exist yet (e.g. the very long line of refactorings that led wasm-bindgen to what it is today). I personally feel that it's better to stick to having wasmparser only support the component model and core wasm as-is today and not go out of our way to future-proof to something hypothetical.

Also to clarify this is the first location I found where there's a "real issue" related to this support (possible stack overflow in the host when validating an arbitrary module). There are other locations such as enum ExternalKind which grew Module and Instance variants. I don't think we should change ExternalKind at this time from what it is today since core wasm modules can't actually create instances or modules. If this comes at a cost of less-than-perfect symmetric in the component-model AST I personally think that's ok since that's just something the component model has to handle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also another clarification is that this is a preexisting issue for components as well. I don't really know when we would have found it but I would hope that we would beef up our fuzzers eventually to have found this issue before enabling everything by default. So in some sense there's an issue here where we need to track the unbounded recursion problem but at the same time for this PR itself I think it's only necessary to get the API design down which in my opinion should be to not modify the core wasm AST.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed now for module parsing (can't parse module types). Not yet fixed for components (existing issue).

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasm-smith changes LGTM

@peterhuene
Copy link
Member Author

In the future, for a very large change such as this one where tests will not be passing until pretty much the very last commit due to how all of the tools are sort of dependent on each other for testing, would you be opposed to conducting reviews on a fork or (preferably) into a staging branch in this repo?

I'd certainly prefer doing it that way so that I can make many small commits for updating a single tool, get them reviewed, move on to the next one, and then once the tests are passing again, put up one big merge-from-staging PR that can easily be signed-off on because of the prior reviews?

This commit updates any remaining place that used the old "defaults to"
terminology for variant cases in the component model.
@alexcrichton
Copy link
Member

I think that Type::Module in wasmparser still exists (e.g. the unbounded recursion exposed to core wasm), but once that's moved over to just components I think this is fine to land once CI is green afterwards.

@peterhuene
Copy link
Member Author

peterhuene commented Jun 7, 2022

As TypeSectionReader is currently shared between modules and components, to properly fix it so that the Type enumeration does not include component-related types (i.e. Module), we may need to fork it to a CoreTypeSectionReader used in component reading only. This would also necessitate a CoreType enum too, which is less than ideal (gestures broadly "it's enums as far as the eye can see").

@alexcrichton what are your thoughts? I too would like to prevent TypeSectionReader from trying to read module types when parsing core wasm, but right now the readers are ignorant of the format they're reading.

@alexcrichton
Copy link
Member

I don't really have a better idea than that myself. I don't disagree that we're drowning in enums though...

This commit updates the parsing and resolution of variant cases for the
`refines` clause.

Variant cases may now have an optional identifier on them, which is referred to
in a `refines` clause, rather than having to use the variant name string.

This makes it clearer to users of the text format that a resolution is taking
place.
@peterhuene peterhuene force-pushed the update-component-model branch from 815aa32 to a3e18e7 Compare June 7, 2022 18:44
This commit moves the component core type reading from `TypeSectionReader` into
a new reader called `CoreTypeSectionReader`.

By doing so, we prevent the parsing of modules from supporting core types
currently only supported in components.
@fitzgen
Copy link
Member

fitzgen commented Jun 7, 2022

@alexcrichton said

Given the current state I think it's prudent to get this landed and iterate in-tree further.

FWIW, I agree that this is big enough that we should just merge this and then evolve in-tree since all other component-model stuff needs to be rebased on top of this.

Did I miss something about why we aren't doing that (yet?) here?

Similar to the previous commit, this moves component core type encoding to a
`CoreTypeSection` rather than extending `TypeSection` to encode for both
modules and components.

This helps make the distinction between a "type section" (module) and a "core
type section" (component) a little more clear until such type that modules and
components can share a core type section format.
@alexcrichton
Copy link
Member

This is my concern for landing exactly as-is-this-red-hot-second. That would open up wasmparser to stack overflow when parsing an input module.

@peterhuene
Copy link
Member Author

I have three more commits I'm about to push to resolve that (make CoreTypeSection and CoreTypeSectionReader explicitly different from TypeSection and TypeSectionReader), plus a fix for parsing by index for variant cases.

This commit updates the parsing of variant case refines clauses to allow them
to explicitly reference another case by index rather than identifier alone.
@peterhuene
Copy link
Member Author

Ok, I've pushed up what I hope is the last of the commits to review before we can land this.

@fitzgen I tried to keep the churn in wasm-smith to a minimum given you've already rebased your work on top of it, but unfortunately a few more commits did touch it doing review.

@fitzgen
Copy link
Member

fitzgen commented Jun 7, 2022

@fitzgen I tried to keep the churn in wasm-smith to a minimum given you've already rebased your work on top of it, but unfortunately a few more commits did touch it doing review.

Nah I still haven't finished rebasing, do what you need to ;)

@alexcrichton alexcrichton merged commit 65b123f into bytecodealliance:main Jun 7, 2022
@peterhuene peterhuene deleted the update-component-model branch June 7, 2022 20:14
alexcrichton added a commit to alexcrichton/wasm-tools that referenced this pull request Jun 8, 2022
The syntax for instantiation changed slightly in bytecodealliance#621 where a bare index
is now required, but this syntax prevents the inline injection and usage
of an alias of an export of another component/instance. This commit
soups up the support here by bringing back the `IndexOrRef` structure
which can be used to parse one of the two, allowing both a bare
reference plus a parenthesized reference that enables alias injection.
alexcrichton added a commit that referenced this pull request Jun 8, 2022
The syntax for instantiation changed slightly in #621 where a bare index
is now required, but this syntax prevents the inline injection and usage
of an alias of an export of another component/instance. This commit
soups up the support here by bringing back the `IndexOrRef` structure
which can be used to parse one of the two, allowing both a bare
reference plus a parenthesized reference that enables alias injection.
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Jun 8, 2022
This commit updates the wasm-tools family of crates, notably pulling in
the refactorings and updates from bytecodealliance/wasm-tools#621 for
the latest iteration of the component model. This commit additionally
updates all support for the component model for these changes, notably:

* Many bits and pieces of type information was refactored. Many
  `FooTypeIndex` namings are now `TypeFooIndex`. Additionally there is
  now `TypeIndex` as well as `ComponentTypeIndex` for the two type index
  spaces in a component.

* A number of new sections are now processed to handle the core and
  component variants.

* Internal maps were split such as the `funcs` map into
  `component_funcs` and `funcs` (same for `instances`).

* Canonical options are now processed individually instead of one bulk
  `into` definition.

Overall this was not a major update to the internals of handling the
component model in Wasmtime. Instead this was mostly a surface-level
refactoring to make sure that everything lines up with the new binary
format for components.

* All text syntax used in tests was updated to the new syntax.
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Jun 9, 2022
This commit updates the wasm-tools family of crates, notably pulling in
the refactorings and updates from bytecodealliance/wasm-tools#621 for
the latest iteration of the component model. This commit additionally
updates all support for the component model for these changes, notably:

* Many bits and pieces of type information was refactored. Many
  `FooTypeIndex` namings are now `TypeFooIndex`. Additionally there is
  now `TypeIndex` as well as `ComponentTypeIndex` for the two type index
  spaces in a component.

* A number of new sections are now processed to handle the core and
  component variants.

* Internal maps were split such as the `funcs` map into
  `component_funcs` and `funcs` (same for `instances`).

* Canonical options are now processed individually instead of one bulk
  `into` definition.

Overall this was not a major update to the internals of handling the
component model in Wasmtime. Instead this was mostly a surface-level
refactoring to make sure that everything lines up with the new binary
format for components.

* All text syntax used in tests was updated to the new syntax.
alexcrichton added a commit to bytecodealliance/wasmtime that referenced this pull request Jun 9, 2022
This commit updates the wasm-tools family of crates, notably pulling in
the refactorings and updates from bytecodealliance/wasm-tools#621 for
the latest iteration of the component model. This commit additionally
updates all support for the component model for these changes, notably:

* Many bits and pieces of type information was refactored. Many
  `FooTypeIndex` namings are now `TypeFooIndex`. Additionally there is
  now `TypeIndex` as well as `ComponentTypeIndex` for the two type index
  spaces in a component.

* A number of new sections are now processed to handle the core and
  component variants.

* Internal maps were split such as the `funcs` map into
  `component_funcs` and `funcs` (same for `instances`).

* Canonical options are now processed individually instead of one bulk
  `into` definition.

Overall this was not a major update to the internals of handling the
component model in Wasmtime. Instead this was mostly a surface-level
refactoring to make sure that everything lines up with the new binary
format for components.

* All text syntax used in tests was updated to the new syntax.
code-terror pushed a commit to code-terror/wasm-tools that referenced this pull request Aug 24, 2022
…liance#621)

* wasm-encoder: update to the latest component model changes.

This commit updates and refactors `wasm-encoder` to include the latest
component model changes.

Most notably, the index spaces for core and component items in the component
model have been separated.

As a result, some of the component section encoders have been split in two,
with a "core" version going into the `core` module; for things like aliases and
instance sections, the core versions may one day be used in module encoding if
a standalone module linking proposal is put forth.

* wasm-mutate: update for wasm-encoder changes.

This commit updates `wasm-mutate` so that it builds with the `wasm-encoder` API
changes.

* wasm-smith: update for wasm-encoder changes.

This commit updates `wasm-smith` for the `wasm-encoder` API changes.

It also updates `wasm-smith` for the component model spec changes, namely
encoding core and component types in discrete sections.

* wasmparser: update for the component model spec changes.

This commit refactors `wasmparser` to conform to the latest component model
spec.

The biggest change is that core items and component items have distinct index
spaces and sections. This had a large impact on the validator.

Various type names have been changed to better align with the terminology used
in the component model spec ("interface" is now gone in favor of component
"value types").

* wasmparser-dump: update for component model changes.

This commit updates `wasmparser-dump` to the latest `wasmparser` changes.

* wasm-mutate: update to latest wasmparser changes.

This commit updates `wasm-mutate` to the latest changes in `wasmparser`.

* wasm-smith: update for wasmparser changes.

This commit updates `wasm-smith` for the latest `wasmparser` changes.

* wasmprinter: update for latest component model spec changes.

This commit updates `wasmprinter` for both the latest `wasmparser` changes and
to implement the separate index spaces for core and component items.

* fuzz-targets: update fuzz targets for wasmparser changes.

This commit updates the incremental-parse fuzz target to conform to the latest
`wasmparser` changes.

* objdump: fix the objdump command for latest `wasmparser` changes.

This commit updates the `wasm-tools objdump` command for the latest
`wasmparser` changes.

* Add core types to component and instance types.

This commit updates `wasm-encoder`, `wasmparser`, `wasmprinter`, and
`wasm-smith` to support core types in component and instance type declarations.

This was missing from the recent spec updates and has now been fixed in the
spec.

* wasm-encoder: fix incorrect section id for component alias section.

This fixes a typo that resulted from refactoring the alias section into a core
and component alias section.

* wasm-encoder: fix missing increments when adding alias items.

This commit fixes the missing increments of `num_added` for both core and
component aliases.

* wasmparser: fix incorrect canonical option checking.

This commit fixes component validation to ensure the types referenced from
the `Realloc` and `PostReturn` options are core function types.

* wasmprinter: fixes to output to conform to text format.

This commit fixes wasmprinter in places where it wasn't conforming properly to
the text format.

Additionally, it fixes a bug where instance/component types weren't registering
the exports of the type and subsequent lookups of the exports would fail.

* wasmparser: fix validation for post-return option.

This commit fixes the validation of the post-return option so that it
be present for lifting operations and that the parameters to the refer
function match the results of the function being lifted.

* wasmprinter: fix output for component start functions.

This commit fixes the printer output for start functions to match what's
expected in the text format.

* wasmparser-dump: fix output to make core vs. component items clearer.

* wast: update for the component model proposal changes.

This commit basically rewrites the component model support in wast for the
latest text format and binary format changes.

`wast` now depends on `wasm-encoder` for encoding components so that there is a
single implementation of the component binary format.

In the future, we can also replace the core encoding implementation with using
`wasm-encoder` instead (once it supports the GC proposal encoding).

* Update wat tests.

This commit updates the component model wat tests to the current text format.

It also removes FIXMEs where appropriate and attempts to convert the remaining
"fixme" holdover test files from the module linking proposal.

* Fix doc comment relating to `post-return` option.

Update doc comments relating to the `post-return` option to match the spec.

* wasm-encoder: move component-related core sections.

This commit moves the component-related "core" sections out of the core module
and into the component module in `wasm-encoder`.

Originally these were put in the core module in anticipation of one day having
the sections be part of core, but that's a long way off and may not occur.
Until that time, it makes more sense to keep these sections grouped with the
other component sections.

This also replaces `ComponentAliasKind` with `ComponentOuterAliasKind`, which
should only allow types, modules, and components to be aliased according to the
current spec.

Updated `wast` based on these API changes (the other crates will have similar
changes soon).

* wasmparser: move component-related core sections.

This commit moves the component-related core section parsing out of the `core`
module and into the `component` module in `wasmparser`.

Originally these were put in the core module in anticipation of one day having
the sections be part of core, but that's a long way off and may not occur.
Until that time, it makes sense to keep these sections grouped with the other
component sections.

This commit also removes the extensions to the `ExternalKind` enumeration in
core.

It updates alias parsing to limit outer alias kind to only the supported kinds
(core module, types, and components). This helps reduce confusion between
what's implemented for export aliasing and what's supported for outer aliasing.

By doing so, this also enables the ability to reference outer core types in
nested components.

* wast: fix parsing and encoding of aliases.

This commit fixes the parsing and encoding of aliases such that the grammar is
restricted to what's actually supported for both instance export aliases and
for outer aliases.

It also addresses the inability to reference and encode outer core type
aliases.

`CoreItemRef` was fixed to only have a single export name supported as core
instances cannot export other core instances, so there can be at most one
string parsed for the export name.

* wasm-smith: encode arbitrary outer aliases for core types.

This commit encodes arbitrary outer aliases to core types when creating
arbitrary component and instance types in the component type section.

It also fixes a bug where `scalar_component_funcs` were storing indexes into
`funcs` and not `component_funcs`, causing lowered functions to encode with
potentially out-of-bounds component function indexes.

* wasm-encoder: remove unnecessary `Export` type.

This commit removes the core `Export` type, which is simply the same
enumeration as `ExportKind`, but paired with the exported item's index.

Instead, encoding takes both the kind and index rather than an `Export`.

With this change, encoding core exports functions like component exports in
`wasm-encoder`.

Also fixed where `Export` was used in the other tools.

* wast: add missing version for wasm-encoder reference.

* wasmparser: remove `Module` from `TypeRef`.

This commit removes the `Module` variant from `TypeRef` as it is no longer
needed to support the component model proposal.

* Replace any references to "defaults_to" with "refines".

This commit updates any remaining place that used the old "defaults to"
terminology for variant cases in the component model.

* wast: implement identifier resolution for variant cases.

This commit updates the parsing and resolution of variant cases for the
`refines` clause.

Variant cases may now have an optional identifier on them, which is referred to
in a `refines` clause, rather than having to use the variant name string.

This makes it clearer to users of the text format that a resolution is taking
place.

* wasmparser: move component core type reading to a separate reader.

This commit moves the component core type reading from `TypeSectionReader` into
a new reader called `CoreTypeSectionReader`.

By doing so, we prevent the parsing of modules from supporting core types
currently only supported in components.

* wasm-encoder: move component core type encoding to a separate section.

Similar to the previous commit, this moves component core type encoding to a
`CoreTypeSection` rather than extending `TypeSection` to encode for both
modules and components.

This helps make the distinction between a "type section" (module) and a "core
type section" (component) a little more clear until such type that modules and
components can share a core type section format.

* wast: allow variant case refines clause to reference by index.

This commit updates the parsing of variant case refines clauses to allow them
to explicitly reference another case by index rather than identifier alone.
code-terror pushed a commit to code-terror/wasm-tools that referenced this pull request Aug 24, 2022
…liance#627)

The syntax for instantiation changed slightly in bytecodealliance#621 where a bare index
is now required, but this syntax prevents the inline injection and usage
of an alias of an export of another component/instance. This commit
soups up the support here by bringing back the `IndexOrRef` structure
which can be used to parse one of the two, allowing both a bare
reference plus a parenthesized reference that enables alias injection.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants