-
Notifications
You must be signed in to change notification settings - Fork 262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update wasm-tools for the latest component model proposal #621
Update wasm-tools for the latest component model proposal #621
Conversation
This commit updates and refactors `wasm-encoder` to include the latest component model changes. Most notably, the index spaces for core and component items in the component model have been separated. As a result, some of the component section encoders have been split in two, with a "core" version going into the `core` module; for things like aliases and instance sections, the core versions may one day be used in module encoding if a standalone module linking proposal is put forth.
This commit updates `wasm-mutate` so that it builds with the `wasm-encoder` API changes.
This commit updates `wasm-smith` for the `wasm-encoder` API changes. It also updates `wasm-smith` for the component model spec changes, namely encoding core and component types in discrete sections.
This commit refactors `wasmparser` to conform to the latest component model spec. The biggest change is that core items and component items have distinct index spaces and sections. This had a large impact on the validator. Various type names have been changed to better align with the terminology used in the component model spec ("interface" is now gone in favor of component "value types").
This commit updates `wasmparser-dump` to the latest `wasmparser` changes.
This commit updates `wasm-mutate` to the latest changes in `wasmparser`.
This commit updates `wasm-smith` for the latest `wasmparser` changes.
This commit updates `wasmprinter` for both the latest `wasmparser` changes and to implement the separate index spaces for core and component items.
This commit updates the incremental-parse fuzz target to conform to the latest `wasmparser` changes.
This commit updates the `wasm-tools objdump` command for the latest `wasmparser` changes.
This commit updates `wasm-encoder`, `wasmparser`, `wasmprinter`, and `wasm-smith` to support core types in component and instance type declarations. This was missing from the recent spec updates and has now been fixed in the spec.
This fixes a typo that resulted from refactoring the alias section into a core and component alias section.
This commit fixes the missing increments of `num_added` for both core and component aliases.
This commit fixes component validation to ensure the types referenced from the `Realloc` and `PostReturn` options are core function types.
This commit fixes wasmprinter in places where it wasn't conforming properly to the text format. Additionally, it fixes a bug where instance/component types weren't registering the exports of the type and subsequent lookups of the exports would fail.
This commit fixes the validation of the post-return option so that it be present for lifting operations and that the parameters to the refer function match the results of the function being lifted.
This commit fixes the printer output for start functions to match what's expected in the text format.
This commit basically rewrites the component model support in wast for the latest text format and binary format changes. `wast` now depends on `wasm-encoder` for encoding components so that there is a single implementation of the component binary format. In the future, we can also replace the core encoding implementation with using `wasm-encoder` instead (once it supports the GC proposal encoding).
This commit updates the component model wat tests to the current text format. It also removes FIXMEs where appropriate and attempts to convert the remaining "fixme" holdover test files from the module linking proposal.
Update doc comments relating to the `post-return` option to match the spec.
e92754f
to
e6025b8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for implementing all of this and updating everything @peterhuene! As the PR size is testament too this is no small change or effort necessary to do so.
Given the current state I think it's prudent to get this landed and iterate in-tree further. I would like to resolve the review points below about the modification of the core wasm AST in wasmparser first, however, given the impact on parsing core wasm modules.
Otherwise I think it's fine to use follow-up issues to track futher improvements instead of trying to handle everything here in this PR.
For the future though I think it would be best to try to slim down the size of PRs like this. I personally find it really hard to approach a many-thousand line PR and review all parts equally well. While large parts of this PR can be easily glossed over I personally try to pay particularly close attention to wasmparser
changes due to that being the component here which most frequently takes untrusted input and needs to "not do bad things". The unbounded recursion issue I've noted below is something I missed in the original component model implementation and something closer review on my part would have caught.
I don't mean to say though that it would be best to have N 100 line prs instead of one 100N line PR. This is naturally a pretty big change to everything component-related and it's going to be a big PR no matter what. However there are a number of distractions to sift through here such as clippy suggestions, renamings like Type
to ValType
, using wasm-encoder
in wast
, etc. Splitting out as much as possible at least for me really helps with review because it's easier to focus on the changes at hand.
Ok(match self.read_u8()? { | ||
0x60 => TypeDef::Func(self.read_func_type()?), | ||
0x60 => Type::Func(self.read_func_type()?), | ||
0x50 => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I do not think that we should be modifying the AST of core wasm modules at this time at all. I see that the component model is adding some bits here and there for future-facing proposals like module-linking-in-core-wasm, but I do not think we should add support for wasmparser. Otherwise we run the risk of exposing known bugs to wasmparser to everyone who's not enabling the component model. Specifically two issues here are:
- This suffers the "double parse" problem where the section is parsed once for validation and then again for the embedder.
- The
read_module_type_decl
can recursively callread_type
and there's no bound on the recursion
I also think from an ergonomics and library-design point of view it's not the right time to add support for more types in the core wasm AST. While module linking may be a thing one day I've historically found it quite cumbersome to chase future proposals that don't actually exist yet (e.g. the very long line of refactorings that led wasm-bindgen to what it is today). I personally feel that it's better to stick to having wasmparser only support the component model and core wasm as-is today and not go out of our way to future-proof to something hypothetical.
Also to clarify this is the first location I found where there's a "real issue" related to this support (possible stack overflow in the host when validating an arbitrary module). There are other locations such as enum ExternalKind
which grew Module
and Instance
variants. I don't think we should change ExternalKind
at this time from what it is today since core wasm modules can't actually create instances or modules. If this comes at a cost of less-than-perfect symmetric in the component-model AST I personally think that's ok since that's just something the component model has to handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also another clarification is that this is a preexisting issue for components as well. I don't really know when we would have found it but I would hope that we would beef up our fuzzers eventually to have found this issue before enabling everything by default. So in some sense there's an issue here where we need to track the unbounded recursion problem but at the same time for this PR itself I think it's only necessary to get the API design down which in my opinion should be to not modify the core wasm AST.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed now for module parsing (can't parse module types). Not yet fixed for components (existing issue).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wasm-smith
changes LGTM
In the future, for a very large change such as this one where tests will not be passing until pretty much the very last commit due to how all of the tools are sort of dependent on each other for testing, would you be opposed to conducting reviews on a fork or (preferably) into a staging branch in this repo? I'd certainly prefer doing it that way so that I can make many small commits for updating a single tool, get them reviewed, move on to the next one, and then once the tests are passing again, put up one big merge-from-staging PR that can easily be signed-off on because of the prior reviews? |
This commit updates any remaining place that used the old "defaults to" terminology for variant cases in the component model.
I think that |
As @alexcrichton what are your thoughts? I too would like to prevent |
I don't really have a better idea than that myself. I don't disagree that we're drowning in enums though... |
This commit updates the parsing and resolution of variant cases for the `refines` clause. Variant cases may now have an optional identifier on them, which is referred to in a `refines` clause, rather than having to use the variant name string. This makes it clearer to users of the text format that a resolution is taking place.
815aa32
to
a3e18e7
Compare
This commit moves the component core type reading from `TypeSectionReader` into a new reader called `CoreTypeSectionReader`. By doing so, we prevent the parsing of modules from supporting core types currently only supported in components.
@alexcrichton said
FWIW, I agree that this is big enough that we should just merge this and then evolve in-tree since all other component-model stuff needs to be rebased on top of this. Did I miss something about why we aren't doing that (yet?) here? |
Similar to the previous commit, this moves component core type encoding to a `CoreTypeSection` rather than extending `TypeSection` to encode for both modules and components. This helps make the distinction between a "type section" (module) and a "core type section" (component) a little more clear until such type that modules and components can share a core type section format.
This is my concern for landing exactly as-is-this-red-hot-second. That would open up |
I have three more commits I'm about to push to resolve that (make |
This commit updates the parsing of variant case refines clauses to allow them to explicitly reference another case by index rather than identifier alone.
Ok, I've pushed up what I hope is the last of the commits to review before we can land this. @fitzgen I tried to keep the churn in |
Nah I still haven't finished rebasing, do what you need to ;) |
The syntax for instantiation changed slightly in bytecodealliance#621 where a bare index is now required, but this syntax prevents the inline injection and usage of an alias of an export of another component/instance. This commit soups up the support here by bringing back the `IndexOrRef` structure which can be used to parse one of the two, allowing both a bare reference plus a parenthesized reference that enables alias injection.
The syntax for instantiation changed slightly in #621 where a bare index is now required, but this syntax prevents the inline injection and usage of an alias of an export of another component/instance. This commit soups up the support here by bringing back the `IndexOrRef` structure which can be used to parse one of the two, allowing both a bare reference plus a parenthesized reference that enables alias injection.
This commit updates the wasm-tools family of crates, notably pulling in the refactorings and updates from bytecodealliance/wasm-tools#621 for the latest iteration of the component model. This commit additionally updates all support for the component model for these changes, notably: * Many bits and pieces of type information was refactored. Many `FooTypeIndex` namings are now `TypeFooIndex`. Additionally there is now `TypeIndex` as well as `ComponentTypeIndex` for the two type index spaces in a component. * A number of new sections are now processed to handle the core and component variants. * Internal maps were split such as the `funcs` map into `component_funcs` and `funcs` (same for `instances`). * Canonical options are now processed individually instead of one bulk `into` definition. Overall this was not a major update to the internals of handling the component model in Wasmtime. Instead this was mostly a surface-level refactoring to make sure that everything lines up with the new binary format for components. * All text syntax used in tests was updated to the new syntax.
This commit updates the wasm-tools family of crates, notably pulling in the refactorings and updates from bytecodealliance/wasm-tools#621 for the latest iteration of the component model. This commit additionally updates all support for the component model for these changes, notably: * Many bits and pieces of type information was refactored. Many `FooTypeIndex` namings are now `TypeFooIndex`. Additionally there is now `TypeIndex` as well as `ComponentTypeIndex` for the two type index spaces in a component. * A number of new sections are now processed to handle the core and component variants. * Internal maps were split such as the `funcs` map into `component_funcs` and `funcs` (same for `instances`). * Canonical options are now processed individually instead of one bulk `into` definition. Overall this was not a major update to the internals of handling the component model in Wasmtime. Instead this was mostly a surface-level refactoring to make sure that everything lines up with the new binary format for components. * All text syntax used in tests was updated to the new syntax.
This commit updates the wasm-tools family of crates, notably pulling in the refactorings and updates from bytecodealliance/wasm-tools#621 for the latest iteration of the component model. This commit additionally updates all support for the component model for these changes, notably: * Many bits and pieces of type information was refactored. Many `FooTypeIndex` namings are now `TypeFooIndex`. Additionally there is now `TypeIndex` as well as `ComponentTypeIndex` for the two type index spaces in a component. * A number of new sections are now processed to handle the core and component variants. * Internal maps were split such as the `funcs` map into `component_funcs` and `funcs` (same for `instances`). * Canonical options are now processed individually instead of one bulk `into` definition. Overall this was not a major update to the internals of handling the component model in Wasmtime. Instead this was mostly a surface-level refactoring to make sure that everything lines up with the new binary format for components. * All text syntax used in tests was updated to the new syntax.
…liance#621) * wasm-encoder: update to the latest component model changes. This commit updates and refactors `wasm-encoder` to include the latest component model changes. Most notably, the index spaces for core and component items in the component model have been separated. As a result, some of the component section encoders have been split in two, with a "core" version going into the `core` module; for things like aliases and instance sections, the core versions may one day be used in module encoding if a standalone module linking proposal is put forth. * wasm-mutate: update for wasm-encoder changes. This commit updates `wasm-mutate` so that it builds with the `wasm-encoder` API changes. * wasm-smith: update for wasm-encoder changes. This commit updates `wasm-smith` for the `wasm-encoder` API changes. It also updates `wasm-smith` for the component model spec changes, namely encoding core and component types in discrete sections. * wasmparser: update for the component model spec changes. This commit refactors `wasmparser` to conform to the latest component model spec. The biggest change is that core items and component items have distinct index spaces and sections. This had a large impact on the validator. Various type names have been changed to better align with the terminology used in the component model spec ("interface" is now gone in favor of component "value types"). * wasmparser-dump: update for component model changes. This commit updates `wasmparser-dump` to the latest `wasmparser` changes. * wasm-mutate: update to latest wasmparser changes. This commit updates `wasm-mutate` to the latest changes in `wasmparser`. * wasm-smith: update for wasmparser changes. This commit updates `wasm-smith` for the latest `wasmparser` changes. * wasmprinter: update for latest component model spec changes. This commit updates `wasmprinter` for both the latest `wasmparser` changes and to implement the separate index spaces for core and component items. * fuzz-targets: update fuzz targets for wasmparser changes. This commit updates the incremental-parse fuzz target to conform to the latest `wasmparser` changes. * objdump: fix the objdump command for latest `wasmparser` changes. This commit updates the `wasm-tools objdump` command for the latest `wasmparser` changes. * Add core types to component and instance types. This commit updates `wasm-encoder`, `wasmparser`, `wasmprinter`, and `wasm-smith` to support core types in component and instance type declarations. This was missing from the recent spec updates and has now been fixed in the spec. * wasm-encoder: fix incorrect section id for component alias section. This fixes a typo that resulted from refactoring the alias section into a core and component alias section. * wasm-encoder: fix missing increments when adding alias items. This commit fixes the missing increments of `num_added` for both core and component aliases. * wasmparser: fix incorrect canonical option checking. This commit fixes component validation to ensure the types referenced from the `Realloc` and `PostReturn` options are core function types. * wasmprinter: fixes to output to conform to text format. This commit fixes wasmprinter in places where it wasn't conforming properly to the text format. Additionally, it fixes a bug where instance/component types weren't registering the exports of the type and subsequent lookups of the exports would fail. * wasmparser: fix validation for post-return option. This commit fixes the validation of the post-return option so that it be present for lifting operations and that the parameters to the refer function match the results of the function being lifted. * wasmprinter: fix output for component start functions. This commit fixes the printer output for start functions to match what's expected in the text format. * wasmparser-dump: fix output to make core vs. component items clearer. * wast: update for the component model proposal changes. This commit basically rewrites the component model support in wast for the latest text format and binary format changes. `wast` now depends on `wasm-encoder` for encoding components so that there is a single implementation of the component binary format. In the future, we can also replace the core encoding implementation with using `wasm-encoder` instead (once it supports the GC proposal encoding). * Update wat tests. This commit updates the component model wat tests to the current text format. It also removes FIXMEs where appropriate and attempts to convert the remaining "fixme" holdover test files from the module linking proposal. * Fix doc comment relating to `post-return` option. Update doc comments relating to the `post-return` option to match the spec. * wasm-encoder: move component-related core sections. This commit moves the component-related "core" sections out of the core module and into the component module in `wasm-encoder`. Originally these were put in the core module in anticipation of one day having the sections be part of core, but that's a long way off and may not occur. Until that time, it makes more sense to keep these sections grouped with the other component sections. This also replaces `ComponentAliasKind` with `ComponentOuterAliasKind`, which should only allow types, modules, and components to be aliased according to the current spec. Updated `wast` based on these API changes (the other crates will have similar changes soon). * wasmparser: move component-related core sections. This commit moves the component-related core section parsing out of the `core` module and into the `component` module in `wasmparser`. Originally these were put in the core module in anticipation of one day having the sections be part of core, but that's a long way off and may not occur. Until that time, it makes sense to keep these sections grouped with the other component sections. This commit also removes the extensions to the `ExternalKind` enumeration in core. It updates alias parsing to limit outer alias kind to only the supported kinds (core module, types, and components). This helps reduce confusion between what's implemented for export aliasing and what's supported for outer aliasing. By doing so, this also enables the ability to reference outer core types in nested components. * wast: fix parsing and encoding of aliases. This commit fixes the parsing and encoding of aliases such that the grammar is restricted to what's actually supported for both instance export aliases and for outer aliases. It also addresses the inability to reference and encode outer core type aliases. `CoreItemRef` was fixed to only have a single export name supported as core instances cannot export other core instances, so there can be at most one string parsed for the export name. * wasm-smith: encode arbitrary outer aliases for core types. This commit encodes arbitrary outer aliases to core types when creating arbitrary component and instance types in the component type section. It also fixes a bug where `scalar_component_funcs` were storing indexes into `funcs` and not `component_funcs`, causing lowered functions to encode with potentially out-of-bounds component function indexes. * wasm-encoder: remove unnecessary `Export` type. This commit removes the core `Export` type, which is simply the same enumeration as `ExportKind`, but paired with the exported item's index. Instead, encoding takes both the kind and index rather than an `Export`. With this change, encoding core exports functions like component exports in `wasm-encoder`. Also fixed where `Export` was used in the other tools. * wast: add missing version for wasm-encoder reference. * wasmparser: remove `Module` from `TypeRef`. This commit removes the `Module` variant from `TypeRef` as it is no longer needed to support the component model proposal. * Replace any references to "defaults_to" with "refines". This commit updates any remaining place that used the old "defaults to" terminology for variant cases in the component model. * wast: implement identifier resolution for variant cases. This commit updates the parsing and resolution of variant cases for the `refines` clause. Variant cases may now have an optional identifier on them, which is referred to in a `refines` clause, rather than having to use the variant name string. This makes it clearer to users of the text format that a resolution is taking place. * wasmparser: move component core type reading to a separate reader. This commit moves the component core type reading from `TypeSectionReader` into a new reader called `CoreTypeSectionReader`. By doing so, we prevent the parsing of modules from supporting core types currently only supported in components. * wasm-encoder: move component core type encoding to a separate section. Similar to the previous commit, this moves component core type encoding to a `CoreTypeSection` rather than extending `TypeSection` to encode for both modules and components. This helps make the distinction between a "type section" (module) and a "core type section" (component) a little more clear until such type that modules and components can share a core type section format. * wast: allow variant case refines clause to reference by index. This commit updates the parsing of variant case refines clauses to allow them to explicitly reference another case by index rather than identifier alone.
…liance#627) The syntax for instantiation changed slightly in bytecodealliance#621 where a bare index is now required, but this syntax prevents the inline injection and usage of an alias of an export of another component/instance. This commit soups up the support here by bringing back the `IndexOrRef` structure which can be used to parse one of the two, allowing both a bare reference plus a parenthesized reference that enables alias injection.
This PR updates all of tools in
wasm-tools
that currently support the component model for the latest proposal changes.The component model proposal had a hefty change to split out core and component index spaces and, in the process, had a number of significant changes to both the text and binary formats.
An overview of the changes in this PR:
wasm-encoder
- refactored encoding of core vs. component sections that enable writing some core sections into a component, aligning the API with the latest binary format.wasmparser
- refactored reading of core vs. component sections and updated validation to account for the additional index spaces.wasm-mutate
- updated for thewasm-encoder
andwasm-parser
changes.wasm-smith
- update what's currently supported for arbitrary component encoding to the latestwasm-encoder
changes; added support for arbitrary core type sections in components.wasmparser-dump
- updated for wasmparser API changes.wasmprinter
- updated for wasmparser API changes, support for additional index spaces, and also to output the latest component model text format.wast
- overhauled the existing component model parsing support to conform to the latest component model text format and replaced component encoding to usewasm-encoder
(we can fully switch over towasm-encoder
for core encoding once it supports encoding the GC types).This PR also updates all of the component-model-related text format test files where appropriate and fixed a significant number of
FIXMEs
along the way.I realize this is a huge PR, so almost all of the commits target an individual tool at a time, except for Add core types to component and instance types, which updates the previous implementations to fix a correction to the component model spec that we discovered later and some additional (bite sized) follow-up fix commits that fixed bugs that were discovered once round-trip testing was possible again.