Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The current state of the Wasm port #2819

Open
bakaq opened this issue Feb 8, 2025 · 3 comments
Open

The current state of the Wasm port #2819

bakaq opened this issue Feb 8, 2025 · 3 comments

Comments

@bakaq
Copy link
Contributor

bakaq commented Feb 8, 2025

Recently I have been trying to compile the current master of Scryer Prolog to Wasm (specifically the browser version, not WASI yet) to improve the situation there. I faced many hurdles which I will list here so that we are aware of what needs to be improved.

  • Compiling with optimizations (even opt-level=1) takes something like 30GB+ of RAM in my computer (it never completed, and I suspect I would need a lot of swap to make it work). I suspect that the only way to improve the situation here is to split great parts of Scryer Prolog into different crates, because the codegen unit of Rust is a crate and Scryer Prolog is just too big. Obvious candidates are stuff like the parser, the compiler, the rcu, the arena, etc... That would also have many other benefits.
  • Compiling without optimizations works normally. However, because Machine::dispatch is insanely big it actually produces an invalid Wasm artifact. Anything that tries to use it, including wasm-bindgen to create the Javascript bindings, errors because the generated function has too many locals. This is a known problem, and there are people trying to improve the situation here in both LLVM and the Wasm spec. LLVM (which is what Rust uses to compile) ideally should detect that it is trying to make an invalid artifact and break the function into smaller pieces if necessary, and the Wasm spec seems to be considering increasing the locals limit. The solution here is to either wait until the rest of the ecosystem improves, or to break Machine::dispatch into smaller functions somehow.
    • I have found a workaround though. One can use wasm-opt with -O1 flag to get some basic optimizations on the Wasm artifact, and that seems to reduce the local count enough for it to be valid. wasm-pack (which is what one would normally use to compile a Rust project for web) applies wasm-opt, but only after wasm-bindgen, so we can't use it because then wasm-bindgen gets the invalid artifact and errors. We need to manually compile, wasm-opt, and then wasm-bindgen.
  • After doing all that just to get a valid artifact, it can't actually run because it imports some functions from a module called env for some reason. This gets bindgen-ed to a import thing from "env" in Javascript and makes the bindings unusable because that is not a valid import in the browser. I discovered that this is the fault of the ring dependency, because even using its "Wasm in the browser" feature it accidentally links 2 C functions that don't exist in the Wasm build (this wasn't a problem in their 0.16, which is why this wasn't a problem in Scryer before we updated to 0.17, and also isn't caught by their CI for some reason). I was able to patch ring to fix this and finally get a working artifact, I will report this and the fix to them upstream. The fix here is to either put all ring dependent functionality behind a feature, downgrade ring, vendor a ring fork with the patch as a dependency, or wait for ring to release a new version with the fix (which I believe will take quite some time judging by their previous release schedule). EDIT: This is actually just my Nix setup being weird by default. Installing clang solves this.

TL;DR

Usually to build Wasm Scryer we would just need to run wasm-pack --target web --no-default-features, but in current master we actually need to compile in debug mode, wasm-opt -O1, and then wasm-bindgen, all manually.

@bakaq
Copy link
Contributor Author

bakaq commented Feb 9, 2025

Ok, update: I was able to create a Cargo profile that can go even to opt-level=3 and still fit in 14GB max RAM usage (which is doable in my current machine:

[profile.wasm-release]
inherits = "release"

# Implicit because of the above inherit, but can be overriden
# by the CARGO_PROFILE_WASM_RELEASE_OPT_LEVEL env var to
# opt-level = 1 to get a more debug-like compile time while
# still optimizing enough to avoid creating an invalid
# artifact with too much locals.
#opt-level = 3 

# Currently true in release, which is extremely expensive,
# and false in debug, which is also surprisingly expensive.
lto = "off"

# Creates smaller artifact at the cost of backtraces, which
# sounds like a good tradeoff for the web
panic = "abort"

# Splits the crate into small pieces (default on release is 16).
# With lto="off" this doesn't do link time optimization between
# the results, which keeps a lot of performance on the table,
# but seems to help a lot with max RAM usage.
codegen-units = 256

We can then use this profile in wasm-pack as follows:

# The --dev is necessary to allow setting a profile.
wasm-pack --target=web --dev --profile=wasm-release --no-default-features

With wasm-opt -Oz and gzip the total wire transfer size we get is 1.3MB, which is actually really impressive given how much functionality is packed in there.

This is enough for me to get working on improving the Wasm interface, but I will keep this issue open because the problems I pointed out are still valid.

@adri326
Copy link
Contributor

adri326 commented Feb 9, 2025

I agree with the sentiment of wanting to split the project up into smaller crates; right now doing any change triggers the main build script to run again (to gather the list of atoms used in atom!()), which then triggers the whole project to be recompiled. It's not as bad on my desktop, but on my laptop I have to wait a good 30 seconds for the build to succeed (hot release builds take a good 7 minutes).

Before we can do any of that, we would need to change the current system of gathering atom!() calls, so that it can work across codegen units. I would like to make it so that instead of scanning the entire code at compile time, we would scan the code during development, generate a builtin_atoms.txt file (sorted alphabetically so that it behaves well with git), and have the build script for generating the actual atom!() macro (placed in a separate crate) depend on builtin_atoms.txt alone. Merge conflicts can be automatically resolved by re-generating the file (since it's fully deterministic) and we can have a test in CI that verifies that this file is always up-to-date.

@jjtolton
Copy link

jjtolton commented Feb 9, 2025

Hell yeah, @bakaq!! This is going to be HUGE!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants