
Review #30

Open · 6 tasks done

syg opened this issue Feb 8, 2025 · 12 comments


syg commented Feb 8, 2025

Informative

  • I plan to bring this proposal up at the Wasm CG meeting next week to get Wasm's take. I think this is orthogonal and does not step on Wasm's toes. There is an interesting future design question for reflecting WasmGC Arrays of scalars as JS TypedArrays. I imagine those reflections will not have detachable buffers, but this still warrants some discussion with the CG.

Normative

  • DetachArrayBuffer needs to throw for immutable buffers.
    Validate mutability in DetachArrayBuffer #38
  • Please have a reviewed PR against HTML for structured clone. I don't think there's much design space here, so I'm personally happy for this to be a stage 3 blocker instead of a stage 2.7 blocker. But should WHATWG give feedback that requires this proposal to change, we'd need to make normative changes during 2.7.
    Add immutable array buffer awareness to structuredClone whatwg/html#11033
  • I'd love to implement this with mprotect. However, that requires page alignment. We can't reveal the page size to user code, as that's a fingerprinting vector. Fixing a page size seems too unergonomic. Ideas? (Using mprotect is a nice-to-have for V8, not a requirement. A sketch of the alignment constraint follows below.)
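For concreteness, a minimal standalone sketch of the constraint (hypothetical, not V8 code): mprotect rejects addresses that are not page-aligned, which is why an engine can't simply protect an arbitrary heap allocation in place.

```c
/* Hypothetical sketch, not V8 code: mprotect requires page-aligned
 * addresses (POSIX specifies EINVAL otherwise), which is why an engine
 * cannot protect an arbitrary heap allocation in place. */
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return 1;

    /* mmap results are page-aligned, so this succeeds (prints 0). */
    printf("aligned:   %d\n", mprotect(p, page, PROT_READ));

    /* One byte past the page boundary: fails with EINVAL (prints -1). */
    printf("unaligned: %d\n", mprotect((char *)p + 1, page - 1, PROT_READ));
    return 0;
}
```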

Editorial


bakkot commented Feb 10, 2025

For structured clone, see some discussion in #19.


erights commented Feb 11, 2025

@syg For our own convenience in making progress on your suggestions, I've turned your bullets into checkboxes. I also turned the first paragraph into a checkbox, but it sounds like something you're queueing up to do yourself. Yes?


syg commented Feb 11, 2025

but it sounds like something you're queueing up to do yourself. Yes?

Yep, planning to do it myself.


erights commented Feb 11, 2025

On the mprotect point, I agree that there's probably no reasonable way to expose its full power to JS. However, this proposal is consistent with an implementation strategy that makes use of the MMU internally, and so exposes a bit of its power. Whether that is better than nothing is not for me to judge.

My assumption: there will be many small immutable buffers, and many huge ones. Any use of the mmu is probably only worth it for the huge ones, but those are more likely to be the ones where micro performance matters.

  • An implementation could choose to allocate a huge immutable buffer on a set of pages shared with nothing else. (It would probably also page-align this allocation, but that's beside my current point.) Obviously, it would write-protect these pages. This has two potential benefits (see the sketch after this list):

    • It might avoid the need to test an additional bit during attempted writes, instead catching the page fault and turning it into the right thrown error.
    • Even if treatment of invalid writes happens without the optimization above, this write-protection is belt-and-suspenders. This is especially important when the same immutable array buffer is shared between different agents and different languages. Otherwise, a weakness in any one of these language implementations in any of these agents threatens all.
  • Even when talking between different processes, each with their own address space, for a huge enough buffer it will be cheaper, O(log N), to map it into both address spaces than to copy it, O(N). IOW, zero-copy even between address spaces.
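For illustration, a minimal sketch of the first bullet's strategy, assuming POSIX mmap/mprotect (mmap returns page-aligned memory, so a dedicated allocation gets alignment for free). This is hypothetical, not any engine's actual code:

```c
/* Hypothetical sketch of the first bullet, assuming POSIX mmap/mprotect;
 * not any engine's actual code. The buffer gets its own anonymous pages
 * (mmap returns page-aligned memory), is filled once, and is then frozen
 * so the MMU itself enforces immutability. */
#define _DEFAULT_SOURCE
#include <string.h>
#include <sys/mman.h>

void *make_immutable_copy(const void *src, size_t len) {
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return NULL;

    memcpy(buf, src, len);               /* populate while still writable */

    if (mprotect(buf, len, PROT_READ)) { /* freeze: reads OK, writes fault */
        munmap(buf, len);
        return NULL;
    }
    return buf;                          /* any write now raises SIGSEGV */
}
```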

Although I don't believe any of these optimization opportunities should affect whether this proposal advances, I'd love to hear from implementors whether any of these are practical. Thanks.

gibson042 commented

@syg All concerns have been addressed (see new links above).


erights commented Feb 17, 2025

Hi @ricea

At w3c/webtransport#131 (comment) @domenic suggests you may be interested in this topic: #30 (comment) above. I look forward to your comments. Thanks!

erights mentioned this issue Feb 17, 2025

syg commented Feb 18, 2025

There were no concerns brought up at the Wasm CG meeting last week. We still believe this proposal to be orthogonal to what Wasm wants to do with memory protection.


syg commented Feb 18, 2025

On the mprotect point, I agree that there's probably no reasonable way to expose its full power to JS. However, this proposal is consistent with an implementation strategy that makes use of the MMU internally, and so exposes a bit of its power. Whether that is better than nothing is not for me to judge.

My assumption: there will be many small immutable buffers, and many huge ones. Any use of the mmu is probably only worth it for the huge ones, but those are more likely to be the ones where micro performance matters.

Ballpark, what byte lengths do you consider small and what do you consider huge?

I do agree there: if "small" means smaller than a typical 4K page, the only way to get OS-level protection is to reserve a whole page, which probably isn't worth it.

An implementation could choose to allocate a huge immutable buffer on a set of pages shared with nothing else. (It would probably also page-align this allocation, but that's beside my current point.) Obviously, it would write-protect these pages. This has two potential benefits:

It might avoid the need to test an additional bit during attempted writes, instead catching the page fault and turning it into the right thrown error.

This is possible and is already an implementation technique for Wasm. But it's definitely a scary corner -- anything with fault handlers is. Wasm also has less... stuff... going on than user JS code, so I'd want some careful scrutiny before recommending this as an implementation strategy.

In other words, even if immutable buffers were mprotected, I'd probably still try to have a manual check on writes and not rely on just a fault handler to throw JS errors.
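For concreteness, a hypothetical sketch of such a manual check (the field and function names are invented, not V8's actual backing-store layout): the write path tests an immutability bit and bails out before touching memory, so a fault handler would only ever be a backstop.

```c
/* Hypothetical sketch of the "manual check" belt worn alongside the
 * mprotect suspenders; field and function names are made up. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *data;
    size_t   byte_length;
    bool     immutable;   /* set when the buffer is made immutable */
} BackingStore;

/* Returns false (i.e. "throw a TypeError") instead of relying on a
 * SIGSEGV handler to produce the JS error. */
bool typed_array_store(BackingStore *bs, size_t index, uint8_t value) {
    if (bs->immutable || index >= bs->byte_length)
        return false;     /* caller raises the JS exception */
    bs->data[index] = value;
    return true;
}
```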

Even if treatment of invalid writes happens without the optimization above, this write-protection is belt-and-suspenders. This is especially important when the same immutable array buffer is shared between different agents and different languages. Otherwise, a weakness in any one of these language implementations in any of these agents threatens all.

Even when talking between different processes, each with their own address space, for a huge enough buffer it will be cheaper, O(log N), to map it into both address spaces than to copy it, O(N). IOW, zero-copy even between address spaces.

Yeah, implementing this with OS-level page protection won't be a requirement for actually sharing the memory among threads. If a non-page-protection implementation (e.g. a manual check on writes) is buggy, we believe that to be only a correctness issue from the engine's POV. Whatever other protections we've put into place to mitigate threats for mutable ArrayBuffers also apply here.

Although I don't believe any of these optimization opportunities should affect whether this proposal advances, I'd love to hear from implementors whether any of these are practical. Thanks.

Thanks for the discussion. Agreed that the implementation strategies here don't block advancement to 2.7. My current hunch is that it's probably worth investing in an mprotect path for huge buffers that either happen to already be page-aligned or would incur a relatively small cost to round the reservation size up to the next page multiple, as a transparent optimization.
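A sketch of that transparent rounding, assuming the usual power-of-two page size; the page size is consulted only inside the engine, so nothing observable leaks to user code:

```c
/* Hypothetical sketch of the transparent rounding: the page size stays
 * engine-internal. Assumes the page size is a power of two, as it is in
 * practice. */
#include <stddef.h>
#include <unistd.h>

size_t reservation_size(size_t byte_length) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (byte_length + page - 1) & ~(page - 1);  /* next page multiple */
}
```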


erights commented Feb 18, 2025

Ballpark, what byte lengths do you consider small and what do you consider huge?

Ballpark: small is < 1 MB; huge is >= 1 GB.

But of course this is for implementors to determine based on measurement, as these distinctions have no observable effect.

In other words, even if immutable buffers were mprotected, I'd probably still try to have a manual check on writes and not rely on just a fault handler to throw JS errors.

Good. Belt and suspenders.

Yeah, implementing this with OS-level page protection won't be a requirement for actually sharing the memory among threads. If a non-page-protection implementation (e.g. a manual check on writes) is buggy, we believe that to be only a correctness issue from the engine's POV. Whatever other protections we've put into place to mitigate threats for mutable ArrayBuffers also apply here.

Within one process/address space, that makes sense even in a multi-language environment. A memory-safety weakness in any one of them threatens all, independent of ArrayBuffers.

But what about between processes that are otherwise protected from each other, except for the Shared ArrayBuffers and the Immutable ArrayBuffers, if these are shared by mapping them into the participating processes?

Although I don't believe any of these optimization opportunities should affect whether this proposal advances, I'd love to hear from implementors whether any of these are practical. Thanks.

Us too! Thanks.

Thanks for the discussion. Agreed that the implementation strategies here don't block advancement to 2.7. My current hunch is that it's probably worth investing in an mprotect path for huge buffers that either happen to already be page-aligned or would incur a relatively small cost to round the reservation size up to the next page multiple, as a transparent optimization.

Good. Given that, is it appropriate to close this issue? (What really matters is the checkboxes on the Path to Stage 4 page, which I see are already checked, so no problem either way.)

Thanks!


ricea commented Feb 19, 2025

Even when talking between different processes, each with their own address space, for a huge enough buffer it will be cheaper, O(log N), to map it into both address spaces than to copy it, O(N). IOW, zero-copy even between address spaces.

The trouble is that when the buffer is created we don't know whether it will need to be shared between address spaces or not, and using shared memory costs more.

I did some benchmarking on Linux, and creating a shared memory mapping takes roughly ten times as long as an anonymous memory mapping. What's worse, the soft page faults on the shared memory mapping also take 10 times as long. So there's an O(N) cost for first use of the mapping. Example: 120ms to create and memset() a 1GB anonymous mapping; 1.25s to do the same thing with a shared mapping.

I think the cost is prohibitive for engines to create huge mappings in shared memory just in case we will need to share them with a different address space later.
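A rough sketch of how such a comparison can be run on Linux (a hypothetical reconstruction, not the exact benchmark; memfd_create is one of several shared-backing options, shm_open being another):

```c
/* Hypothetical reconstruction of the comparison described above, not the
 * original benchmark code. Linux-only. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define LEN (1UL << 30)                   /* 1 GiB */

static double now(void) {
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return t.tv_sec + t.tv_nsec / 1e9;
}

int main(void) {
    /* Anonymous private mapping: cheap to create and to fault in. */
    double t0 = now();
    void *anon = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    memset(anon, 0xab, LEN);              /* memset forces the soft faults */
    printf("anonymous: %.3fs\n", now() - t0);

    /* Shared mapping backed by a memfd: shareable across processes, but
     * creation and first-touch faults are reported above to be ~10x
     * slower. */
    t0 = now();
    int fd = memfd_create("buf", 0);
    ftruncate(fd, LEN);
    void *shared = mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    memset(shared, 0xab, LEN);
    printf("shared:    %.3fs\n", now() - t0);
    return 0;
}
```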


erights commented Feb 24, 2025

Hi @ricea , thanks for the analysis!

What if the implementation did not do this at first creation, but instead the first time the buffer is passed between address spaces? Once it has been passed between address spaces once, that might be a better predictor that it will be again.
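To illustrate the idea, a hypothetical sketch (Linux memfd sealing is just one possible mechanism; not a proposal for any specific engine): the one-time O(N) copy happens only when a buffer first crosses an address-space boundary, and the resulting sealed fd can be reused for every later share.

```c
/* Hypothetical sketch of promote-on-first-share. New immutable buffers
 * live in cheap anonymous memory; the O(N) copy into shareable memory
 * happens only the first time a buffer crosses an address-space
 * boundary. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns a sealed fd mirroring the buffer, or -1 on failure. The fd can
 * be sent to other processes (e.g. over a Unix-domain socket) and mapped
 * read-only on each side. */
int promote_to_shared(const void *anon_buf, size_t len) {
    int fd = memfd_create("immutable-buf", MFD_ALLOW_SEALING);
    if (fd < 0) return -1;
    if (ftruncate(fd, (off_t)len) < 0) { close(fd); return -1; }

    void *dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (dst == MAP_FAILED) { close(fd); return -1; }
    memcpy(dst, anon_buf, len);           /* the one-time O(N) copy */
    munmap(dst, len);                     /* must unmap before F_SEAL_WRITE */

    /* Seal so no process can ever make the contents writable again. */
    fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE);
    return fd;
}
```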


ricea commented Feb 24, 2025

What if the implementation did not do this at first creation, but instead the first time the buffer is passed between address spaces? Once it has been passed between address spaces once, that might be a better predictor that it will be again.

Yes, that is something that implementations could explore.
