[SH] add userfault support #5261

Open
wants to merge 19 commits into
base: feature/secret-hiding
from

Conversation

kalyazin
Contributor

@kalyazin kalyazin commented Jun 13, 2025

Changes

Implement userfault support in Secret Freedom. The goal of this change is to be able to resume Secret-Free VMs via UFFD.

Major changes:

  • Firecracker sends the guest_memfd and the userfault bitmap memfd to the UFFD handler. The UFFD handler writes to the guest_memfd to populate guest pages and clears bits in the userfault bitmap (memfd) to stop KVM from sending vCPU fault notifications.
  • vCPU faults on guest_memfd cause VM exits. Once a vCPU exits to userspace on a fault, it sends a fault request to the VMM thread via a pipe, and the VMM thread forwards it to the UFFD handler.
  • Firecracker- and KVM-triggered faults are delivered to the UFFD handler via minor UFFD notifications, and the UFFD handler unblocks the faulting process via UFFDIO_CONTINUE.

Reason

This is needed to be able to restore snapshots where the VM was backed by guest_memfd.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • [ ] I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • [ ] I have mentioned all user-facing changes in CHANGELOG.md.
  • [ ] If a specific issue led to this PR, this PR closes the issue.
  • [ ] When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • [ ] I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • [ ] I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

codecov bot commented Jun 13, 2025

Codecov Report

Attention: Patch coverage is 28.48485% with 354 lines in your changes missing coverage. Please review.

Project coverage is 81.66%. Comparing base (00ac2f3) to head (e947a92).

Files with missing lines        Patch %   Lines
src/vmm/src/lib.rs              12.83%    129 Missing ⚠️
src/vmm/src/vstate/vm.rs        41.30%    81 Missing ⚠️
src/vmm/src/persist.rs          21.79%    61 Missing ⚠️
src/vmm/src/builder.rs          47.05%    36 Missing ⚠️
src/vmm/src/vstate/vcpu.rs      35.55%    29 Missing ⚠️
src/vmm/src/vstate/memory.rs    0.00%     18 Missing ⚠️
Additional details and impacted files
@@                    Coverage Diff                    @@
##           feature/secret-hiding    #5261      +/-   ##
=========================================================
- Coverage                  82.52%   81.66%   -0.86%     
=========================================================
  Files                        250      250              
  Lines                      27386    27795     +409     
=========================================================
+ Hits                       22599    22698      +99     
- Misses                      4787     5097     +310     
Flag Coverage Δ
5.10-c5n.metal 81.85% <23.63%> (-1.06%) ⬇️
5.10-m5n.metal 81.85% <23.63%> (-1.05%) ⬇️
5.10-m6a.metal 81.02% <23.63%> (-1.08%) ⬇️
5.10-m6g.metal 77.62% <20.12%> (-1.08%) ⬇️
5.10-m6i.metal 81.85% <23.63%> (-1.05%) ⬇️
5.10-m7a.metal-48xl 81.00% <23.63%> (-1.09%) ⬇️
5.10-m7g.metal 77.62% <20.12%> (-1.08%) ⬇️
5.10-m7i.metal-24xl 81.81% <23.63%> (-1.05%) ⬇️
5.10-m7i.metal-48xl 81.81% <23.63%> (-1.05%) ⬇️
5.10-m8g.metal-24xl 77.61% <20.12%> (-1.08%) ⬇️
5.10-m8g.metal-48xl 77.61% <20.12%> (-1.08%) ⬇️
6.1-c5n.metal 81.89% <23.63%> (-1.06%) ⬇️
6.1-m5n.metal 81.90% <23.63%> (-1.06%) ⬇️
6.1-m6a.metal 81.06% <23.63%> (-1.09%) ⬇️
6.1-m6g.metal 77.62% <20.12%> (-1.08%) ⬇️
6.1-m6i.metal 81.89% <23.63%> (-1.06%) ⬇️
6.1-m7a.metal-48xl 81.05% <23.63%> (-1.08%) ⬇️
6.1-m7g.metal 77.62% <20.12%> (-1.08%) ⬇️
6.1-m7i.metal-24xl 81.91% <23.63%> (-1.06%) ⬇️
6.1-m7i.metal-48xl 81.90% <23.63%> (-1.06%) ⬇️
6.1-m8g.metal-24xl 77.61% <20.12%> (-1.08%) ⬇️
6.1-m8g.metal-48xl 77.61% <20.12%> (-1.08%) ⬇️
6.14-c5n.metal 81.95% <28.48%> (-0.96%) ⬇️
6.14-m5n.metal 81.95% <28.48%> (-0.98%) ⬇️
6.14-m6a.metal 81.12% <28.48%> (-1.00%) ⬇️
6.14-m6g.metal 77.66% <25.00%> (-1.00%) ⬇️
6.14-m6i.metal 81.95% <28.48%> (-0.97%) ⬇️
6.14-m7a.metal-48xl 81.10% <28.48%> (-1.00%) ⬇️
6.14-m7g.metal 77.66% <25.00%> (-1.00%) ⬇️
6.14-m7i.metal-24xl 81.96% <28.48%> (-0.97%) ⬇️
6.14-m7i.metal-48xl 81.96% <28.48%> (-0.97%) ⬇️
6.14-m8g.metal-24xl 77.66% <25.00%> (-1.00%) ⬇️
6.14-m8g.metal-48xl 77.66% <25.00%> (-1.00%) ⬇️

@kalyazin kalyazin force-pushed the sh_uf branch 2 times, most recently from 286efbe to 4e10e54 Compare June 13, 2025 19:55
@kalyazin kalyazin marked this pull request as ready for review June 16, 2025 07:26
@kalyazin kalyazin changed the title [WIP][SH] add userfault support to UFFD handlers [SH] add userfault support to UFFD handlers Jun 16, 2025
@kalyazin kalyazin force-pushed the sh_uf branch 3 times, most recently from b6185cb to 60abeb9 Compare June 17, 2025 10:42
@kalyazin kalyazin force-pushed the sh_uf branch 4 times, most recently from d5e7aa8 to 40101cd Compare June 19, 2025 11:41
@kalyazin kalyazin mentioned this pull request Jun 19, 2025
10 tasks
@kalyazin kalyazin changed the title [SH] add userfault support to UFFD handlers [SH] add userfault support Jun 19, 2025
@kalyazin kalyazin self-assigned this Jun 19, 2025
JackThomson2
JackThomson2 previously approved these changes Jun 19, 2025
kalyazin added 12 commits June 20, 2025 17:19
This is needed because if guest_memfd is used to back guest memory, vCPU
fault notifications are delivered via the UFFD UDS socket.

Signed-off-by: Nikita Kalyazin <[email protected]>
Example UFFD handlers now read from the UDS socket in a buffered way.
This makes it possible for future commits to read messages of different
types, so that the handlers can process fault request messages from
Firecracker if Secret Freedom is enabled.

Signed-off-by: Nikita Kalyazin <[email protected]>
It is used by Secret-Free-enabled UFFD handlers to disable vCPU fault
notifications from the kernel.

Signed-off-by: Nikita Kalyazin <[email protected]>
Accept receiving 3 fds instead of 1, where fds[1] is guest_memfd and
fds[2] is userfault bitmap memfd.

Also handle the FaultRequest message over the UDS socket by calling a
new callback in the Runtime and sending a FaultReply.

TODO: add cab/sob from Patrick

Signed-off-by: Nikita Kalyazin <[email protected]>
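
The fd layout described in the commit message above can be sketched as follows. This is only an illustration: the `HandshakeFds` enum and the `classify_fds` function are hypothetical names and are not taken from the PR; only the ordering (fds[0] is the UFFD, fds[1] is guest_memfd, fds[2] is the userfault bitmap memfd) comes from the text.

```rust
use std::os::unix::io::RawFd;

/// File descriptors received from Firecracker in the handshake.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq, Eq)]
pub enum HandshakeFds {
    /// Regular VM: only the UFFD is sent.
    Regular { uffd: RawFd },
    /// Secret Free VM: UFFD, guest_memfd and userfault bitmap memfd.
    SecretFree {
        uffd: RawFd,
        guest_memfd: RawFd,
        userfault_bitmap_memfd: RawFd,
    },
}

/// Interprets the received fds by count: 1 fd means a regular VM,
/// 3 fds mean Secret Freedom is enabled, anything else is rejected.
pub fn classify_fds(fds: &[RawFd]) -> Result<HandshakeFds, String> {
    match fds {
        [uffd] => Ok(HandshakeFds::Regular { uffd: *uffd }),
        [uffd, guest_memfd, userfault_bitmap_memfd] => Ok(HandshakeFds::SecretFree {
            uffd: *uffd,
            guest_memfd: *guest_memfd,
            userfault_bitmap_memfd: *userfault_bitmap_memfd,
        }),
        other => Err(format!("expected 1 or 3 fds, got {}", other.len())),
    }
}
```

With this shape, a handler could branch once on the result and keep the Secret Free path separate from the regular one.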
There are two ways a UFFD handler receives a fault notification if
Secret Freedom is enabled (which is inferred from 3 fds sent by
Firecracker instead of 1):
 - a VMM- or KVM-triggered fault is delivered via a minor UFFD fault
   event.  The handler is supposed to respond to it via memcpying the
   content of the page (if the page hasn't already been populated)
   followed by a UFFDIO_CONTINUE call.
 - a vCPU-triggered fault is delivered via a FaultRequest message on
   the UDS socket.  The handler is supposed to reply with a pwrite64
   call on the guest_memfd to populate the page followed by a FaultReply
   message on the UDS socket.

In both cases, the handler also needs to clear the bit in the userfault
bitmap at the corresponding offset in order to stop further fault
notifications for the same page.

UFFD handlers use the userfault bitmap for two purposes:
 - communicate to the kernel whether a fault at the corresponding
   guest_memfd offset will cause a VM exit
 - keep track of pages that have already been populated in order to
   avoid overwriting the content of the page that is already
   initialised.

Signed-off-by: Nikita Kalyazin <[email protected]>
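
The double role of the userfault bitmap described above (notification control and "already populated" tracking) can be sketched roughly as below. This is a simplified model, not the handler code from the PR: `UserfaultBitmap` and `handle_fault` are hypothetical names, the bitmap is a plain `Vec<u8>` rather than a shared memfd mapping, and the bit order within a byte is an assumption. A real handler would also need to update the shared bitmap atomically.

```rust
/// Stand-in for the shared userfault bitmap: one bit per guest page.
/// (Hypothetical type; the real bitmap lives in a memfd shared with KVM.)
pub struct UserfaultBitmap {
    bits: Vec<u8>,
    page_size: u64,
}

impl UserfaultBitmap {
    pub fn new(bits: Vec<u8>, page_size: u64) -> Self {
        Self { bits, page_size }
    }

    /// Maps a guest_memfd offset to (byte index, bit mask).
    /// Assumption: bit N of the bitmap is bit (N % 8) of byte (N / 8).
    fn locate(&self, offset: u64) -> (usize, u8) {
        let page = offset / self.page_size;
        ((page / 8) as usize, 1u8 << (page % 8))
    }

    /// A set bit means the page still faults, i.e. it is not populated yet.
    pub fn is_set(&self, offset: u64) -> bool {
        let (byte, mask) = self.locate(offset);
        self.bits[byte] & mask != 0
    }

    /// Clears the bit to stop further fault notifications for the page.
    pub fn clear(&mut self, offset: u64) {
        let (byte, mask) = self.locate(offset);
        self.bits[byte] &= !mask;
    }
}

/// Handles one fault at `offset`. The page is populated only if its bit is
/// still set, so an already initialised page is never overwritten.
/// Returns true if `populate` was called.
pub fn handle_fault<F: FnOnce(u64)>(
    bitmap: &mut UserfaultBitmap,
    offset: u64,
    populate: F,
) -> bool {
    if !bitmap.is_set(offset) {
        return false;
    }
    // Populate first, then clear the bit, so the page has content
    // before faults on it stop being reported.
    populate(offset - offset % bitmap.page_size);
    bitmap.clear(offset);
    true
}
```

In this model, `populate` would stand for either the memcpy plus UFFDIO_CONTINUE path or the pwrite64 on guest_memfd, depending on how the fault was delivered.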
These are used for communication of page faults between Firecracker and
a UFFD handler.

Signed-off-by: Nikita Kalyazin <[email protected]>
If configured, userfault bitmap is registered with KVM and controls
whether KVM will exit to userspace on a fault of the corresponding page.

We are going to allocate the bitmap in a memfd in Firecracker, set bits
for all pages to request notifications for vCPU faults and send
it to the UFFD handler to delegate clearing the bits as pages get
populated.

Since the KVM userfault patches are still in review,
set_user_memory_region2 is not aware of the userfault flag and the
userfault bitmap address in its input structure.  Define it in
Firecracker code temporarily.

Signed-off-by: Nikita Kalyazin <[email protected]>
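
As a rough illustration of the allocation step described above, the bitmap size follows from the guest memory size: one bit per page, all bits initially set. The helper names below are hypothetical, and a plain `Vec<u8>` stands in for the memfd-backed buffer; the KVM-side registration is not shown because the userfault interface is still under review.

```rust
/// Number of bytes needed for a bitmap with one bit per guest page.
/// (Hypothetical helper, for illustration only.)
pub fn bitmap_len_bytes(mem_size: u64, page_size: u64) -> usize {
    // Round up to whole pages, then to whole bytes.
    let pages = (mem_size + page_size - 1) / page_size;
    ((pages + 7) / 8) as usize
}

/// Builds a bitmap with every bit set, i.e. vCPU fault notifications are
/// requested for all pages. Padding bits in the last byte are set as well.
pub fn new_userfault_bitmap(mem_size: u64, page_size: u64) -> Vec<u8> {
    vec![0xff; bitmap_len_bytes(mem_size, page_size)]
}
```

For example, a 128 MiB guest with 4 KiB pages has 32768 pages, so the bitmap takes 32768 / 8 = 4096 bytes.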
This is needed to instruct the kernel to exit to userspace when a vCPU
fault occurs and the corresponding bit in the userfault bitmap is set.

The userfault bitmap is allocated in a memfd by Firecracker and sent to
the UFFD handler.

This also sends 3 fds to the UFFD handler in the handshake:
 - UFFD (original)
 - guest_memfd: for the handler to be able to populate guest memory
 - userfault bitmap memfd: for the handler to be able to disable exits
   to userspace for the pages that have already been populated

Signed-off-by: Nikita Kalyazin <[email protected]>
These will be used to communicate vCPU faults between vCPUs and the VM
if Secret Freedom is enabled.

Signed-off-by: Nikita Kalyazin <[email protected]>
This is because vCPUs reason in GPAs while the secret-free UFFD
protocol is guest_memfd-offset-based.

TODO: add cab/sob from Patrick

Signed-off-by: Nikita Kalyazin <[email protected]>
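
The GPA to guest_memfd offset translation mentioned above could look roughly like this. The `MemRegion` struct and `gpa_to_offset` function are hypothetical and only illustrate the idea; the actual PR may represent memory regions differently.

```rust
/// One guest memory region backed by guest_memfd.
/// (Hypothetical layout, for illustration only.)
#[derive(Debug, Clone, Copy)]
pub struct MemRegion {
    /// Guest physical address where the region starts.
    pub gpa_start: u64,
    /// Region length in bytes.
    pub len: u64,
    /// Offset of the region inside guest_memfd.
    pub gmem_offset: u64,
}

/// Translates a guest physical address into a guest_memfd offset.
/// Returns None if the address is not covered by any region.
pub fn gpa_to_offset(regions: &[MemRegion], gpa: u64) -> Option<u64> {
    regions.iter().find_map(|r| {
        // Addresses below the region start make checked_sub return None.
        let delta = gpa.checked_sub(r.gpa_start)?;
        (delta < r.len).then(|| r.gmem_offset + delta)
    })
}
```

The reverse mapping is not needed for the fault path in this sketch, since the handler replies in the same offset space it was asked about.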
It contains two parts:
 - external: between the VMM thread and the UFFD handler
 - internal: between vCPUs and the VMM thread

An outline of the workflow:
 - When a vCPU fault occurs, vCPU exits to userspace
 - The vCPU thread sends a message to the VMM thread via the userfault
   channel
 - The VMM thread forwards the message to the UFFD handler via the UDS
   socket
 - The UFFD handler populates the page, clears the corresponding bit in
   the userfault bitmap and sends a reply to Firecracker
 - The VMM thread receives the reply and forwards it to the vCPU via the
   userfault channel
 - The vCPU resumes execution

Signed-off-by: Nikita Kalyazin <[email protected]>
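
The workflow outlined above can be modelled with threads and channels. This is a toy simulation under stated assumptions: `std::sync::mpsc` channels stand in for both the internal userfault channel and the UDS socket, the fields of `FaultRequest` and `FaultReply` are hypothetical, and the page population and bitmap update are elided.

```rust
use std::sync::mpsc;
use std::thread;

/// Message fields are assumptions; only the message names come from the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FaultRequest {
    pub vcpu: u32,
    pub offset: u64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FaultReply {
    pub vcpu: u32,
    pub offset: u64,
}

/// Simulates one vCPU fault travelling vCPU -> VMM -> handler and back.
pub fn simulate_fault(vcpu: u32, offset: u64) -> FaultReply {
    // Internal part: vCPU <-> VMM thread (the userfault channel).
    let (vcpu_tx, vmm_rx) = mpsc::channel::<FaultRequest>();
    let (vmm_tx, vcpu_rx) = mpsc::channel::<FaultReply>();
    // External part: VMM thread <-> UFFD handler (the UDS socket in the PR).
    let (uds_req_tx, uds_req_rx) = mpsc::channel::<FaultRequest>();
    let (uds_rep_tx, uds_rep_rx) = mpsc::channel::<FaultReply>();

    let handler = thread::spawn(move || {
        let req = uds_req_rx.recv().unwrap();
        // Here the handler would populate the page and clear the bitmap bit.
        let reply = FaultReply { vcpu: req.vcpu, offset: req.offset };
        uds_rep_tx.send(reply).unwrap();
    });

    let vmm = thread::spawn(move || {
        // Forward the request to the handler, then the reply to the vCPU.
        let req = vmm_rx.recv().unwrap();
        uds_req_tx.send(req).unwrap();
        let reply = uds_rep_rx.recv().unwrap();
        vmm_tx.send(reply).unwrap();
    });

    // The calling thread plays the vCPU: it exits on the fault, sends the
    // request and blocks until the completion notification arrives.
    vcpu_tx.send(FaultRequest { vcpu, offset }).unwrap();
    let reply = vcpu_rx.recv().unwrap();

    handler.join().unwrap();
    vmm.join().unwrap();
    reply
}
```

Calling `simulate_fault(1, 0x2000)` returns a reply carrying the same vCPU index and offset, after which the simulated vCPU would resume execution.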
This is required by Secret Freedom to implement the userfault protocol:
vCPUs read notification of fault handling completions from the userfault
channel.

Signed-off-by: Nikita Kalyazin <[email protected]>
kalyazin added 7 commits June 20, 2025 17:24
kvmclock is currently not supported by Secret Freedom and calling
kvmclock_ctrl will always fail.

Signed-off-by: Nikita Kalyazin <[email protected]>
In a regular VM, we mmap the memory snapshot file and supply the address
in the KVM memory slot.  In Secret Free VMs, we provide guest_memfd in
the memory slot instead.  There is no way we can restore a Secret Free
VM from a file, unless we prepopulate the guest_memfd with the file
content, which is inefficient and is not practically useful.

Signed-off-by: Nikita Kalyazin <[email protected]>
It is not supported by Secret Freedom.

Signed-off-by: Nikita Kalyazin <[email protected]>
This includes both functional and performance tests.

Signed-off-by: Nikita Kalyazin <[email protected]>
Do not add a balloon device to a Secret Free VM as it is not currently
supported.

Signed-off-by: Nikita Kalyazin <[email protected]>
When taking a snapshot from a Secret Free VM, we create a bounce buffer
to be able to pass it to the host kernel to store in a file.  Exclude it
from the memory monitor calculation.

Signed-off-by: Nikita Kalyazin <[email protected]>
This is because the error type has changed due to the implementation of
snapshot restore support for Secret Free VMs.

Signed-off-by: Nikita Kalyazin <[email protected]>
2 participants