ShadowCurse
Contributor

Changes

Add virtio-pmem device support.

Closes #5448

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@ShadowCurse ShadowCurse self-assigned this Oct 3, 2025

codecov bot commented Oct 3, 2025

Codecov Report

❌ Patch coverage is 73.86364% with 138 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.63%. Comparing base (313468e) to head (5bac831).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/vmm/src/devices/virtio/pmem/device.rs | 68.24% | 67 Missing ⚠️ |
| src/vmm/src/devices/virtio/pmem/event_handler.rs | 23.07% | 40 Missing ⚠️ |
| src/vmm/src/builder.rs | 77.14% | 8 Missing ⚠️ |
| src/vmm/src/rpc_interface.rs | 0.00% | 8 Missing ⚠️ |
| src/firecracker/src/api_server/request/pmem.rs | 77.27% | 5 Missing ⚠️ |
| src/vmm/src/devices/virtio/pmem/metrics.rs | 85.29% | 5 Missing ⚠️ |
| src/vmm/src/vmm_config/pmem.rs | 93.02% | 3 Missing ⚠️ |
| src/vmm/src/device_manager/persist.rs | 96.55% | 1 Missing ⚠️ |
| src/vmm/src/devices/virtio/pmem/persist.rs | 95.23% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5463      +/-   ##
==========================================
- Coverage   82.79%   82.63%   -0.16%     
==========================================
  Files         263      269       +6     
  Lines       27223    27736     +513     
==========================================
+ Hits        22538    22919     +381     
- Misses       4685     4817     +132     
| Flag | Coverage Δ |
|---|---|
| 5.10-m5n.metal | 82.78% <73.86%> (-0.20%) ⬇️ |
| 5.10-m6a.metal | 82.03% <73.86%> (-0.19%) ⬇️ |
| 5.10-m6g.metal | 79.44% <73.86%> (-0.13%) ⬇️ |
| 5.10-m6i.metal | 82.78% <73.86%> (-0.20%) ⬇️ |
| 5.10-m7a.metal-48xl | 82.02% <73.86%> (-0.19%) ⬇️ |
| 5.10-m7g.metal | 79.44% <73.86%> (-0.13%) ⬇️ |
| 5.10-m7i.metal-24xl | 82.74% <73.86%> (-0.21%) ⬇️ |
| 5.10-m7i.metal-48xl | 82.74% <73.86%> (-0.20%) ⬇️ |
| 5.10-m8g.metal-24xl | 79.45% <73.86%> (-0.12%) ⬇️ |
| 5.10-m8g.metal-48xl | 79.44% <73.86%> (-0.13%) ⬇️ |
| 6.1-m5n.metal | 82.80% <73.86%> (-0.20%) ⬇️ |
| 6.1-m6a.metal | 82.06% <73.86%> (-0.19%) ⬇️ |
| 6.1-m6g.metal | 79.45% <73.86%> (-0.12%) ⬇️ |
| 6.1-m6i.metal | 82.80% <73.86%> (-0.20%) ⬇️ |
| 6.1-m7a.metal-48xl | 82.05% <73.86%> (-0.19%) ⬇️ |
| 6.1-m7g.metal | 79.44% <73.86%> (-0.13%) ⬇️ |
| 6.1-m7i.metal-24xl | 82.81% <73.86%> (-0.20%) ⬇️ |
| 6.1-m7i.metal-48xl | 82.81% <73.86%> (-0.21%) ⬇️ |
| 6.1-m8g.metal-24xl | 79.45% <73.86%> (-0.13%) ⬇️ |
| 6.1-m8g.metal-48xl | 79.45% <73.86%> (-0.13%) ⬇️ |

@ShadowCurse ShadowCurse force-pushed the virtio_pmem branch 2 times, most recently from d8f9547 to 5970613 Compare October 6, 2025 11:01
@ShadowCurse ShadowCurse marked this pull request as ready for review October 6, 2025 12:57
@ShadowCurse ShadowCurse added Status: Awaiting review Indicates that a pull request is ready to be reviewed Type: Enhancement Indicates new feature requests labels Oct 6, 2025
@ShadowCurse ShadowCurse force-pushed the virtio_pmem branch 2 times, most recently from a8bedbb to 1d2aeb2 Compare October 6, 2025 14:18
Contributor

@Manciukic Manciukic left a comment

  • we should update docs/device-api.md.
  • changelog entry
  • any performance tests? we could check how fast we can read or write the entire pmem or maybe we can integrate it with the block tests using fio

Contributor

@bchalios bchalios left a comment

Went halfway through. Here's an initial set of comments.


/// Adds an existing pmem device in the builder.
pub fn add_device(&mut self, device: Arc<Mutex<Pmem>>) {
    self.devices.push(device);
Contributor

shouldn't we add this to the corresponding index?

Contributor

In any case, could you also add a unit test for this one as well?

Contributor Author

unit test for what?

Contributor

a unit test which ensures that add_device does what you think it's doing. But back to my initial question, shouldn't add_device add device in the correct place in self.devices, according to the device index?

Contributor Author

> a unit test which ensures that add_device does what you think it's doing.

is a single line: self.devices.push(device); so elusive to need a unit test?

The order of devices only matters during VM boot if any of them is a root device. Otherwise the order is not important. add_device is only used during snapshot restore, and even in that case the order is preserved, since the device configs are stored in the same order as during VM boot (with the configs function).

Contributor

> The order of devices only matters during VM boot if any of them is a root device. Otherwise the order is not important.

That is immaterial since we don't know in advance whether there's a pmem root device.

> add_device is only used during snapshot restore, and even in that case the order is preserved, since the device configs are stored in the same order as during VM boot (with the configs function).

That explains it, thanks. At the very least could you add a comment explaining that, and an assertion that this (that add_used is called with in-order devices) is the case.

> is a single line: self.devices.push(device); so elusive to need a unit test?

That single line is carrying the following assumptions/ambiguities:

  1. There is/isn't a pmem root device
  2. Whoever is calling it calls it with devices that are in order

It certainly isn't a matter of how many lines of code this is.

Contributor Author

> and an assertion that this (that add_used is called with in-order devices) is the case.

Can you explain a bit more?

Added a comment about this.
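
For illustration only, a unit test of the kind requested in this thread might look like the sketch below. Only `add_device`, `Pmem` and `self.devices` come from the reviewed snippet; `PmemBuilder`, the `id` field, `ids` and `restore_in_order` are hypothetical stand-ins and not the actual Firecracker types. It shows that a plain `push` preserves order only as long as the caller adds devices in order.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the real device type; only an id is modelled.
#[derive(Debug)]
pub struct Pmem {
    pub id: String,
}

/// Hypothetical builder that keeps devices in insertion order.
#[derive(Debug, Default)]
pub struct PmemBuilder {
    pub devices: Vec<Arc<Mutex<Pmem>>>,
}

impl PmemBuilder {
    /// Adds an existing pmem device to the builder.
    /// The device is appended, so callers are assumed to add devices in order.
    pub fn add_device(&mut self, device: Arc<Mutex<Pmem>>) {
        self.devices.push(device);
    }

    /// Returns the device ids in the order they are stored.
    pub fn ids(&self) -> Vec<String> {
        self.devices
            .iter()
            .map(|d| d.lock().unwrap().id.clone())
            .collect()
    }
}

/// Replays ids in the given order, the way a restore path might replay
/// saved device configs (assumption: configs are saved in boot order).
pub fn restore_in_order(ids: &[&str]) -> PmemBuilder {
    let mut builder = PmemBuilder::default();
    for id in ids {
        builder.add_device(Arc::new(Mutex::new(Pmem { id: id.to_string() })));
    }
    builder
}
```

A test would then compare `restore_in_order(&[...]).ids()` against the input order, which pins down the "callers add devices in order" assumption instead of leaving it implicit.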

@ShadowCurse ShadowCurse force-pushed the virtio_pmem branch 10 times, most recently from 7eff29f to 4a19190 Compare October 8, 2025 12:19
@ShadowCurse ShadowCurse force-pushed the virtio_pmem branch 2 times, most recently from de6031e to d8c695a Compare October 8, 2025 16:36
Contributor

@bchalios bchalios left a comment

Just a few comments. We're almost there.

One thought I had while reading the snapshot documentation: does it make sense to already add an (optional) override in LoadSnapshotConfig, similar to what we've done for network devices?

Comment on lines +43 to +65
# Needed for DAX on aarch64. Will be ignored on x86_64
CONFIG_ARM64_PMEM=y
CONFIG_DEVICE_MIGRATION=y
CONFIG_ZONE_DEVICE=y
CONFIG_VIRTIO_PMEM=y
CONFIG_LIBNVDIMM=y
CONFIG_BLK_DEV_PMEM=y
CONFIG_ND_CLAIM=y
CONFIG_ND_BTT=y
CONFIG_BTT=y
CONFIG_ND_PFN=y
CONFIG_NVDIMM_PFN=y
CONFIG_NVDIMM_DAX=y
CONFIG_OF_PMEM=y
CONFIG_NVDIMM_KEYS=y
CONFIG_DAX=y
CONFIG_DEV_DAX=y
CONFIG_DEV_DAX_PMEM=y
CONFIG_DEV_DAX_KMEM=y
CONFIG_FS_DAX=y
CONFIG_FS_DAX_PMD=y
```
Contributor

We should mention these in kernel-policy.md. Maybe put a link here?

Contributor Author

I put a link into kernel-policy that points to this section (since there are a lot of configs).

Contributor

bchalios commented Oct 9, 2025

I took a look at the codecov report. It looks like there are a few things we're missing that we could (meaningfully) cover:

Contributor

@Manciukic Manciukic left a comment

mostly lgtm, just a few nits about the documentation

pub struct PmemState {
    pub virtio_state: VirtioDeviceState,
    pub config_space: ConfigSpace,
    pub config: PmemConfig,
Contributor

do we need to do anything about the backing file to ensure writes to it are visible to other processes? For example, we need to ensure that the process taking the snapshot can read the correct contents.

Are we doing anything special for the existing block devices to flush them to disk?

Contributor Author

Currently we do not flush block devices (block and pmem) on snapshot creation. If the VM is killed right after the snapshot is created, pmem will be synced by the kernel.

Comment on lines +197 to +204
In the case where multiple VMs have `virtio-pmem` devices that point to the same
underlying file the memory overhead can be amortized since total maximum memory
usage will only include a single instance of `virtio-pmem` memory.
Contributor

I would not mention this, as we do not recommend this use case.

Contributor Author

I think it is valid to leave this explanation even if we don't recommend this, since it is still a valid usage of pmem.

docs/pmem.md Outdated
Comment on lines 187 to 191
Since `virtio-pmem` resides in host memory it does increase the maximum possible
memory usage of a VM since now VM can use all of its RAM and access all of the
`virtio-pmem` memory. In order to minimize the overhead, it is highly
recommended to use `DAX` mode to avoid unnecessary duplication of data in guest
page cache.
Contributor

I think we should make clear that the resident memory used to back the virtio-pmem does not count towards the VM memory limit, but that it can be reclaimed (paged out) by the host.

Contributor Author

Added a section about reclamation of memory.
What do you mean by "does not count towards the VM memory limit"?

@ShadowCurse ShadowCurse force-pushed the virtio_pmem branch 4 times, most recently from e05ec14 to 948eb1d Compare October 9, 2025 15:42
msync is used by the virtio-pmem device to trigger a sync of the mmapped
file content to the underlying file.

Signed-off-by: Egor Lazarchuk <[email protected]>
Add implementations of device, event handling, metrics.
Add device config and builder types for API use.

Signed-off-by: Egor Lazarchuk <[email protected]>
@ShadowCurse ShadowCurse force-pushed the virtio_pmem branch 6 times, most recently from 3cdd952 to ce09ce5 Compare October 10, 2025 15:54
Update VmResources type with virtio-pmem configuration
field to allow virtio-pmem devices to be configured
through config files and later through API calls.

Signed-off-by: Egor Lazarchuk <[email protected]>
Both virtio-block and virtio-pmem can act as root devices
for a VM. Add a check to prevent specifying more than 1 root
device for a VM.

Signed-off-by: Egor Lazarchuk <[email protected]>
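
A minimal sketch of the single-root-device check described in the commit message above, under stated assumptions: `DeviceConfig`, `is_root_device`, `RootDeviceError` and `check_single_root` are hypothetical names chosen for illustration and are not taken from the PR. The idea is simply to count root devices across both device kinds and reject a configuration with more than one.

```rust
/// Hypothetical, simplified device config: only the root flag matters here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DeviceConfig {
    pub is_root_device: bool,
}

/// Hypothetical error type for the check below.
#[derive(Debug, PartialEq, Eq)]
pub enum RootDeviceError {
    MultipleRootDevices,
}

/// Counts root devices across block and pmem configs and rejects
/// any configuration that declares more than one.
pub fn check_single_root(
    block: &[DeviceConfig],
    pmem: &[DeviceConfig],
) -> Result<(), RootDeviceError> {
    let roots = block
        .iter()
        .chain(pmem.iter())
        .filter(|cfg| cfg.is_root_device)
        .count();
    if roots > 1 {
        Err(RootDeviceError::MultipleRootDevices)
    } else {
        Ok(())
    }
}
```

Counting over the chained iterators keeps the rule in one place, so it applies equally whether the second root device is a block device or a pmem device.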
Add /pmem/id PUT request for virtio-pmem configuration.
Add corresponding metrics.

Signed-off-by: Egor Lazarchuk <[email protected]>
Virtio-pmem devices need to allocate a memory region in guest physical
memory. The safe place to do this is past the 64-bit MMIO region.

Signed-off-by: Egor Lazarchuk <[email protected]>
Add a counter for KVM slot ids into VmCommon struct. This is done
because the virtio-pmem device needs to obtain its KVM slot id
independently of the number of slots in GuestMemoryMmap.

Signed-off-by: Egor Lazarchuk <[email protected]>
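
The slot id counter from the commit message above could be sketched roughly as follows. `SlotCounter` and `next_slot_id` are hypothetical names; the real field lives in `VmCommon` and its exact shape is not shown in this conversation. The point of the sketch is that every consumer (guest memory regions and pmem devices alike) draws ids from one monotonically increasing counter instead of deriving them from the number of guest memory regions.

```rust
/// Hypothetical stand-in for the counter kept in the VM state.
#[derive(Debug, Default)]
pub struct SlotCounter {
    next_slot: u32,
}

impl SlotCounter {
    /// Hands out the next unused slot id.
    /// Returns None once incrementing the counter would overflow.
    pub fn next_slot_id(&mut self) -> Option<u32> {
        let id = self.next_slot;
        self.next_slot = id.checked_add(1)?;
        Some(id)
    }
}
```

Because ids are never reused or recomputed, a pmem device added after guest memory is set up cannot collide with an existing slot.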
Add methods to attach virtio-pmem devices to Vmm.
Add methods to create KVM memory slot for virtio-pmem devices.

Signed-off-by: Egor Lazarchuk <[email protected]>
Add logic to store and restore virtio-pmem device information
in a snapshot.

Signed-off-by: Egor Lazarchuk <[email protected]>
Add functional and API tests for virtio-pmem device
and its configuration fields

Signed-off-by: Egor Lazarchuk <[email protected]>
Expose virtio-pmem metrics in the logger, so they are exported in
metrics.json.
Update integration tests to expect new metrics.

Signed-off-by: Egor Lazarchuk <[email protected]>
Add description of pmem APIs in swagger file and
device-api.md

Signed-off-by: Egor Lazarchuk <[email protected]>
Add new document about virtio-pmem configuration and usage.

Signed-off-by: Egor Lazarchuk <[email protected]>
Add a note about addition of virtio-pmem device.

Signed-off-by: Egor Lazarchuk <[email protected]>

Labels

Status: Awaiting review Indicates that a pull request is ready to be reviewed Type: Enhancement Indicates new feature requests

Development

Successfully merging this pull request may close these issues.

[virtio-pmem] Add support for virtio-pmem device

3 participants