[virtio-pmem] Implementation #5463
base: main
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##             main    #5463      +/-   ##
==========================================
- Coverage   82.79%   82.63%   -0.16%
==========================================
  Files         263      269       +6
  Lines       27223    27736     +513
==========================================
+ Hits        22538    22919     +381
- Misses       4685     4817     +132
```
force-pushed from d8f9547 to 5970613
force-pushed from a8bedbb to 1d2aeb2
force-pushed from 1d2aeb2 to 7d83503
- We should update `docs/device-api.md`.
- Changelog entry.
- Any performance tests? We could check how fast we can read or write the entire pmem, or maybe we can integrate it with the block tests using fio.
force-pushed from 7d83503 to efd93ea
force-pushed from 9a554b4 to 4b4779b
Went halfway through. Here's an initial set of comments.
```rust
/// Adds an existing pmem device in the builder.
pub fn add_device(&mut self, device: Arc<Mutex<Pmem>>) {
    self.devices.push(device);
}
```
shouldn't we add this to the corresponding index?
In any case, could you add a unit test for this one as well?
unit test for what?
A unit test which ensures that `add_device` does what you think it's doing. But back to my initial question: shouldn't `add_device` add the device in the correct place in `self.devices`, according to the device index?
> a unit test which ensures that `add_device` does what you think it's doing.

It is a single line: `self.devices.push(device);`. Is it really so elusive as to need a unit test?

The order of devices only matters during VM boot, and only if any of them is a root device. Otherwise the order is not important. `add_device` is only used during snapshot restore, and even in that case the order is preserved, since device configs are stored in the same order as during VM boot (with the `configs` function).
> The order of devices only matters during VM boot, and only if any of them is a root device. Otherwise the order is not important.

That is immaterial, since we don't know in advance whether there's a pmem root device.

> `add_device` is only used during snapshot restore, and even in that case the order is preserved, since device configs are stored in the same order as during VM boot (with the `configs` function).

That explains it, thanks. At the very least, could you add a comment explaining that, and an assertion that this (that `add_device` is called with in-order devices) is the case?

> Is it really so elusive as to need a unit test?

That single line is carrying the following assumptions/ambiguities:

- There is/isn't a pmem root device.
- Whoever is calling it calls it with devices that are in order.

It certainly isn't a matter of how many lines of code this is.
> ...and an assertion that this (that `add_device` is called with in-order devices) is the case.

Can you explain a bit more?

Added a comment about this.
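Below is a minimal, self-contained sketch of what the requested comment-plus-assertion could look like. The `index` field and the `PmemBuilder` shape are hypothetical stand-ins, not the actual Firecracker types; the point is only to make the in-order assumption explicit and checked in debug builds.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the real device type.
struct Pmem {
    /// Position assigned at configuration time (hypothetical field).
    index: usize,
}

#[derive(Default)]
struct PmemBuilder {
    devices: Vec<Arc<Mutex<Pmem>>>,
}

impl PmemBuilder {
    /// Adds an existing pmem device to the builder.
    ///
    /// Restore walks the saved configs in boot order, so a plain push
    /// preserves ordering; the assertion documents that assumption.
    pub fn add_device(&mut self, device: Arc<Mutex<Pmem>>) {
        debug_assert!(
            self.devices
                .last()
                .map_or(true, |last| last.lock().unwrap().index
                    < device.lock().unwrap().index),
            "pmem devices must be added in boot/configuration order"
        );
        self.devices.push(device);
    }
}

fn main() {
    let mut builder = PmemBuilder::default();
    builder.add_device(Arc::new(Mutex::new(Pmem { index: 0 })));
    builder.add_device(Arc::new(Mutex::new(Pmem { index: 1 })));
}
```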
force-pushed from 7eff29f to 4a19190
force-pushed from de6031e to d8c695a
Just a few comments. We're almost there.
One thought I had while reading the snapshot documentation: does it make sense to already add an (optional) override in `LoadSnapshotConfig`, similarly to what we've done for network devices?
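For illustration, here is a sketch of what such an optional override could look like, loosely mirroring the shape of the network-device overrides. All names here (`PmemOverride`, `pmem_overrides`, the simplified config fields) are hypothetical, not the actual Firecracker API; the sketch assumes the `serde` and `serde_json` crates.

```rust
use serde::Deserialize;

/// Hypothetical per-device restore override.
#[derive(Debug, Deserialize)]
pub struct PmemOverride {
    /// ID of the pmem device whose backing file should be replaced.
    pub pmem_id: String,
    /// Path of the new backing file on the host.
    pub path_on_host: String,
}

/// Simplified view of the snapshot-load request body with the
/// proposed optional override list added.
#[derive(Debug, Deserialize)]
pub struct LoadSnapshotConfig {
    pub snapshot_path: String,
    pub mem_backend_path: String,
    /// Empty when no overrides are requested.
    #[serde(default)]
    pub pmem_overrides: Vec<PmemOverride>,
}

fn main() {
    let body = r#"{
        "snapshot_path": "/tmp/vm.snap",
        "mem_backend_path": "/tmp/vm.mem",
        "pmem_overrides": [
            { "pmem_id": "pmem0", "path_on_host": "/tmp/new-backing-file" }
        ]
    }"#;
    let cfg: LoadSnapshotConfig = serde_json::from_str(body).unwrap();
    println!("{cfg:?}");
}
```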
```
# Needed for DAX on aarch64. Will be ignored on x86_64
CONFIG_ARM64_PMEM=y
CONFIG_DEVICE_MIGRATION=y
CONFIG_ZONE_DEVICE=y
CONFIG_VIRTIO_PMEM=y
CONFIG_LIBNVDIMM=y
CONFIG_BLK_DEV_PMEM=y
CONFIG_ND_CLAIM=y
CONFIG_ND_BTT=y
CONFIG_BTT=y
CONFIG_ND_PFN=y
CONFIG_NVDIMM_PFN=y
CONFIG_NVDIMM_DAX=y
CONFIG_OF_PMEM=y
CONFIG_NVDIMM_KEYS=y
CONFIG_DAX=y
CONFIG_DEV_DAX=y
CONFIG_DEV_DAX_PMEM=y
CONFIG_DEV_DAX_KMEM=y
CONFIG_FS_DAX=y
CONFIG_FS_DAX_PMD=y
```
We should mention these in `kernel-policy.md`. Maybe put a link here?
I put a link into `kernel-policy.md` that points to this section (since there are a lot of configs).
I took a look at the codecov report. It looks like there are a few things we're missing that we could (meaningfully) cover.
mostly lgtm, just a few nits about the documentation
```rust
pub struct PmemState {
    pub virtio_state: VirtioDeviceState,
    pub config_space: ConfigSpace,
    pub config: PmemConfig,
}
```
Do we need to do anything about the backing file to ensure writes to it are visible to other processes? For example, we need to ensure that the process taking the snapshot can read the correct contents.

Are we doing anything special for the existing block devices to flush them to disk?
Currently we do not flush devices (block or pmem) on snapshot creation. If the VM is killed right after a snapshot is created, pmem will be synced by the kernel.
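For context, the device triggers the sync with `msync(2)` (see the commit list below). Here is a minimal sketch of such a flush, assuming the device tracks the mapping's address and length; `flush_pmem_region` and the demo `main` are illustrative only and use the `libc` crate.

```rust
use std::io;
use std::os::unix::io::AsRawFd;

/// Flush a mapped pmem region back to its backing file with msync(2).
/// In the real device, `addr` and `len` would come from the stored mmap.
fn flush_pmem_region(addr: *mut libc::c_void, len: usize) -> io::Result<()> {
    // SAFETY: the caller must pass the page-aligned address and length
    // of a live mapping.
    let ret = unsafe { libc::msync(addr, len, libc::MS_SYNC) };
    if ret == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}

fn main() -> io::Result<()> {
    // Demo: map one page of a scratch file, dirty it, then flush it.
    let file = std::fs::OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .open("/tmp/pmem-msync-demo")?;
    file.set_len(4096)?;
    // SAFETY: valid fd; length matches the file size we just set.
    let addr = unsafe {
        libc::mmap(
            std::ptr::null_mut(),
            4096,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_SHARED,
            file.as_raw_fd(),
            0,
        )
    };
    assert_ne!(addr, libc::MAP_FAILED);
    // SAFETY: `addr` points to a writable 4096-byte mapping.
    unsafe { *(addr as *mut u8) = 42 };
    flush_pmem_region(addr, 4096)
}
```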
> In the case where multiple VMs have `virtio-pmem` devices that point to the same underlying file, the memory overhead can be amortized, since the total maximum memory usage will only include a single instance of the `virtio-pmem` memory.
I would not mention this, as we do not recommend this use case.
I think it is valid to leave this explanation even if we don't recommend it, since it is still a valid usage of pmem.
docs/pmem.md (Outdated)
> Since `virtio-pmem` resides in host memory, it does increase the maximum possible memory usage of a VM, since the VM can now use all of its RAM and access all of the `virtio-pmem` memory. In order to minimize the overhead, it is highly recommended to use `DAX` mode to avoid unnecessary duplication of data in the guest page cache.
I think we should make clear that the resident memory used to back the `virtio-pmem` does not count towards the VM memory limit, but that it can be reclaimed (paged out) by the host.
Added a section about reclamation of memory.

What do you mean by "does not count towards the VM memory limit"?
force-pushed from e05ec14 to 948eb1d
Commits:

- `msync` is used by the virtio-pmem device to trigger a sync of mmapped file content to the underlying file.
- Add implementations of the device, event handling, and metrics. Add device config and builder types for API use.
force-pushed from 3cdd952 to ce09ce5
- Update the `VmResources` type with a virtio-pmem configuration field to allow virtio-pmem devices to be configured through config files and later through API calls.
- Both virtio-block and virtio-pmem can act as root devices for a VM. Add a check to prevent specifying more than one root device for a VM.
- Add a `/pmem/id` PUT request for virtio-pmem configuration. Add corresponding metrics.
- Virtio-pmem devices need to allocate a memory region in guest physical memory. The safe place to do this is past the 64-bit MMIO region (see the sketch after this list).
- Add a counter for KVM slot ids into the `VmCommon` struct. This is done because the virtio-pmem device needs to obtain its KVM slot id independently from the number of slots in `GuestMemoryMmap`.
- Add methods to attach virtio-pmem devices to the Vmm. Add methods to create a KVM memory slot for virtio-pmem devices.
- Add logic to store and restore virtio-pmem device information in a snapshot.
- Add functional and API tests for the virtio-pmem device and its configuration fields.
- Expose virtio-pmem metrics in the logger, so they are exported in `metrics.json`. Update integration tests to expect the new metrics.
- Add a description of the pmem APIs in the swagger file and `device-api.md`.
- Add a new document about virtio-pmem configuration and usage.
- Add a note about the addition of the virtio-pmem device.
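As a rough illustration of the region-placement commit above, here is a sketch that lays pmem regions out one after another past the 64-bit MMIO region. The constants and alignment are hypothetical; the real values live in Firecracker's per-arch layout modules.

```rust
/// Hypothetical layout constants, stand-ins for the real arch values.
const FIRST_ADDR_PAST_64BIT_MMIO: u64 = 0x0100_0000_0000; // hypothetical
const PMEM_ALIGNMENT: u64 = 2 << 20; // 2 MiB, hypothetical

/// Hand out pmem regions one after another past the 64-bit MMIO region,
/// aligning each region's start address.
fn allocate_pmem_region(next_free: &mut u64, size: u64) -> u64 {
    let start = next_free.next_multiple_of(PMEM_ALIGNMENT);
    *next_free = start + size;
    start
}

fn main() {
    let mut next_free = FIRST_ADDR_PAST_64BIT_MMIO;
    let first = allocate_pmem_region(&mut next_free, 128 << 20);
    let second = allocate_pmem_region(&mut next_free, 64 << 20);
    println!("pmem regions at {first:#x} and {second:#x}");
}
```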
force-pushed from ce09ce5 to 5bac831
Changes

Add `virtio-pmem` device support.

Closes #5448

License Acceptance

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following the Developer Certificate of Origin and signing off your commits, please check CONTRIBUTING.md.

PR Checklist
- Ran `tools/devtool checkbuild --all` to verify that the PR passes build checks on all supported architectures.
- Ran `tools/devtool checkstyle` to verify that the PR passes the automated style checks.
- … how they are solving the problem in a clear and encompassing way.
- … in the PR.
- … `CHANGELOG.md`.
- … Runbook for Firecracker API changes.
- … integration tests.
- … `TODO`.
- … `rust-vmm`.