Skip to content

Fix greedy block device #5260

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 20, 2025
Merged

Conversation

ShadowCurse
Copy link
Contributor

@ShadowCurse ShadowCurse commented Jun 13, 2025

Changes

  • Minor fixes to the queue size
  • Move guest notification logic from the block device request processing loop. This could cause a situation when the guest would continuously feed the block device new requests not letting other devices to operate. By moving notification outside, the guest can only send a limited number of request as the queue has a limit on how much requests it can hold.
  • Move used ring index update outside add_used call to be more VIRTIO spec compliant.

Reason

We discovered a bug where a sufficiently slow drive could cause the IO thread to get stuck processing block device request indefinitely, thus starving all other devices.
This is caused by the guest driver adding new descriptors to the virtio queue while the device is still processing them, leading to an indefinite loop.
The fix is to only update the used index and send a notification to the guest driver only once all descriptors have been processed by the device. This mitigates the issue by bounding the amount of time spent in processing block device requests, allowing other devices to complete their work as well.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@ShadowCurse ShadowCurse force-pushed the fair_devices_2 branch 5 times, most recently from f896954 to 90bd921 Compare June 15, 2025 16:21
Copy link

codecov bot commented Jun 15, 2025

Codecov Report

Attention: Patch coverage is 80.00000% with 16 lines in your changes missing coverage. Please review.

Project coverage is 82.88%. Comparing base (ad41c6c) to head (2aa4f12).
Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
src/vmm/src/devices/virtio/vsock/device.rs 60.86% 9 Missing ⚠️
src/vmm/src/devices/virtio/block/virtio/device.rs 80.64% 6 Missing ⚠️
src/vmm/src/devices/virtio/mmio.rs 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5260      +/-   ##
==========================================
+ Coverage   82.84%   82.88%   +0.03%     
==========================================
  Files         250      250              
  Lines       26967    26968       +1     
==========================================
+ Hits        22342    22352      +10     
+ Misses       4625     4616       -9     
Flag Coverage Δ
5.10-c5n.metal 83.31% <80.00%> (-0.02%) ⬇️
5.10-m5n.metal 83.31% <80.00%> (-0.02%) ⬇️
5.10-m6a.metal 82.53% <80.00%> (-0.03%) ⬇️
5.10-m6g.metal 79.14% <80.00%> (-0.03%) ⬇️
5.10-m6i.metal 83.30% <80.00%> (-0.02%) ⬇️
5.10-m7a.metal-48xl 82.51% <80.00%> (?)
5.10-m7g.metal 79.14% <80.00%> (-0.02%) ⬇️
5.10-m7i.metal-24xl 83.27% <80.00%> (?)
5.10-m7i.metal-48xl 83.26% <80.00%> (?)
5.10-m8g.metal-24xl 79.14% <80.00%> (?)
5.10-m8g.metal-48xl 79.14% <80.00%> (?)
6.1-c5n.metal 83.36% <80.00%> (-0.02%) ⬇️
6.1-m5n.metal 83.35% <80.00%> (-0.02%) ⬇️
6.1-m6a.metal 82.57% <80.00%> (-0.03%) ⬇️
6.1-m6g.metal 79.14% <80.00%> (-0.03%) ⬇️
6.1-m6i.metal 83.35% <80.00%> (-0.02%) ⬇️
6.1-m7a.metal-48xl 82.57% <80.00%> (?)
6.1-m7g.metal 79.14% <80.00%> (-0.03%) ⬇️
6.1-m7i.metal-24xl 83.37% <80.00%> (?)
6.1-m7i.metal-48xl 83.37% <80.00%> (?)
6.1-m8g.metal-24xl 79.14% <80.00%> (?)
6.1-m8g.metal-48xl 79.14% <80.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ShadowCurse ShadowCurse marked this pull request as ready for review June 15, 2025 17:27
@ShadowCurse ShadowCurse added Status: Awaiting review Indicates that a pull request is ready to be reviewed Type: Fix Indicates a fix to existing code labels Jun 15, 2025
roypat
roypat previously requested changes Jun 16, 2025
Copy link
Contributor

@roypat roypat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the virtio spec says

2.6.7.2 Device Requirements: Used Buffer Notification Suppression
If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:

  • The device MUST ignore the used_event value.
  • After the device writes a descriptor index into the used ring:
    • If flags is 1, the device SHOULD NOT send a notification.
    • If flags is 0, the device MUST send a notification.

Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:

  • The device MUST ignore the lower bit of flags.
  • After the device writes a descriptor index into the used ring:
    • If the idx field in the used ring (which determined where that descriptor index was placed) was equal to used_event, the device MUST send a notification.
    • Otherwise the device SHOULD NOT send a notification.

So if the guest doesnt accept the VIRTIO_F_EVENT_IDX feature bit, then we must send notifications as we do today (e.g. without this PR), immediately after every time we write to the used ring. So in that sense, I don't think this is correct :/

@bchalios
Copy link
Contributor

So the virtio spec says

2.6.7.2 Device Requirements: Used Buffer Notification Suppression
If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:

  • The device MUST ignore the used_event value.

  • After the device writes a descriptor index into the used ring:

    • If flags is 1, the device SHOULD NOT send a notification.
    • If flags is 0, the device MUST send a notification.

Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:

  • The device MUST ignore the lower bit of flags.

  • After the device writes a descriptor index into the used ring:

    • If the idx field in the used ring (which determined where that descriptor index was placed) was equal to used_event, the device MUST send a notification.
    • Otherwise the device SHOULD NOT send a notification.

So if the guest doesnt accept the VIRTIO_F_EVENT_IDX feature bit, then we must send notifications as we do today (e.g. without this PR), immediately after every time we write to the used ring. So in that sense, I don't think this is correct :/

So the virtio spec says

2.6.7.2 Device Requirements: Used Buffer Notification Suppression
If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:

  • The device MUST ignore the used_event value.

  • After the device writes a descriptor index into the used ring:

    • If flags is 1, the device SHOULD NOT send a notification.
    • If flags is 0, the device MUST send a notification.

Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:

  • The device MUST ignore the lower bit of flags.

  • After the device writes a descriptor index into the used ring:

    • If the idx field in the used ring (which determined where that descriptor index was placed) was equal to used_event, the device MUST send a notification.
    • Otherwise the device SHOULD NOT send a notification.

So if the guest doesnt accept the VIRTIO_F_EVENT_IDX feature bit, then we must send notifications as we do today (e.g. without this PR), immediately after every time we write to the used ring. So in that sense, I don't think this is correct :/

So the virtio spec says

2.6.7.2 Device Requirements: Used Buffer Notification Suppression
If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:

  • The device MUST ignore the used_event value.

  • After the device writes a descriptor index into the used ring:

    • If flags is 1, the device SHOULD NOT send a notification.
    • If flags is 0, the device MUST send a notification.

Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:

  • The device MUST ignore the lower bit of flags.

  • After the device writes a descriptor index into the used ring:

    • If the idx field in the used ring (which determined where that descriptor index was placed) was equal to used_event, the device MUST send a notification.
    • Otherwise the device SHOULD NOT send a notification.

So if the guest doesnt accept the VIRTIO_F_EVENT_IDX feature bit, then we must send notifications as we do today (e.g. without this PR), immediately after every time we write to the used ring. So in that sense, I don't think this is correct :/

But we do send a notification, just a bit later :) I think the spirit of this MUST is that we need to eventually send a notification if we idx == used_event, not necessarily when. IOW, if we don't reach used_event we shouldn't send a notification at all, but I don't think there is something that obliges us to send a notification before we process more descriptors.

Saying these, it made me thinking that it might be the case that guest only adds descriptors that (if consumed all) would reach the used_event. Not past that. But this needs to be proven.

@roypat
Copy link
Contributor

roypat commented Jun 16, 2025

But we do send a notification, just a bit later :) I think the spirit of this MUST is that we need to eventually send a notification if we idx == used_event, not necessarily when. IOW, if we don't reach used_event we shouldn't send a notification at all, but I don't think there is something that obliges us to send a notification before we process more descriptors.

Saying these, it made me thinking that it might be the case that guest only adds descriptors that (if consumed all) would reach the used_event. Not past that. But this needs to be proven.

...mh, I guess the spec is indeed a bit vague here, but I interpreted the "after" as "immediately after". But I can totally see your interpretation as well now. I guess in this case, if it works on linux...

@roypat roypat dismissed their stale review June 16, 2025 12:05

spec vague, will defer to "it works with linux guests"

Manciukic
Manciukic previously approved these changes Jun 19, 2025
roypat
roypat previously approved these changes Jun 19, 2025
bchalios
bchalios previously approved these changes Jun 20, 2025
The `max_size` field is public, so no need for a getter.

Signed-off-by: Egor Lazarchuk <[email protected]>
The size of queue set by the driver must be always less or equal to the
queue size in FC. This is checked before device activation.
This removes the need for `actual_size` function.

Signed-off-by: Egor Lazarchuk <[email protected]>
@Manciukic Manciukic dismissed stale reviews from bchalios and roypat via 9eedd55 June 20, 2025 10:40
Currently block device has a guest notification logic
inside it's request processing loop. This can create a
situation when guest can continuously add more requests to the
queue, making the whole request processing loop arbitrary long.
This is an issue, since it block any other IO from being processed.

The solution is to simply notify guest one time, after all current
requests are processed.

Signed-off-by: Egor Lazarchuk <[email protected]>
Signed-off-by: Riccardo Mancini <[email protected]>
VIRTIO spec states:
```
After the device writes a descriptor index into the used ring:
  If the idx field in the used ring (which determined where that
  descriptor index was placed) was equal to used_event, the device
  MUST send a notification.
```
The current implementation does not follow this very precisely. It
bumps used ring index when new descriptors are added to the used
ring. But the check if the notification is needed is postponed to
later processing stage.
To be more VIRTIO spec compliant simply move the logic for updating
the used ring index into the later processing stage as well, just
before the check if the notification should be send.

Signed-off-by: Egor Lazarchuk <[email protected]>
usize cannot be negative

Signed-off-by: Egor Lazarchuk <[email protected]>
Add a new changelog entry for the block device fairness fix.

Signed-off-by: Riccardo Mancini <[email protected]>
@roypat roypat enabled auto-merge (rebase) June 20, 2025 14:39
@roypat roypat merged commit a099f81 into firecracker-microvm:main Jun 20, 2025
5 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Awaiting review Indicates that a pull request is ready to be reviewed Type: Fix Indicates a fix to existing code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants