Skip to content

Conversation

sboeuf
Copy link

@sboeuf sboeuf commented Aug 18, 2021

For devices with access to data in memory being translated, we add to
the Queue the ability to translate the address stored in the descriptor.

It is very helpful as it performs the translation right after the
untranslated address is read from memory, avoiding any errors from
happening from the consumer's crate perspective. It also allows the
consumer to reduce greatly the amount of duplicated code for applying
the translation in many different places.

Signed-off-by: Sebastien Boeuf [email protected]

@sboeuf sboeuf force-pushed the add_addr_translation branch from bcb2e4b to 98f5e49 Compare August 18, 2021 16:38
@sboeuf sboeuf changed the title virtio-queue: Add trait for handling address translation virtio-queue: Handle address translations Aug 18, 2021
@sboeuf sboeuf requested a review from andreeaflorescu August 19, 2021 07:42
@sboeuf
Copy link
Author

sboeuf commented Aug 30, 2021

PTAL @jiangliu @slp @andreeaflorescu

For devices with access to data in memory being translated, we add to
the Queue the ability to translate the address stored in the descriptor.

It is very helpful as it performs the translation right after the
untranslated address is read from memory, avoiding any errors from
happening from the consumer's crate perspective. It also allows the
consumer to reduce greatly the amount of duplicated code for applying
the translation in many different places.

Signed-off-by: Sebastien Boeuf <[email protected]>
@sboeuf sboeuf force-pushed the add_addr_translation branch from 98f5e49 to 537dcb2 Compare August 30, 2021 10:44
Copy link
Collaborator

@slp slp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Perhaps it'd be nice to implement a test that uses a mock-up AccessPlatform and verifies the translation of some descriptor addresses?

@alexandruag
Copy link
Collaborator

Hi, sorry was away for a while. I'll do a review and return with any comments asap.

@alexandruag
Copy link
Collaborator

Hi again, just had a better look and this is an interesting use case. From the perspective of something that always requires this, there is def an ease of use advantage with placing the functionality here, but at the same time it's otherwise completely unrelated and orthogonal to virtio queue handling and descriptor chain parsing, so it would be great to find another solution and keep concerns separated.

A better understanding of the context would also help a lot. Can you please give some more details about what generates the use case? Is the translation specific to certain devices, or memory backends, or is it required by something completely different? Depending on what the requirements are we can maybe find alternative solutions that can be plugged into devices, or a GuestMemory thin wrapper that transparently performs translations, etc.

@sboeuf
Copy link
Author

sboeuf commented Sep 13, 2021

From the perspective of something that always requires this, there is def an ease of use advantage with placing the functionality here, but at the same time it's otherwise completely unrelated and orthogonal to virtio queue handling and descriptor chain parsing, so it would be great to find another solution and keep concerns separated.

Well since I'm using an Option, even when it's not used, it's not adding any significant overhead to the queue handling.

A better understanding of the context would also help a lot. Can you please give some more details about what generates the use case? Is the translation specific to certain devices, or memory backends, or is it required by something completely different? Depending on what the requirements are we can maybe find alternative solutions that can be plugged into devices, or a GuestMemory thin wrapper that transparently performs translations, etc.

This is required to support virtio-iommu for virtio devices. When using virtio-iommu, you expose a vIOMMU to the guest, and the guest can see a bunch of virtio devices attached to it. This means the addresses passed through the virtqueues are IOVAs and not GPAs. That's where the need for translation comes from.

I'm open to solutions that you think would make more sense but it's very important to keep this hidden from the virtio devices themselves, otherwise we would end up having to copy/paste the same translation code for each virtio device.
A GuestMemory wrapper might be interesting, could you elaborate how we could achieve this?
This missing translation feature is the main blocker preventing Cloud-Hypervisor from moving to the upstream version of the virtio queues :)

@alexandruag
Copy link
Collaborator

Thanks for the explanation, that's some really useful context to have! I've temporarily switched focus to #76, and was just thinking whether we can leverage some of the ideas introduced there to help with meeting this use case as well. I also want to have a quick look at virtio-iommu just to consolidate the big picture a bit, and will get back to this PR soon.

@jiangliu
Copy link
Member

Seems we only handle IOVA in virtio descriptors, so other guest DMA address such as descriptor table address, available ring address and used ring address have already been handled by the queue caller?

@sboeuf
Copy link
Author

sboeuf commented Sep 14, 2021

Seems we only handle IOVA in virtio descriptors, so other guest DMA address such as descriptor table address, available ring address and used ring address have already been handled by the queue caller?

Yes we don't need to perform the translation of the descriptor table, available ring and used ring addresses every time. Instead I chose the approach of having the VMM doing the translation only when the queue is enabled by the guest. But we can discuss this as well.

@jiangliu
Copy link
Member

I'm a little concern about current design. Say guest is doing a big block device IO, mapping several discontinuous GPA into a continuous IOVA. Then only one virtio descriptor will be allocated for the block IO. How will current design handle this case?

@jiangliu
Copy link
Member

I'm a little concern about current design. Say guest is doing a big block device IO, mapping several discontinuous GPA into a continuous IOVA. Then only one virtio descriptor will be allocated for the block IO. How will current design handle this case?

In other words, when virito-viommu is involved, a continuous IOVA (for a virtio descriptor) may be discontinues in GPA.

@sboeuf
Copy link
Author

sboeuf commented Sep 14, 2021

I'm a little concern about current design. Say guest is doing a big block device IO, mapping several discontinuous GPA into a continuous IOVA. Then only one virtio descriptor will be allocated for the block IO. How will current design handle this case?

Yes you're right that seems like a case we should also handle with care. I think having a wrapper in vm-memory to handle the translation might be the best option as suggested by @alexandruag
Do you think that would solve this issue?

@jiangliu
Copy link
Member

jiangliu commented Sep 14, 2021

I'm a little concern about current design. Say guest is doing a big block device IO, mapping several discontinuous GPA into a continuous IOVA. Then only one virtio descriptor will be allocated for the block IO. How will current design handle this case?

Yes you're right that seems like a case we should also handle with care. I think having a wrapper in vm-memory to handle the translation might be the best option as suggested by @alexandruag
Do you think that would solve this issue?

I have the same feeling. Maybe extending vm-memory to access guest memory by both GVA and HVA is more robust design. But then we need to really careful about performance. We have observed performance penalty caused by vm-memory.

In theory, device driver should only access guest memory with IOVA, so extending vm-memory to support IOVA is natural.

@alexandruag
Copy link
Collaborator

Hi again, I'm currently experimenting with the wrapper option, but still have to iron out a couple of things which are a bit harder to do than initially expected. Meanwhile, I was curious if there are other requirements for this use case as well. In particular, can you please give some more details about the implications of the "discontinuous in GPA, but continuous IOVA" scenario the Gerry mentions above, and what is (at least on a conceptual level) the proper way to handle it?

@sboeuf
Copy link
Author

sboeuf commented Sep 28, 2021

Meanwhile, I was curious if there are other requirements for this use case as well. In particular, can you please give some more details about the implications of the "discontinuous in GPA, but continuous IOVA" scenario the Gerry mentions above, and what is (at least on a conceptual level) the proper way to handle it?

Well I think it's mainly about handling the case where it might span over multiple memory regions, which is already handled internally by vm-memory.

@alexandruag
Copy link
Collaborator

Thanks! Just a quick update: the GuestMemory wrapper approach did not turn out super feasible, mostly because the GuestMemory interface has quite a lot of features, and when taking into account things like iterating over the regions, stuff quickly becomes quite complicated so I'm no longer very optimistic about this alternative right now.

I'm currently looking at two others. The first (which is not necessarily also the most appealing) relies on the observation that many queue related operations right now do not actually make use of the full GuestMemory interface, but only rely on a couple of operations, all of them from Bytes. If we restrict the trait bound to that (or something even simpler) it becomes easier to implement indirection using an adapter.

The second option is to extend the definition of generic interfaces started by Gerry's recent PR to descriptor chains as well, at which point it once again becomes simpler to have plugable functionality. Looks like the more difficult part in this case is what's the best way to dispatch operations to a particular implementation, and that's something I'm giving thought to ATM. Will be back soon with more details.

@sboeuf
Copy link
Author

sboeuf commented Jan 24, 2022

Thanks! Just a quick update: the GuestMemory wrapper approach did not turn out super feasible, mostly because the GuestMemory interface has quite a lot of features, and when taking into account things like iterating over the regions, stuff quickly becomes quite complicated so I'm no longer very optimistic about this alternative right now.

I'm currently looking at two others. The first (which is not necessarily also the most appealing) relies on the observation that many queue related operations right now do not actually make use of the full GuestMemory interface, but only rely on a couple of operations, all of them from Bytes. If we restrict the trait bound to that (or something even simpler) it becomes easier to implement indirection using an adapter.

The second option is to extend the definition of generic interfaces started by Gerry's recent PR to descriptor chains as well, at which point it once again becomes simpler to have plugable functionality. Looks like the more difficult part in this case is what's the best way to dispatch operations to a particular implementation, and that's something I'm giving thought to ATM. Will be back soon with more details.

@alexandruag any progress? I'd like to be able to move Cloud Hypervisor to the upstream version of vm-virtio/virtio-queue and I can't do so until we find a solution regarding this address translation.

@sboeuf
Copy link
Author

sboeuf commented Jan 28, 2022

Since it's very complicated to find a consensus on this matter, I've implemented the support for this thing directly in Cloud Hypervisor cloud-hypervisor/cloud-hypervisor#3641

@sboeuf sboeuf closed this Jan 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants