Vfio user.client.v5 #21

jlevon · 2025-06-25T19:12:45Z

No description provided.

Rust makes the current file available as a statically-allocated string, but without a NUL terminator. Allow this by storing an optional maximum length in the Error. Reviewed-by: Zhao Liu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

The function name is not available in Rust, so make it optional. Reviewed-by: Zhao Liu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

Provide an implementation of std::error::Error that bridges the Rust anyhow::Error and std::panic::Location types with QEMU's Error*. It also has several utility methods, analogous to error_propagate(), that convert a Result into a return value + Error** pair. One important difference is that these propagation methods *panic* if *errp is not NULL, unlike error_propagate() which eats subsequent errors[1]. The reason for this is that in C you have an error_set*() call at the site where the error is created, and calls to error_propagate() are relatively rare. In Rust instead, even though these functions do "propagate" a qemu_api::Error into a C Error**, there is no error_setg() anywhere that could check for non-NULL errp and call abort(). error_propagate()'s behavior of ignoring subsequent errors is generally considered weird, and there would be a bigger risk of triggering it from Rust code. [1] This is actually a violation of the preconditions of error_propagate(), so it should not happen. But you never know... Reviewed-by: Zhao Liu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
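The contrast between the two propagation styles can be modeled in a few lines. This is only a Python sketch, not the Rust binding or QEMU's C code: ErrorSlot, c_style_propagate() and strict_propagate() are hypothetical names, and the sketch assumes the panic case is a destination that already holds an error, which is the situation where error_propagate() would drop the new one. It also assumes that a missing destination (errp == NULL) means the caller ignores errors.

```python
from typing import Optional


class ErrorSlot:
    """Stands in for the storage an Error** points to; err is None models *errp == NULL."""

    def __init__(self) -> None:
        self.err: Optional[str] = None


def c_style_propagate(dst: Optional[ErrorSlot], err: str) -> None:
    """Model of the lenient behavior: a second error is silently dropped."""
    if dst is None:
        return  # caller does not want error details
    if dst.err is not None:
        return  # an earlier error wins; the new one is "eaten"
    dst.err = err


def strict_propagate(dst: Optional[ErrorSlot], err: str) -> None:
    """Model of the strict behavior: a second error is treated as a bug."""
    if dst is None:
        return
    if dst.err is not None:
        raise RuntimeError("destination already holds an error")
    dst.err = err


slot = ErrorSlot()
c_style_propagate(slot, "first")
c_style_propagate(slot, "second")  # dropped without any signal
print(slot.err)  # first
```

Calling strict_propagate(slot, "third") on the same slot raises instead of discarding the error, which is the failure mode the commit message prefers.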

Signed-off-by: Paolo Bonzini <[email protected]>

Reviewed-by: Zhao Liu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

Remove the need to convert after every read of the BqlCell. Because the vmstate uses a u8 as the size of the VARRAY, this requires switching the VARRAY to use num_timers_save; which in turn requires ensuring that the num_timers_save is always there. For simplicity do this by removing support for version 1, which QEMU has not been producing for ~15 years. Reviewed-by: Zhao Liu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

No functional change intended. Suggested-by: Zhao Liu <[email protected]> Reviewed-by: Zhao Liu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

Do not silently adjust num_timers, and fail if intcap is 0. Reviewed-by: Markus Armbruster <[email protected]> Reviewed-by: Zhao Liu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

Match the code in hpet.c; this also allows removing the BqlCell from the num_timers field. Reviewed-by: Zhao Liu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

Now that the num_timers field is initialized as a property, someone may change its default value using qdev_prop_set_uint8(), but the value is fixed after the Rust code sees it first. Since there is no need to modify it after realize(), it is not necessary to have a BqlCell wrapper. Signed-off-by: Zhao Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] [Remove .into() as well. - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>

error is new; offset_of is gone. Reviewed-by: Zhao Liu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

If the enum includes values such as "Ok", "Err", or "Error", the TryInto macro can cause errors. Be careful and qualify identifiers with the full path, or in the case of TryFrom<>::Error do not use the associated type at all. Signed-off-by: Paolo Bonzini <[email protected]>

A page state change is typically followed by an access of the page(s) and results in another VMEXIT in order to map the page into the nested page table. Depending on the size of page state change request, this can generate a number of additional VMEXITs. For example, under SNP, when Linux is utilizing lazy memory acceptance, memory is typically accepted in 4M chunks. A page state change request is submitted to mark the pages as private, followed by validation of the memory. Since the guest_memfd currently only supports 4K pages, each page validation will result in VMEXIT to map the page, resulting in 1024 additional exits. When performing a page state change, invoke KVM_PRE_FAULT_MEMORY for the size of the page state change in order to pre-map the pages and avoid the additional VMEXITs. This helps speed up boot times. Signed-off-by: Tom Lendacky <[email protected]> Link: https://lore.kernel.org/r/f5411c42340bd2f5c14972551edb4e959995e42b.1743193824.git.thomas.lendacky@amd.com Signed-off-by: Paolo Bonzini <[email protected]>
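The exit count quoted above follows from simple arithmetic. A small Python sketch (extra_vmexits() is a hypothetical helper, not a KVM or QEMU function) makes the relationship between request size and page size explicit:

```python
KIB = 1024
MIB = 1024 * KIB


def extra_vmexits(request_bytes: int, page_bytes: int = 4 * KIB) -> int:
    """Pages needed to cover the request, one mapping fault each (ceiling division)."""
    return -(-request_bytes // page_bytes)


print(extra_vmexits(4 * MIB))  # 1024
```

Pre-mapping the whole range in one call replaces those per-page faults with a single request covering the size of the page state change.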

futex(2) - Linux manual page https://man7.org/linux/man-pages/man2/futex.2.html > Note that a wake-up can also be caused by common futex usage patterns > in unrelated code that happened to have previously used the futex > word's memory location (e.g., typical futex-based implementations of > Pthreads mutexes can cause this under some conditions). Therefore, > callers should always conservatively assume that a return value of 0 > can mean a spurious wake-up, and use the futex word's value (i.e., > the user-space synchronization scheme) to decide whether to continue > to block or not. Signed-off-by: Akihiko Odaki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]>
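The rule quoted from the manual page amounts to re-reading the futex word after every return from the wait call. The Python sketch below is a deterministic simulation, not real futex code: wait_while_equal() and fake_futex_wait() are hypothetical names, and the "kernel" is a scripted list of values the word holds after each wake-up.

```python
from typing import Callable, List


def wait_while_equal(read_word: Callable[[], int],
                     futex_wait: Callable[[int], int],
                     expected: int) -> int:
    """Block until the word no longer equals `expected`.

    A return value of 0 from futex_wait() is treated as possibly spurious,
    so the word is checked again before deciding to stop waiting.
    Returns how many times the wait primitive was called.
    """
    calls = 0
    while read_word() == expected:
        futex_wait(expected)
        calls += 1
    return calls


# Scripted stand-in for the kernel: the first two wake-ups are spurious
# (the word is unchanged), the third follows a real update.
word = [1]
script: List[int] = [1, 1, 0]  # value of the word after each wake-up


def fake_futex_wait(expected: int) -> int:
    word[0] = script.pop(0)
    return 0


calls = wait_while_equal(lambda: word[0], fake_futex_wait, expected=1)
print(calls, word[0])  # 3 0
```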

Windows supports futex-like APIs since Windows 8 and Windows Server 2012. Signed-off-by: Akihiko Odaki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]>

scripts/checkpatch.pl warns for __linux__ saying "architecture specific defines should be avoided". Signed-off-by: Akihiko Odaki <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]>

qemu-thread used to abstract pthread primitives into futex for the QemuEvent implementation of POSIX systems other than Linux. However, this abstraction has one key difference: unlike futex, pthread primitives require an explicit destruction, and it must be ordered after wait and wake operations. It would be easier to perform destruction if a wait operation ensures the corresponding wake operation finishes, as a POSIX semaphore does, but that requires protecting state accesses in qemu_event_set() and qemu_event_wait() with a mutex. On the other hand, real futex does not need such a protection but needs complex barrier and atomic operations to ensure ordering between the two functions. Add special implementations of qemu_event_set() and qemu_event_wait() using pthread primitives. qemu_event_wait() will ensure qemu_event_set() finishes, and these functions will avoid complex barrier and atomic operations to ensure ordering between them. Signed-off-by: Akihiko Odaki <[email protected]> Tested-by: Phil Dennis-Jordan <[email protected]> Reviewed-by: Phil Dennis-Jordan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]>
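The mutex-protected variant described here can be pictured with a small model. This is a Python analogy using threading.Lock and threading.Condition in place of pthread mutexes and condition variables; MutexEvent is a hypothetical class, not the QEMU implementation. Because the waiter must take the same lock the setter holds, wait() cannot return until set() has released it, which is what makes later destruction safe.

```python
import threading


class MutexEvent:
    """One-shot event whose state is only touched under a mutex."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._cond = threading.Condition(self._lock)
        self._set = False

    def set(self) -> None:
        with self._lock:
            self._set = True
            self._cond.notify_all()

    def reset(self) -> None:
        with self._lock:
            self._set = False

    def is_set(self) -> bool:
        with self._lock:
            return self._set

    def wait(self) -> None:
        with self._lock:
            # Loop on the state rather than trusting a single wake-up.
            while not self._set:
                self._cond.wait()


ev = MutexEvent()
seen = []
waiter = threading.Thread(target=lambda: (ev.wait(), seen.append("woken")))
waiter.start()
ev.set()
waiter.join()
print(seen)  # ['woken']
```

The trade-off named in the commit message shows up directly: every state access pays for the lock, but no barriers or atomic operations are needed to order set() against wait().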

Use the futex-based implementation of QemuEvent on Windows to remove code duplication and remove the overhead of event object construction and destruction. Signed-off-by: Akihiko Odaki <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]>

This unlocks the futex-based implementation of QemuLockCnt to Windows. Signed-off-by: Akihiko Odaki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]>

Document QemuEvent to help choose an appropriate synchronization primitive. Signed-off-by: Akihiko Odaki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]>

pause_event can utilize qemu_event_reset() to discard events. Signed-off-by: Akihiko Odaki <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]>

colo_exit_sem and colo_incoming_sem represent one-shot events so they can be converted into QemuEvent, which is more lightweight. Signed-off-by: Akihiko Odaki <[email protected]> Reviewed-by: Fabiano Rosas <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]>

thread_sync_sem is a one-shot event so it can be converted into QemuEvent, which is more lightweight. Signed-off-by: Akihiko Odaki <[email protected]> Reviewed-by: Fabiano Rosas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]>

sem in AppleGFXReadMemoryJob is a one-shot event so it can be converted into QemuEvent, which is more specialized for such a use case. Signed-off-by: Akihiko Odaki <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]>

The Intel SDM section 10.2.3.3 on the MXCSR.FTZ bit says that we flush outputs to zero when we detect underflow, which is after rounding. Set the detect_ftz flag accordingly. This allows us to enable the test in fma.c which checks this behaviour. Signed-off-by: Peter Maydell <[email protected]> Reviewed-by: Richard Henderson <[email protected]> Reviewed-by: Zhao Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]>

The softfloat get_float_exception_flags() function returns 'int', but in various places in target/i386 we incorrectly store the returned value into a uint8_t. This currently has no ill effects because i386 doesn't care about any of the float_flag enum values above 0x40. However, we want to start using float_flag_input_denormal_used, which is 0x4000. Switch to using 'int' so that we can handle all the possible valid float_flag_* values. This includes changing the return type of save_exception_flags() and the argument to merge_exception_flags(). Signed-off-by: Peter Maydell <[email protected]> Reviewed-by: Richard Henderson <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Reviewed-by: Zhao Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]>
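The truncation problem is easy to see numerically. The Python sketch below models C's conversion of an int to uint8_t by masking to the low 8 bits; store_in_uint8() is a hypothetical helper, the 0x4000 value is the one quoted in the commit message, and 0x20 is just an arbitrary low flag bit chosen for illustration.

```python
FLOAT_FLAG_INPUT_DENORMAL_USED = 0x4000  # value quoted in the commit message


def store_in_uint8(flags: int) -> int:
    """Model C's int -> uint8_t conversion: only the low 8 bits survive."""
    return flags & 0xFF


flags = FLOAT_FLAG_INPUT_DENORMAL_USED | 0x20  # 0x20: arbitrary low flag bit
print(hex(store_in_uint8(flags)))  # 0x20
```

Flags up to 0x40 pass through unchanged, which is why the narrow type had no visible effect until a flag above bit 7 was needed.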

The x86 DE bit in the FPU and MXCSR status is supposed to be set when an input denormal is consumed. We didn't previously report this from softfloat, so the x86 code either simply didn't set the DE bit or else incorrectly wired it up to denormal_flushed, depending on which register you looked at. Now we have input_denormal_used we can wire up these DE bits with the semantics they are supposed to have. Signed-off-by: Peter Maydell <[email protected]> Reviewed-by: Richard Henderson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]>

Add some fma test cases that check for correct handling of FTZ and for the flag that indicates that the input denormal was consumed. Signed-off-by: Peter Maydell <[email protected]> Reviewed-by: Richard Henderson <[email protected]> Reviewed-by: Zhao Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]>

… staging Python Pull Request Add QAPI and QAPI doc files to python static analysis testing regime, this time for real, probably # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEE+ber27ys35W+dsvQfe+BBqr8OQ4FAmhB388ACgkQfe+BBqr8 # OQ6lMA//WJtSr57ADW5k5zcRMxV7k//erYFkjgXbTh7b9DDblMwNVhYr5lqJbEvS # V5OChW32++QIO5Y4cBhzbzxFTJXbAYzyg3UATCkH2kRbd139bqdAtsnsaFmoHmLP # c8KAggT1+hIb7JIVkFiFccMsdCeFwXwQoS5Nk7w95H9cxxYUj/O9qbRuCN+elg/e # mX4zaq6F2umTx0EdD35DlBPrPPyRsdlVWKUqh8f5KaAGPOelGyvbgwrXU2MT7ewG # JXcRoYzn/9J2KSboiFY0MjIKqDuhoMdCnbSNpRNGgClJRa+VZEBPFClMe1YSXw0m # J3kQMYeqm5S1GUG+ZrBTICY6Ch8jNq2kb3ua707JJWdYmd9gq0poF/P7gaRVbyAL # 5UdYVVgtH/3xve2LGe0guj3v5kTK7Vo6dApwj8pRHrBWWOgAG0UgGseOJgndfCIx # PQRsF2T4YoVdjiGB46EIgBmoFI+VJGwFRlvb6WZ0YmPedi7MuUvWmo0lbgDkaTO+ # MMqsWxShTY+xwnSFgtl1iHOAdfT6jiHcn1n+hZrGpvF492XRjW02zKiDSZECqSz5 # lg51+OaDc2HwS65sYyFb4GD7yF/PcdOj7MG/Ij9dx0GoM9/HmcVAHyRt45QNgxzc # N7Xx6GFGs7puDoE/pSoauFtGC8XeR6Cx0HfBcXYGaJcJEq6N4yw= # =IVAr # -----END PGP SIGNATURE----- # gpg: Signature made Thu 05 Jun 2025 14:19:59 EDT # gpg: using RSA key F9B7ABDBBCACDF95BE76CBD07DEF8106AAFC390E # gpg: Good signature from "John Snow (John Huston) <[email protected]>" [full] # Primary key fingerprint: FAEB 9711 A12C F475 812F 18F2 88A9 064D 1835 61EB # Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76 CBD0 7DEF 8106 AAFC 390E * tag 'python-pull-request' of https://gitlab.com/jsnow/qemu: qapi: delete un-needed python static analysis configs python: Drop redundant warn_unused_configs = True python: add qapi static analysis tests python: update missing dependencies from minreqs docs/qapidoc: linting fixes qapi: Add some pylint ignores Signed-off-by: Stefan Hajnoczi <[email protected]>

* futex: support Windows * qemu-thread: Avoid futex abstraction for non-Linux * migration, hw/display/apple-gfx: replace QemuSemaphore with QemuEvent * rust: bindings for Error * hpet, rust/hpet: return errors from realize if properties are incorrect * rust/hpet: Drop BqlCell wrapper for num_timers * target/i386: Emulate ftz and denormal flag bits correctly * i386/kvm: Prefault memory on page state change # -----BEGIN PGP SIGNATURE----- # # iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmhC4AgUHHBib256aW5p # QHJlZGhhdC5jb20ACgkQv/vSX3jHroP09wf+K9e0TaaZRxTsw7WU9pXsDoYPzTLd # F5CkBZPY770X1JW75f8Xw5qKczI0t6s26eFK1NUZxYiDVWzW/lZT6hreCUQSwzoS # b0wlAgPW+bV5dKlKI2wvnadrgDvroj4p560TS+bmRftiu2P0ugkHHtIJNIQ+byUQ # sWdhKlUqdOXakMrC4H4wDyIgRbK4CLsRMbnBHBUENwNJYJm39bwlicybbagpUxzt # w4mgjbMab0jbAd2hVq8n+A+1sKjrroqOtrhQLzEuMZ0VAwocwuP2Adm6gBu9kdHV # tpa8RLopninax3pWVUHnypHX780jkZ8E7zk9ohaaK36NnWTF4W/Z41EOLw== # =Vs6V # -----END PGP SIGNATURE----- # gpg: Signature made Fri 06 Jun 2025 08:33:12 EDT # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "[email protected]" # gpg: Good signature from "Paolo Bonzini <[email protected]>" [full] # gpg: aka "Paolo Bonzini <[email protected]>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (31 commits) tests/tcg/x86_64/fma: add test for exact-denormal output target/i386: Wire up MXCSR.DE and FPUS.DE correctly target/i386: Use correct type for get_float_exception_flags() values target/i386: Detect flush-to-zero after rounding hw/display/apple-gfx: Replace QemuSemaphore with QemuEvent migration/postcopy: Replace QemuSemaphore with QemuEvent migration/colo: Replace QemuSemaphore with QemuEvent migration: Replace QemuSemaphore with QemuEvent qemu-thread: Document QemuEvent qemu-thread: Use futex if available for QemuLockCnt qemu-thread: Use futex for 
QemuEvent on Windows qemu-thread: Avoid futex abstraction for non-Linux qemu-thread: Replace __linux__ with CONFIG_LINUX futex: Support Windows futex: Check value after qemu_futex_wait() i386/kvm: Prefault memory on page state change rust: make TryFrom macro more resilient docs: update Rust module status rust/hpet: Drop BqlCell wrapper for num_timers rust/hpet: return errors from realize if properties are incorrect ... Signed-off-by: Stefan Hajnoczi <[email protected]>

…result Modify memory_region_set_ram_discard_manager() to return -EBUSY if a RamDiscardManager is already set in the MemoryRegion. The caller must handle this failure, such as having virtio-mem undo its actions and fail the realize() process. Opportunistically move the call earlier to avoid complex error handling. This change is beneficial when introducing a new RamDiscardManager instance besides virtio-mem. After ram_block_coordinated_discard_require(true) unlocks all RamDiscardManager instances, only one instance is allowed to be set for one MemoryRegion at present. Suggested-by: David Hildenbrand <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Pankaj Gupta <[email protected]> Tested-by: Alexey Kardashevskiy <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Reviewed-by: Xiaoyao Li <[email protected]> Signed-off-by: Chenyi Qiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]>
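The new contract can be sketched as follows. This is a Python model, not the QEMU function: MemoryRegionModel and realize() are hypothetical names, and the assumption that passing None clears the current manager is specific to this sketch.

```python
import errno


class MemoryRegionModel:
    """Toy stand-in for a MemoryRegion that holds at most one discard manager."""

    def __init__(self) -> None:
        self.rdm = None

    def set_ram_discard_manager(self, rdm) -> int:
        # Refuse to replace an existing manager. Passing None clears it
        # (the clearing behavior is an assumption of this sketch).
        if rdm is not None and self.rdm is not None:
            return -errno.EBUSY
        self.rdm = rdm
        return 0


def realize(region: MemoryRegionModel, manager) -> bool:
    """Caller-side handling: a negative return means realize must fail."""
    ret = region.set_ram_discard_manager(manager)
    return ret >= 0


region = MemoryRegionModel()
print(realize(region, "virtio-mem"))      # True
print(realize(region, "second-manager"))  # False
```

Making the call early in realize, as the commit message suggests, keeps the failure path short because little state has been set up yet.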

…rd() Update ReplayRamDiscard() function to return the result and unify the ReplayRamPopulate() and ReplayRamDiscard() to ReplayRamDiscardState() at the same time due to their identical definitions. This unification simplifies related structures, such as VirtIOMEMReplayData, which makes it cleaner. Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Pankaj Gupta <[email protected]> Reviewed-by: Xiaoyao Li <[email protected]> Signed-off-by: Chenyi Qiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]>

… with guest_memfd Commit 852f004 ("RAMBlock: make guest_memfd require uncoordinated discard") highlighted that subsystems like VFIO may disable RAM block discard. However, guest_memfd relies on discard operations for page conversion between private and shared memory, potentially leading to the stale IOMMU mapping issue when assigning hardware devices to confidential VMs via shared memory. To address this and allow shared device assignment, it is crucial to ensure the VFIO system refreshes its IOMMU mappings. RamDiscardManager is an existing interface (used by virtio-mem) to adjust VFIO mappings in relation to VM page assignment. Effectively page conversion is similar to hot-removing a page in one mode and adding it back in the other. Therefore, similar actions are required for page conversion events. Introduce the RamDiscardManager to guest_memfd to facilitate this process. Since guest_memfd is not an object, it cannot directly implement the RamDiscardManager interface. Implementing it in HostMemoryBackend is not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks have a memory backend while others do not. Notably, virtual BIOS RAMBlocks using memory_region_init_ram_guest_memfd() do not have a backend. To manage RAMBlocks with guest_memfd, define a new object named RamBlockAttributes to implement the RamDiscardManager interface. This object can store the guest_memfd information such as the bitmap for shared memory and the registered listeners for event notifications. A new state_change() helper function is provided to notify listeners, such as VFIO, allowing VFIO to dynamically DMA map and unmap the shared memory according to conversion events. Note that in the current context of RamDiscardManager for guest_memfd, the shared state is analogous to being populated, while the private state can be considered discarded for simplicity. In the future, it would be more complicated if considering more states like private/shared/discarded at the same time. In the current implementation, memory state tracking is performed at the host page size granularity, as the minimum conversion size can be one page per request. Additionally, VFIO expects the DMA mapping for a specific IOVA to be mapped and unmapped with the same granularity. Confidential VMs may perform partial conversions, such as conversions on small regions within a larger one. To prevent such invalid cases and until support for DMA mapping cut operations is available, all operations are performed with 4K granularity. In addition, memory conversion failures cause QEMU to quit rather than resuming the guest or retrying the operation at present. It would be future work to add more error handling or rollback mechanisms once conversion failures are allowed. For example, in-place conversion of guest_memfd could retry the unmap operation during the conversion from shared to private. For now, keep the complex error handling out of the picture as it is not required. Tested-by: Alexey Kardashevskiy <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Reviewed-by: Pankaj Gupta <[email protected]> Signed-off-by: Chenyi Qiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] [peterx: squash fixup from Chenyi to fix builds] Signed-off-by: Peter Xu <[email protected]>
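The shared bitmap plus listener notification described in this commit message can be modeled roughly as below. This is a Python sketch only: RamBlockAttributesModel is a hypothetical class, pages start out private by assumption, pages already in the target state are skipped by assumption, and each notification covers a single 4K page to mirror the granularity rule stated above.

```python
from typing import Callable, List, Tuple

PAGE = 4096  # 4K tracking and notification granularity

Listener = Callable[[str, int, int], None]


class RamBlockAttributesModel:
    """Tracks which pages are shared and notifies listeners on conversion."""

    def __init__(self, size: int) -> None:
        self.shared: List[bool] = [False] * (size // PAGE)  # assume all private at start
        self.listeners: List[Listener] = []

    def register_listener(self, listener: Listener) -> None:
        self.listeners.append(listener)

    def state_change(self, offset: int, size: int, to_private: bool) -> None:
        if size == 0 or offset % PAGE or size % PAGE:
            raise ValueError("range must be page aligned and non-empty")
        first, count = offset // PAGE, size // PAGE
        if first + count > len(self.shared):
            raise ValueError("range exceeds the block")
        for page in range(first, first + count):
            target_shared = not to_private
            if self.shared[page] == target_shared:
                continue  # already in the requested state
            self.shared[page] = target_shared
            # shared ~ populated, private ~ discarded
            event = "populate" if target_shared else "discard"
            for listener in self.listeners:
                listener(event, page * PAGE, PAGE)


events: List[Tuple[str, int, int]] = []
attrs = RamBlockAttributesModel(4 * PAGE)
attrs.register_listener(lambda kind, off, length: events.append((kind, off, length)))
attrs.state_change(0, 2 * PAGE, to_private=False)   # convert two pages to shared
attrs.state_change(PAGE, PAGE, to_private=True)     # convert one back to private
print(events)
```

A listener standing in for VFIO would DMA map on "populate" and unmap on "discard", always with the same 4K extent it was given when mapping.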

A new field, attributes, was introduced in RAMBlock to link to a RamBlockAttributes object, which centralizes all guest_memfd related information (such as fd and status bitmap) within a RAMBlock. Create and initialize the RamBlockAttributes object upon ram_block_add(). Meanwhile, register the object in the target RAMBlock's MemoryRegion. After that, the guest_memfd-backed RAMBlock is associated with the RamDiscardManager interface, and the users can execute RamDiscardManager specific handling. For example, VFIO will register the RamDiscardListener and get notifications when the state_change() helper is invoked. As coordinated discarding of RAM with guest_memfd is now supported, only block uncoordinated discard. Tested-by: Alexey Kardashevskiy <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Acked-by: David Hildenbrand <[email protected]> Signed-off-by: Chenyi Qiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]>

Currently we have a short paragraph saying that patches must include a Signed-off-by line, and merely link to the kernel documentation. The linked kernel docs have a lot of content beyond the part about sign-off and thus are misleading/distracting to QEMU contributors. This introduces a dedicated 'code-provenance' page in QEMU talking about why we require sign-off, explaining the other tags we commonly use, and what to do in some edge cases. Signed-off-by: Daniel P. Berrangé <[email protected]> Reviewed-by: Peter Maydell <[email protected]> Reviewed-by: Stefan Hajnoczi <[email protected]> Reviewed-by: Alex Bennée <[email protected]> Signed-off-by: Markus Armbruster <[email protected]> Signed-off-by: Stefan Hajnoczi <[email protected]>

Files contributed to QEMU are generally expected to be provided in the preferred format for manipulation. IOW, we generally don't expect to have generated / compiled code included in the tree, rather, we expect to run the code generator / compiler as part of the build process. There are some obvious exceptions to this seen in our existing tree, the biggest one being the inclusion of many binary firmware ROMs. A more niche example is the inclusion of a generated eBPF program. Or the CI dockerfiles which are mostly auto-generated. In these cases, however, the preferred format source code is still required to be included, alongside the generated output. Tools which perform user defined algorithmic transformations on code are not considered to be "code generators". ie, we permit use of coccinelle, spell checkers, and sed/awk/etc to manipulate code. Such use of automated manipulation should still be declared in the commit message. One off generators which create a boilerplate file which the author then fills in, are acceptable if their output has clear copyright and license status. This could be where a contributor writes a throwaway python script to automate creation of some mundane piece of code for example. Signed-off-by: Daniel P. Berrangé <[email protected]> Reviewed-by: Alex Bennée <[email protected]> Reviewed-by: Stefan Hajnoczi <[email protected]> Signed-off-by: Markus Armbruster <[email protected]> Signed-off-by: Stefan Hajnoczi <[email protected]>

There has been an explosion of interest in so-called AI code generators. Thus far though, this has not been matched by a broadly accepted legal interpretation of the licensing implications for code generator outputs. While the vendors may claim there is no problem and a free choice of license is possible, they have an inherent conflict of interest in promoting this interpretation. More broadly there is, as yet, no broad consensus on the licensing implications of code generators trained on inputs under a wide variety of licenses. The DCO requires contributors to assert they have the right to contribute under the designated project license. Given the lack of consensus on the licensing of AI code generator output, it is not considered credible to assert compliance with the DCO clause (b) or (c) where a patch includes such generated code. This patch thus defines a policy that the QEMU project will currently not accept contributions where use of AI code generators is either known, or suspected. These are early days of AI-assisted software development. The legal questions will be resolved eventually. The tools will mature, and we can expect some to become safely usable in free software projects. The policy we set now must be for today, and be open to revision. It's best to start strict and safe, then relax. Meanwhile requests for exceptions can also be considered on a case by case basis. Signed-off-by: Daniel P. Berrangé <[email protected]> Reviewed-by: Kevin Wolf <[email protected]> Reviewed-by: Stefan Hajnoczi <[email protected]> Reviewed-by: Alex Bennée <[email protected]> Signed-off-by: Markus Armbruster <[email protected]> Signed-off-by: Stefan Hajnoczi <[email protected]>

…rx/qemu into staging Migration / Memory pull - Yanfei's optimization to skip log_clear during completion - Fabiano's cleanup to remove leftover migration-helpers.c file - Juraj's vnc fix on display pause after migration - Jaehoon's cpr test fix on possible race of server establishment - Chenyi's initial support on vfio enablement for guest-memfd # -----BEGIN PGP SIGNATURE----- # # iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCaFmzWhIccGV0ZXJ4QHJl # ZGhhdC5jb20ACgkQO1/MzfOr1wbWYQD/dz08tyaL2J4EHESfBsW4Z1rEggVOM0cB # hlXnvzf/Pb4A/0X3Hn18bOxfPAZOr8NggS5AKgzCCYVeQEWQA2Jj8hwC # =kcTN # -----END PGP SIGNATURE----- # gpg: Signature made Mon 23 Jun 2025 16:04:42 EDT # gpg: using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706 # gpg: issuer "[email protected]" # gpg: Good signature from "Peter Xu <[email protected]>" [full] # gpg: aka "Peter Xu <[email protected]>" [full] # Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D D1A9 3B5F CCCD F3AB D706 * tag 'migration-staging-pull-request' of https://gitlab.com/peterx/qemu: physmem: Support coordinated discarding of RAM with guest_memfd ram-block-attributes: Introduce RamBlockAttributes to manage RAMBlock with guest_memfd memory: Unify the definiton of ReplayRamPopulate() and ReplayRamDiscard() memory: Change memory_region_set_ram_discard_manager() to return the result memory: Export a helper to get intersection of a MemoryRegionSection with a given range migration: Don't sync volatile memory after migration completes tests/migration: Setup pre-listened cpr.sock to remove race-condition. migration: Support fd-based socket address in cpr_transfer_input ui/vnc: Update display update interval when VM state changes to RUNNING tests/qtest: Remove migration-helpers.c migration/ram: avoid to do log clear in the last round Signed-off-by: Stefan Hajnoczi <[email protected]>

… staging linux-user: fix resource leaks in gen-vdso tcg: Add ptr+ofs alternatives to some gvec functions # -----BEGIN PGP SIGNATURE----- # # iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmhZ/LMdHHJpY2hhcmQu # aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV8aCggAtZOamQ0+EMe09u9d # slaeZDlmxHYfb4RXJQasIBi/uHoWY1bFCEWqLnjU41cpNqI7B3yihbS/YQzyI1i/ # fqjATmuhDzer7rZfdtmRdiLi6kY9SuN9tcSVMVU/kxixByPxdYspQBO8hAAQMM1X # ZY5MIR/5nEMN/U0QUMuqd3krsxzglGQl9Dn610ddVGfzluSCKLLMS/m92gaJmz0u # xoLTM29lfdtIA29JPpVY+1X8NJ/vTUeBvy2eXUGHjT11rHsYUzMVGCGbzCLluEzN # V3L/aSkiwrV+wW5M7R6+hySQl65ZVRV+E9BHuln9aDnG4jdzT3conohg2cY9a5jw # m3HqnQ== # =U6ub # -----END PGP SIGNATURE----- # gpg: Signature made Mon 23 Jun 2025 21:17:39 EDT # gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F # gpg: issuer "[email protected]" # gpg: Good signature from "Richard Henderson <[email protected]>" [full] # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * tag 'pull-tcg-20250623' of https://gitlab.com/rth7680/qemu: linux-user: fix resource leaks in gen-vdso linux-user/aarch64: Update hwcap bits from 6.14 tcg: Split out tcg_gen_gvec_dup_imm_var tcg: Split out tcg_gen_gvec_{add,sub}_var tcg: Split out tcg_gen_gvec_mov_var tcg: Split out tcg_gen_gvec_3_var tcg: Split out tcg_gen_gvec_2_var tcg: Add base arguments to check_overlap_[234] tcg: Add dbase argument to expand_clr tcg: Add dbase argument to do_dup tcg: Add dbase argument to do_dup_store Signed-off-by: Stefan Hajnoczi <[email protected]>

Introduce basic plumbing for vfio-user with CONFIG_VFIO_USER. We introduce VFIOUserContainer in hw/vfio-user/container.c, which is a container type for the "IOMMU" type "vfio-iommu-user", and share some common container code from hw/vfio/container.c. Add hw/vfio-user/pci.c for instantiating VFIOUserPCIDevice objects, sharing some common code from hw/vfio/pci.c. Originally-by: John Johnson <[email protected]> Signed-off-by: Elena Ufimtseva <[email protected]> Signed-off-by: Jagannathan Raman <[email protected]> Signed-off-by: John Levon <[email protected]>

Introduce the vfio-user "proxy": this is the client code responsible for sending and receiving vfio-user messages across the control socket. The new files hw/vfio-user/proxy.[ch] contain some basic plumbing for managing the proxy; initialize the proxy during realization of the VFIOUserPCIDevice instance. Originally-by: John Johnson <[email protected]> Signed-off-by: Elena Ufimtseva <[email protected]> Signed-off-by: Jagannathan Raman <[email protected]> Signed-off-by: John Levon <[email protected]>

Add the basic implementation for receiving vfio-user messages from the control socket. Originally-by: John Johnson <[email protected]> Signed-off-by: Elena Ufimtseva <[email protected]> Signed-off-by: Jagannathan Raman <[email protected]> Signed-off-by: John Levon <[email protected]>

Add plumbing for sending vfio-user messages on the control socket. Add initial version negotiation on connection. Originally-by: John Johnson <[email protected]> Signed-off-by: Jagannathan Raman <[email protected]> Signed-off-by: Elena Ufimtseva <[email protected]> Signed-off-by: John Levon <[email protected]>

Add support for getting basic device information. Originally-by: John Johnson <[email protected]> Signed-off-by: Elena Ufimtseva <[email protected]> Signed-off-by: Jagannathan Raman <[email protected]> Signed-off-by: John Levon <[email protected]>

Add support for getting region info for vfio-user. As vfio-user has one fd per region, enable ->use_region_fds. Originally-by: John Johnson <[email protected]> Signed-off-by: Elena Ufimtseva <[email protected]> Signed-off-by: Jagannathan Raman <[email protected]> Signed-off-by: John Levon <[email protected]>

Originally-by: John Johnson <[email protected]> Signed-off-by: Elena Ufimtseva <[email protected]> Signed-off-by: Jagannathan Raman <[email protected]> Signed-off-by: John Levon <[email protected]>

Re-use PCI setup functions from hw/vfio/pci.c to realize the vfio-user PCI device. Originally-by: John Johnson <[email protected]> Signed-off-by: Elena Ufimtseva <[email protected]> Signed-off-by: Jagannathan Raman <[email protected]> Signed-off-by: John Levon <[email protected]>

IRQ setup uses the same semantics as the traditional vfio path, but we need to share the corresponding file descriptors with the server as necessary. Originally-by: John Johnson <[email protected]> Signed-off-by: Elena Ufimtseva <[email protected]> Signed-off-by: Jagannathan Raman <[email protected]> Signed-off-by: John Levon <[email protected]>

For vfio-user, the server holds the pending IRQ state; set up an I/O region for the MSI-X PBA so we can ask the server for this state on a PBA read. Originally-by: John Johnson <[email protected]> Signed-off-by: Elena Ufimtseva <[email protected]> Signed-off-by: Jagannathan Raman <[email protected]> Signed-off-by: John Levon <[email protected]>

The user container will shortly need access to the underlying vfio-user proxy; set this up. Originally-by: John Johnson <[email protected]> Signed-off-by: Elena Ufimtseva <[email protected]> Signed-off-by: Jagannathan Raman <[email protected]> Signed-off-by: John Levon <[email protected]>

Hook this call up to the legacy reset handler for vfio-user-pci. Originally-by: John Johnson <[email protected]> Signed-off-by: Elena Ufimtseva <[email protected]> Signed-off-by: Jagannathan Raman <[email protected]> Signed-off-by: John Levon <[email protected]>

When the vfio-user container gets mapping updates, share them with the vfio-user server by sending a message; this can include the region fd, allowing the server to directly mmap() the region as needed. For performance, we only wait for the message responses when we're done with a series of updates via the listener_commit() callback. Originally-by: John Johnson <[email protected]> Signed-off-by: Jagannathan Raman <[email protected]> Signed-off-by: Elena Ufimtseva <[email protected]> Signed-off-by: John Levon <[email protected]>
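The deferred-wait pattern described here can be sketched roughly as follows. This is a minimal model, not the code from the patch: `MapBatch`, `map_update_async()`, `map_commit()`, and the `stub_wait_reply()` callback are all hypothetical names, and the actual message serialization is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 16 /* hypothetical queue depth */

/* Tracks map updates that were sent but whose replies are not yet collected. */
typedef struct {
    uint16_t pending_ids[MAX_PENDING];
    size_t npending;
    uint16_t next_id;
} MapBatch;

/* Callback that blocks until the reply for one message id has arrived. */
typedef bool (*WaitReplyFn)(uint16_t msg_id);

/* Send one map update without waiting; returns false if the queue is full. */
static bool map_update_async(MapBatch *mb, uint64_t iova, uint64_t size)
{
    (void)iova; /* a real client would serialize these into the message */
    (void)size;
    if (mb->npending == MAX_PENDING) {
        return false;
    }
    mb->pending_ids[mb->npending++] = mb->next_id++;
    return true;
}

/* Run once per series of updates: collect every outstanding reply. */
static size_t map_commit(MapBatch *mb, WaitReplyFn wait_reply)
{
    size_t ok = 0;
    for (size_t i = 0; i < mb->npending; i++) {
        if (wait_reply(mb->pending_ids[i])) {
            ok++;
        }
    }
    mb->npending = 0;
    return ok; /* number of replies that reported success */
}

/* Stand-in for a real socket wait; always reports success. */
static bool stub_wait_reply(uint16_t msg_id)
{
    (void)msg_id;
    return true;
}
```

The point of the design is that the round-trip latency is paid once per commit instead of once per mapping update.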

Unlike most other messages, this is a server->client message, for when a server wants to do "DMA"; this is slow, so normally the server has memory directly mapped instead. Originally-by: John Johnson <[email protected]> Signed-off-by: Elena Ufimtseva <[email protected]> Signed-off-by: Jagannathan Raman <[email protected]> Signed-off-by: John Levon <[email protected]>

By default, the vfio-user subsystem will wait 5 seconds for a message reply from the server. Add an option to allow this to be configurable. Originally-by: John Johnson <[email protected]> Signed-off-by: Elena Ufimtseva <[email protected]> Signed-off-by: Jagannathan Raman <[email protected]> Signed-off-by: John Levon <[email protected]>
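One way to turn a configurable wait time into an absolute deadline is sketched below. Only the 5-second default comes from the commit message; the `DEFAULT_WAIT_MS` constant, the millisecond unit, and `deadline_after()` are assumptions made for this example.

```c
#include <stdint.h>
#include <time.h>

#define DEFAULT_WAIT_MS 5000u /* the 5-second default, expressed in ms */

/*
 * Compute the absolute deadline for a reply, given the current time and a
 * timeout in milliseconds. The nanosecond field is normalized so that it
 * stays below one second.
 */
static struct timespec deadline_after(struct timespec now, uint32_t timeout_ms)
{
    struct timespec d = now;

    d.tv_sec += timeout_ms / 1000;
    d.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (d.tv_nsec >= 1000000000L) {
        d.tv_sec += 1;
        d.tv_nsec -= 1000000000L;
    }
    return d;
}
```

A deadline in this form could be handed to a timed wait such as `pthread_cond_timedwait()`, which takes an absolute `struct timespec`.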

Support an asynchronous send of a vfio-user socket message (no wait for a reply) when the write is posted. This is only safe when no regions are mappable by the VM. Add an option to explicitly disable this as well. Signed-off-by: John Levon <[email protected]>

Add new message to send multiple writes to server in a single message. Prevents the outgoing queue from overflowing when a long latency operation is followed by a series of posted writes. Originally-by: John Johnson <[email protected]> Signed-off-by: Elena Ufimtseva <[email protected]> Signed-off-by: Jagannathan Raman <[email protected]> Signed-off-by: John Levon <[email protected]>
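The batching idea can be sketched as a fixed-capacity buffer of posted writes that is flushed as one message. The layout, the limits, and the names (`PostedWrite`, `WriteBatch`, `batch_add()`, `batch_flush()`) are hypothetical and do not reflect the wire format used by the patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BATCH_MAX_WRITES 4 /* hypothetical limit per message */
#define WRITE_MAX_DATA   8 /* hypothetical limit per write, in bytes */

/* One posted region write waiting to be sent. */
typedef struct {
    uint64_t offset;
    uint32_t region;
    uint32_t count;
    uint8_t data[WRITE_MAX_DATA];
} PostedWrite;

/* Writes accumulated for a single outgoing message. */
typedef struct {
    uint32_t nwrites;
    PostedWrite writes[BATCH_MAX_WRITES];
} WriteBatch;

/* Append a write; returns false if the batch is full or the write is too big. */
static bool batch_add(WriteBatch *b, uint32_t region, uint64_t offset,
                      const void *data, uint32_t count)
{
    if (b->nwrites == BATCH_MAX_WRITES || count > WRITE_MAX_DATA) {
        return false;
    }
    PostedWrite *w = &b->writes[b->nwrites++];
    w->region = region;
    w->offset = offset;
    w->count = count;
    memcpy(w->data, data, count);
    return true;
}

/* Pretend to send the batch as one message; returns how many writes it held. */
static uint32_t batch_flush(WriteBatch *b)
{
    uint32_t n = b->nwrites;
    b->nwrites = 0;
    return n;
}
```

When `batch_add()` returns false, a caller would flush the batch and retry, so a burst of posted writes occupies one queue slot per batch rather than one per write.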

Add some basic documentation on vfio-user usage. Signed-off-by: John Levon <[email protected]>

This patch introduces the vfio-user protocol specification (formerly known as VFIO-over-socket), which is designed to allow devices to be emulated outside QEMU, in a separate process. vfio-user reuses the existing VFIO defines, structs and concepts. It has been earlier discussed as an RFC in: "RFC: use VFIO over a UNIX domain socket to implement device offloading" Signed-off-by: Thanos Makatos <[email protected]> Signed-off-by: John Levon <[email protected]>

bonzini and others added 30 commits June 5, 2025 20:24

util/error: make func optional

e8fb9c9

The function name is not available in Rust, so make it optional. Reviewed-by: Zhao Liu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

rust: qemu-api: add tests for Error bindings

9a33f49

Signed-off-by: Paolo Bonzini <[email protected]>

rust: qdev: support returning errors from realize

4b66abe

Reviewed-by: Zhao Liu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

hpet: adjust VMState for consistency with Rust version

6e85cfe

No functional change intended. Suggested-by: Zhao Liu <[email protected]> Reviewed-by: Zhao Liu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

hpet: return errors from realize if properties are incorrect

14b5a79

Do not silently adjust num_timers, and fail if intcap is 0. Reviewed-by: Markus Armbruster <[email protected]> Reviewed-by: Zhao Liu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

rust/hpet: return errors from realize if properties are incorrect

4d2fec8

Match the code in hpet.c; this also allows removing the BqlCell from the num_timers field. Reviewed-by: Zhao Liu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

docs: update Rust module status

9c00ef6

error is new; offset_of is gone. Reviewed-by: Zhao Liu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

futex: Support Windows

1bc2c49

Windows supports futex-like APIs since Windows 8 and Windows Server 2012. Signed-off-by: Akihiko Odaki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]>

qemu-thread: Use futex if available for QemuLockCnt

0a765ca

This unlocks the futex-based implementation of QemuLockCnt to Windows. Signed-off-by: Akihiko Odaki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]>

qemu-thread: Document QemuEvent

5e2312f

Document QemuEvent to help choose an appropriate synchronization primitive. Signed-off-by: Akihiko Odaki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]>

Qiangcy and others added 28 commits June 23, 2025 16:03

vfio-user: implement VFIO_USER_REGION_READ/WRITE

68cd225

Originally-by: John Johnson <[email protected]> Signed-off-by: Elena Ufimtseva <[email protected]> Signed-off-by: Jagannathan Raman <[email protected]> Signed-off-by: John Levon <[email protected]>

docs: add vfio-user documentation

49828a5

Add some basic documentation on vfio-user usage. Signed-off-by: John Levon <[email protected]>

oracle-contributor-agreement bot added the "OCA Required" label (At least one contributor does not have an approved Oracle Contributor Agreement.) Jun 25, 2025

jlevon closed this Jun 25, 2025
