Contributor

@mayank-microsoft mayank-microsoft commented Apr 21, 2025

OpenTMK framework for testing guest-based scenarios with an HCL.

The diagram below illustrates the relationships between the abstract modules.

[image: diagram of the abstract modules]

UEFI Executor Design Decisions:

  1. Allocator
    a. The allocator today switches between the UEFI Runtime Allocator and LockedHeapAllocator.
    b. Switching between the two allocators gives us more control over which sections of the memory map back the heap. This lets us be sure the heap lives in a memory section that UEFI runtime services will not use after exit boot services. The UEFI allocator is still needed so that we can allocate objects before main is called: if a panic occurs before main, the panic handler has to allocate strings. The UEFI allocator can’t be used after exit boot services.
  2. Panic Handler
    a. The panic handler today logs the panic info as a string using the logger module and then loops. The test driver is informed of the panic and terminates the VM.
    b. An improvement is planned to shut down the VM.
    c. Today we use our own interrupt handler; executing ud2 raises an interrupt, but that does not cause a triple fault.
  3. Test Configuration Handler
    a. In scope for a task being tracked
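
As a rough illustration of the allocator decision above (not the code in this PR), the switch can be sketched in hosted Rust. Everything here is an assumption made for the example: `SwitchingAllocator`, `Arena`, and `switch_to_arena` are hypothetical names, `std::alloc::System` stands in for the UEFI allocator, and a fixed bump arena stands in for the locked heap.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

const ARENA_SIZE: usize = 4096;

/// Hypothetical fixed region standing in for the heap used after the switch.
#[repr(align(16))]
struct Arena(UnsafeCell<[u8; ARENA_SIZE]>);

/// Hypothetical allocator: forwards to `System` early, bumps in `Arena` later.
pub struct SwitchingAllocator {
    arena: Arena,
    next: AtomicUsize,
    use_arena: AtomicBool,
}

// SAFETY: arena bytes are only handed out in disjoint chunks reserved via `next`.
unsafe impl Sync for SwitchingAllocator {}

impl SwitchingAllocator {
    pub const fn new() -> Self {
        Self {
            arena: Arena(UnsafeCell::new([0; ARENA_SIZE])),
            next: AtomicUsize::new(0),
            use_arena: AtomicBool::new(false),
        }
    }

    /// Call once the early allocator must no longer serve new allocations.
    pub fn switch_to_arena(&self) {
        self.use_arena.store(true, Ordering::Release);
    }

    /// True if `ptr` points into the arena.
    pub fn owns(&self, ptr: *mut u8) -> bool {
        let base = self.arena.0.get() as usize;
        (base..base + ARENA_SIZE).contains(&(ptr as usize))
    }
}

unsafe impl GlobalAlloc for SwitchingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if !self.use_arena.load(Ordering::Acquire) {
            // SAFETY: the caller's layout is forwarded unchanged.
            return unsafe { System.alloc(layout) };
        }
        let base = self.arena.0.get() as *mut u8;
        let mut cur = self.next.load(Ordering::Relaxed);
        loop {
            // Pad so that the returned address satisfies the requested alignment.
            let addr = base as usize + cur;
            let padding = (layout.align() - addr % layout.align()) % layout.align();
            let start = cur + padding;
            let end = start + layout.size();
            if end > ARENA_SIZE {
                return std::ptr::null_mut();
            }
            match self
                .next
                .compare_exchange_weak(cur, end, Ordering::Relaxed, Ordering::Relaxed)
            {
                // SAFETY: `start..end` lies inside the arena and was reserved by the CAS.
                Ok(_) => return unsafe { base.add(start) },
                Err(seen) => cur = seen,
            }
        }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // Arena memory is never reclaimed in this sketch; early allocations
        // are returned to the allocator that produced them.
        if !self.owns(ptr) {
            // SAFETY: `ptr` came from `System.alloc` with this layout.
            unsafe { System.dealloc(ptr, layout) };
        }
    }
}
```

The point of the sketch is the routing: allocation follows the current mode, while deallocation follows the pointer's origin, so objects allocated before the switch can still be freed afterwards.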

Platform Design Decisions

The ARM64 implementation is a placeholder and out of scope; the work is tracked by . The work is mostly around implementing interrupt handling, VP bring-up (just the implementation for the default context), TPM-specific changes, and end-to-end testing.

  1. HvCall
    a. Platform/hyperv/arch houses all the modules that require a platform-specific implementation.
    b. VTL calls and returns need to be handled carefully, because many general-purpose register values are not preserved across a VTL switch. This requires us to push all the values onto the stack before a switch and restore them when we return. We also need to handle this carefully when a VTL switch happens because of a secure intercept.
    c. Tests that require secure intercepts to happen must use the create_function_with_restore macro to isolate the violating function.
  2. Hyper-V platform test context implementation
    a. Today we hardcode the number of VPs present. Earlier I tried building heuristics to read the CPU topology from CPUID, but they returned different results on Intel and AMD. In the next set of improvements, I intend to build this information from the ACPI tables or take the values as input in the test configuration.
    b. The AP bring-up in start_on_vp takes care of everything related to enabling the VTLs and scheduling the VpExecutor object. I am working on changing the name as suggested in the PR. This is mostly for simplicity; for complex tests where the heuristic has to be exercised at its boundaries, I recommend authoring a test with a direct dependency on the platform and calling the hypercall interface (HvCall is a pub field in HvTestCtx) without going through the generic interface of the platform traits.
  3. X86_64 Interrupt Management
    a. We depend on the x86_64 crate to provide structures and helpers, along with the x86-interrupt ABI.
    b. Since the custom ABI is a nightly feature, we keep it behind the nightly feature flag.
    c. We are tracking a task to move to naked functions as part of the improvements.
  4. TPM
    a. We currently use a duplicated copy of the protocol module from the tpm crate. We can’t depend on the tpm crate because it links to openssl, which we want to avoid. We also can’t readily move the protocol module because of coupling between it and the errors struct from the tpm ref crate. I’ll work on decoupling the modules once we are OK with the other changes in this PR. I feel it may be better to do the decoupling in a follow-up PR: this PR already has a lot of changes, and keeping it from touching the TPM implementation reduces the risk of breaking anything in the tpm crate.
  5. Serial Port on AMD64
    a. We have a separate implementation built on top of minimal_rt. The main reasons are to let multiple processes write logs at the same time (by implementing locks) and to write to COM1/COM2 instead of the default COM3.
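
To make the locking point in the serial port item concrete, here is a small hosted-Rust sketch of a lock-guarded log writer. It is not the implementation in this PR: `FakePort` and `LockedSerial` are hypothetical names, an in-memory `String` stands in for the COM port, and `std::sync::Mutex` stands in for whatever lock the no_std code uses.

```rust
use std::fmt::Write;
use std::sync::Mutex;

/// Hypothetical stand-in for a COM port: collects output in memory.
#[derive(Default)]
pub struct FakePort {
    pub out: String,
}

impl Write for FakePort {
    fn write_str(&mut self, s: &str) -> std::fmt::Result {
        self.out.push_str(s);
        Ok(())
    }
}

/// Hypothetical wrapper that serializes writers so log lines never interleave.
pub struct LockedSerial<W> {
    port: Mutex<W>,
}

impl<W: Write> LockedSerial<W> {
    pub fn new(port: W) -> Self {
        Self { port: Mutex::new(port) }
    }

    /// Writes one complete line while holding the lock.
    pub fn log_line(&self, vp: u32, msg: &str) {
        let mut port = self.port.lock().unwrap();
        // Formatting errors are ignored in this sketch.
        let _ = writeln!(port, "[vp {vp}] {msg}");
    }

    /// Consumes the wrapper and returns the underlying port.
    pub fn into_inner(self) -> W {
        self.port.into_inner().unwrap()
    }
}
```

Holding the lock for the whole line, rather than per byte, is what keeps output from concurrent writers readable.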

feat: opentmk init

feat: init 1

feat: init 2

feat: opentmk

feat: opentmk init 3

feat: opentmk init 4

// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.

#![expect(unsafe_code)]
Member

please annotate why we have unsafe in each file we allow it

Contributor Author

I've fixed it as part of the lint fixes. Maybe it's an older commit?
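
One possible shape for such an annotation, sketched with the `reason` field that lint attributes accept (Rust 1.81 or later). The module `raw` and the function `read_first` are hypothetical, and this is not necessarily the convention the repository settled on.

```rust
// The `reason` string records why this module is allowed to contain unsafe code.
#[expect(
    unsafe_code,
    reason = "reads through a raw pointer to illustrate the annotation"
)]
pub mod raw {
    /// Returns the first byte of `bytes`, if any.
    pub fn read_first(bytes: &[u8]) -> Option<u8> {
        if bytes.is_empty() {
            return None;
        }
        // SAFETY: the slice is non-empty, so its pointer is valid for a one-byte read.
        Some(unsafe { *bytes.as_ptr() })
    }
}
```

The same attribute can be written at the top of a file as `#![expect(unsafe_code, reason = "...")]`.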

@smalis-msft smalis-msft added the release-ci-required label (Add to a PR to trigger PR gates in release mode) Oct 13, 2025
// implement clone for Sender
impl<T> Clone for Sender<T> {
    fn clone(&self) -> Self {
        self.inner.senders.fetch_add(1, Ordering::SeqCst);
Contributor

Do all these orderings need to be SeqCst? I think we could get away with Relaxed

Contributor

Consider adding some unit tests based on the loom crate to test that out (see atomic_ringbuf for examples)
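
For reference, a sketch of what relaxed reference counting on the sender side could look like, modeled on the pattern `std::sync::Arc` uses for its own count. The types are simplified, hypothetical stand-ins (no `T`, no queue), and whether `Relaxed` is sufficient in the real channel depends on what else the code reads after observing the count, which is what loom-based tests would probe.

```rust
use std::sync::atomic::{fence, AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical shared state: only the parts relevant to sender counting.
struct Shared {
    senders: AtomicUsize,
    disconnected: AtomicBool,
}

pub struct Sender {
    inner: Arc<Shared>,
}

pub struct Receiver {
    inner: Arc<Shared>,
}

pub fn channel() -> (Sender, Receiver) {
    let inner = Arc::new(Shared {
        senders: AtomicUsize::new(1),
        disconnected: AtomicBool::new(false),
    });
    (Sender { inner: Arc::clone(&inner) }, Receiver { inner })
}

impl Sender {
    pub fn sender_count(&self) -> usize {
        self.inner.senders.load(Ordering::Relaxed)
    }
}

impl Clone for Sender {
    fn clone(&self) -> Self {
        // Relaxed: the caller already holds a live Sender, so the count cannot
        // reach zero concurrently, and the increment publishes no other data.
        self.inner.senders.fetch_add(1, Ordering::Relaxed);
        Sender { inner: Arc::clone(&self.inner) }
    }
}

impl Drop for Sender {
    fn drop(&mut self) {
        // Release on the decrement plus an Acquire fence in the last sender
        // orders all earlier sender-side writes before the disconnect flag.
        if self.inner.senders.fetch_sub(1, Ordering::Release) == 1 {
            fence(Ordering::Acquire);
            self.inner.disconnected.store(true, Ordering::Release);
        }
    }
}

impl Receiver {
    pub fn is_disconnected(&self) -> bool {
        self.inner.disconnected.load(Ordering::Acquire)
    }
}
```

In this shape only the decrement and the disconnect flag carry ordering; the increment in `clone` does not need to.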

pub mod hv_memory_protect_write;
pub mod hv_processor;
#[cfg(nightly)]
#[cfg(target_arch = "x86_64")] // xtask-fmt allow-target-arch sys-crate
Contributor

It may be worth adding opentmk as one of the exceptions to the xtask fmt requirement here. If you look at the source of it there's a hardcoded list of exception crates. Then you won't need the comments everywhere.

Contributor

@smalis-msft smalis-msft left a comment

I think there's a lot of follow-up work to do here, and a lot of open questions and comments still, but I'd like to get this merged so we can begin working on integrating these tests into CI while we also work on iterating on the still open questions. I think we can continue to use this PR as a place for discussion on this code after it gets merged, and file followup tasks as appropriate.

@mayank-microsoft
Contributor Author

The test failure was intermittent. It passes on rerun.

@smalis-msft
Contributor

Yeah, we're working on those.

smalis-msft previously approved these changes Oct 15, 2025
@mayank-microsoft
Contributor Author

Some changes for serde in some crates triggered an approval request from microsoft/openvmm-vtl2-settings-approvers. We will need an approval from them as well.

@smalis-msft
Contributor

I'll get a bypass.

@benhillis benhillis requested a review from Copilot October 16, 2025 15:31
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Copilot reviewed 74 out of 76 changed files in this pull request and generated 18 comments.

@smalis-msft smalis-msft removed the release-ci-required label (Add to a PR to trigger PR gates in release mode) Oct 16, 2025
static COMMON_HANDLER_MUTEX: Mutex<()> = Mutex::new(());

#[unsafe(no_mangle)]
#[no_mangle]
Contributor

isn't no_mangle an unsafe attribute?

Contributor Author

Yeah, I read up more about it now. With edition 2024, it should be marked unsafe.
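
For readers following along, a minimal sketch of the edition 2024 spelling. The function name and body are hypothetical; the `#[unsafe(...)]` attribute form needs Rust 1.82 or later and is also accepted in earlier editions.

```rust
/// Hypothetical exported function, for illustration only.
// `no_mangle` is an unsafe attribute because an exported symbol name can
// collide with another definition at link time; edition 2024 requires the
// `unsafe(...)` wrapper to make that explicit.
#[unsafe(no_mangle)]
pub extern "C" fn opentmk_example_handler(vector: u8) -> u32 {
    u32::from(vector) + 1
}
```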

smalis-msft previously approved these changes Oct 16, 2025
@benhillis benhillis enabled auto-merge (squash) October 16, 2025 18:54
@benhillis benhillis merged commit db5d175 into microsoft:main Oct 16, 2025
50 checks passed