Booting VMs on Arm64 with standard Ubuntu images leads to Synchronous Exception
#12211
Comments
I'm also seeing this on two arm64 systems with lxd
Thanks for experimenting with that, Simon! I've checked snap for Similar issues: Discussion in edk2 lists: Possible root cause (issue in shim): I'm not sure (yet) what we want to do with this. I can remove |
Thanks @mihalicyn, is that something that may be addressed in the forthcoming edk2 release, do you know? Or is it still being discussed?
Unfortunately, the discussion in https://edk2.groups.io/g/devel/topic/99631663#106181 does not seem to have been active since June 2023. Upd: Upd2: @julian-klode said that this issue will be fixed in the Ubuntu shim in the next few months.
Ok. Everything is clear now.
=> I'll prepare workaround for our edk2 by myself. |
Unfortunately, we have to disable the EFI memory attributes protocol that was introduced in tianocore/edk2@1c4dfad (starting from edk2-stable202305), as it crashes the Secure Boot shim. There is a fix for shim that addresses this issue, but it will take a few months until it lands in the various Linux distros, and we can't make our users wait for it.
Fix for shim: rhboot/shim@c7b3051
canonical/lxd#12211
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
Thank you! This is now working with |
I'll include this in the mid-release dependency updates for LXD 5.17 in latest/stable. |
I apologise, I made a mistake while testing (
@IsaacJT it looks like you missed
on an arm64 system running the LXD snap at
this boots the VM just fine without any errors. If you miss the |
Simon, huge thanks for checking!
So, disabling @IsaacJT reached out to me privately and showed that even without Secure Boot it does not work for him:
(I've also asked him to check the Fedora image.)
That's interesting - it does work with Not working with secure boot enabled is definitely a regression though - it works on |
Interestingly, Debian sid also works:
And also with secure boot:
Secure boot working on
Fedora 38 working fine on
Ok, it means that disabling But we have something else in edk2 that is broken. Unfortunately, I can't bisect the issue as I don't have an ARM machine at hand (probably I need to buy one), so I'll try to analyze what was changed in edk2 that could affect ARM VMs this way.
Ok. Iteration number 2. |
Could @morphis get you access to an ARM machine temporarily? :) |
Simon said that we have an internal service called scalingstack that lets us get an ARM machine or VM.
scalingstack is an OpenStack, so you'd get VMs out of it, which won't help you, as most Arm systems don't do nested virtualization.
Secure boot looks broken starting from Tested with
Continuing investigation. |
Bad commit is tianocore/edk2@2997ae3:
Will try to experiment with a combination of recent edk2 and a revert of this commit.
Upd: I can confirm that |
Hurrah! Does that mean we can re-enable the other feature that was turned off before or do they both need to be reverted? |
Yes, because reverting this disables NX protection completely (on the firmware side). I haven't dug into the code deeply, but it works without reverting
Revert ("ArmVirtPkg: make EFI_LOADER_DATA non-executable") from edk2: tianocore/edk2@2997ae3
This commit breaks Secure Boot completely and also affects non-Secure Boot systems. Old shim, GRUB2, and Linux kernel versions are not compatible with this feature, so it effectively breaks almost everything on arm64.
Fixes canonical/lxd#12211
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
Unfortunately, we have to disable the EFI memory attributes protocol that was introduced in tianocore/edk2@1c4dfad (starting from edk2-stable202305), as it crashes the Secure Boot shim. There is a fix for shim that addresses this issue, but it will take a few months until it lands in the various Linux distros, and we can't make our users wait for it.
Fix for shim: rhboot/shim@c7b3051

This commit was reverted as part of ("edk2: disable NX protection feature"), but that was a mistake. I made a systematic error when testing edk2 with ("edk2: disable EFI memory attributes protocol") reverted and concluded that it works. It doesn't. Most likely I just forgot to rebuild edk2 or something...

To be clear, the EFI_MEMORY_ATTRIBUTES protocol and "ArmVirtPkg: make EFI_LOADER_DATA non-executable" are both about setting NX flags on some pages on arm64. Both commits led to regressions, but at *different* stages of the boot process.

I. "ArmVirtPkg: make EFI_LOADER_DATA non-executable" makes the boot process fail with a Synchronous Exception *after* efi-shim/grub2 have finished their work:
================
BdsDxe: loading Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
Welcome to GRUB!
Booting `openSUSE Leap 15.5'
Loading Linux 5.14.21-150500.55.19-default ...
Loading initial ramdisk ...
Synchronous Exception at 0x000000006C217504
================

II. EFI_MEMORY_ATTRIBUTES ("ArmPkg/CpuDxe: Implement EFI memory attributes protocol") makes shim (!) itself fail with a Synchronous Exception like this:
================
BdsDxe: loading Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
Synchronous Exception at 0x000000007C318000
Synchronous Exception at 0x000000007C318000
================

Fixes canonical/lxd#12211
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
Unfortunately, we have to disable the EFI memory attributes protocol that was introduced in tianocore/edk2@1c4dfad (starting from edk2-stable202305), as it crashes the Secure Boot shim. There is a fix for shim that addresses this issue, but it will take a few months until it lands in the various Linux distros, and we can't make our users wait for it.
Fix for shim: rhboot/shim@c7b3051

This commit was reverted as part of ("edk2: disable NX protection feature"), but that was a mistake. I made a systematic error when testing edk2 with ("edk2: disable EFI memory attributes protocol") reverted and concluded that it works. It doesn't. And I found *why* I made this mistake.

To be clear, the EFI_MEMORY_ATTRIBUTES protocol and "ArmVirtPkg: make EFI_LOADER_DATA non-executable" are both about setting NX flags on some pages on arm64. Both commits led to regressions, but at *different* stages of the boot process.

I. "ArmVirtPkg: make EFI_LOADER_DATA non-executable" makes the boot process fail with a Synchronous Exception *after* efi-shim/grub2 have finished their work:
================
BdsDxe: loading Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
Welcome to GRUB!
Booting `openSUSE Leap 15.5'
Loading Linux 5.14.21-150500.55.19-default ...
Loading initial ramdisk ...
Synchronous Exception at 0x000000006C217504
================

II. EFI_MEMORY_ATTRIBUTES ("ArmPkg/CpuDxe: Implement EFI memory attributes protocol") makes shim (!) itself fail with a Synchronous Exception like this:
================
BdsDxe: loading Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
Synchronous Exception at 0x000000007C318000
Synchronous Exception at 0x000000007C318000
================

Now about *how* I made this mistake during testing:

$ lxc launch ubuntu:jammy jammy-secboot1 --vm -c security.secureboot=true --console
Synchronous Exception
$ lxc stop jammy-secboot1 --force
$ ./replace_firmware.sh
$ lxc start jammy-secboot1 --console
Everything works!
$ lxc stop jammy-secboot1 --force
$ ./revert_firmware.sh
$ lxc start jammy-secboot1 --console
Everything is still working!

The catch here is that the Synchronous Exception in shim happens only with a clean NVRAM! If a VM has booted successfully once, it will keep booting successfully even after an upgrade to the new firmware. (This applies only to the EFI_MEMORY_ATTRIBUTES protocol issue!)

Fixes canonical/lxd#12211
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
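The pitfall described in the commit message above (the shim crash only shows up with a clean NVRAM) suggests always testing a firmware build in a freshly created VM rather than restarting an existing one. Below is a minimal POSIX sh sketch of that idea. `fresh_boot_test` and the `LXC` override variable are hypothetical names, not part of LXD, and the sketch assumes that deleting an instance and launching a new one gives the VM a fresh NVRAM, which is what the behavior observed above implies.

```shell
#!/bin/sh
# Hypothetical helper (not part of LXD): boot-test the firmware currently
# installed on the host in a brand-new VM, so the VM starts with a clean NVRAM
# instead of one left behind by an earlier successful boot.

# Command used to talk to LXD; overridable so the helper can be dry-run.
LXC="${LXC:-lxc}"

fresh_boot_test() {
    name="$1"                  # hypothetical instance name
    image="${2:-ubuntu:jammy}" # image used in the reproducer above

    # Drop any earlier instance with this name: reusing it would also reuse
    # its NVRAM, which can hide the shim crash. Errors (e.g. the instance
    # does not exist) are ignored.
    "$LXC" delete "$name" --force >/dev/null 2>&1 || true

    # Launch a new VM with Secure Boot enabled and attach to its console.
    "$LXC" launch "$image" "$name" --vm -c security.secureboot=true --console
}
```

For example, after running ./replace_firmware.sh, calling fresh_boot_test jammy-secboot2 replaces the lxc start jammy-secboot1 --console step, so each firmware build is judged on a first boot. Pointing LXC at a stub command lets the helper be dry-run without touching LXD.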
Dear colleagues, as of lxd snap revision 25674, both Secure Boot and non-Secure Boot VMs should work perfectly on arm64 machines. There were two problems (and two edk2 commits were reverted) that led to the same observable behavior (a Synchronous Exception). Both problems are connected with the NX flag, but the failure happens at a different stage of the boot process: one issue happens in shim, the other during early Linux kernel boot. https://launchpad.net/~canonical-lxd/+snap/lxd-latest-edge/+build/2235019
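Since both regressions end in the same "Synchronous Exception", the console log is the main clue to which one was hit: in the logs quoted in the commit messages above, the shim crash happens before GRUB prints its banner, while the EFI_LOADER_DATA crash happens after GRUB has loaded Linux. The POSIX sh/awk sketch below turns that observation into a rough triage heuristic. `classify_boot_failure` and its output labels are hypothetical, and the heuristic assumes a GRUB-based boot; a VM that boots without GRUB would always be reported as "before-grub".

```shell
#!/bin/sh
# Hypothetical triage helper: read a VM console log on stdin and report at
# which stage the first "Synchronous Exception" happened.
#   before-grub  -> exception before GRUB's banner (consistent with the shim
#                   crash caused by the EFI memory attributes protocol)
#   after-grub   -> exception after GRUB started (consistent with the
#                   "make EFI_LOADER_DATA non-executable" change)
#   no-exception -> no Synchronous Exception in the log
classify_boot_failure() {
    awk '
        /Welcome to GRUB!/ { grub = 1 }
        /Synchronous Exception/ {
            print (grub ? "after-grub" : "before-grub")
            found = 1
            exit   # only the first exception matters; END still runs
        }
        END { if (!found) print "no-exception" }
    '
}
```

For example, feeding it the first log quoted above (GRUB banner, then the exception at 0x000000006C217504) yields after-grub, and feeding it the second log yields before-grub.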
Thanks for your hard work! I can confirm that this is now working with rev
Required information
6.2.0-31-generic #31~22.04.1-Ubuntu
Issue description
Booting a VM on an Arm64 system with the standard Ubuntu images is broken on the latest 5.17. I was able to reproduce this on 5.15 and 5.16. The first version not affected is 5.14 (from 5.14/candidate, as there is no 5.14/stable for some reason). VMs fail with a Synchronous Exception right after VM start, when they try to boot from disk:
Interestingly, the issue does not exist when UEFI is trying to do a network boot via MAAS. The issue also does not exist on x86.
Steps to reproduce
snap install lxd --channel=5.17/stable ; lxd init --auto
lxc launch ubuntu:j j0 --vm -c security.secureboot=false
Information to attach
dmesg
lxc info NAME --show-log
lxc config show NAME --expanded
lxc monitor while reproducing the issue