Booting VMs on Arm64 with standard Ubuntu images leads to Synchronous Exception #12211

Closed
morphis opened this issue Sep 4, 2023 · 27 comments · Fixed by canonical/lxd-pkg-snap#144, canonical/lxd-pkg-snap#147 or canonical/lxd-pkg-snap#153
Labels: Bug (Confirmed to be a bug)


@morphis
Contributor

morphis commented Sep 4, 2023

Required information

  • Distribution: Ubuntu
  • Distribution version: 22.04
  • The output of "lxc info" or if that fails:
    • Kernel version: 6.2.0-31-generic #31~22.04.1-Ubuntu
    • LXC version: 5.0.3
    • LXD version: 5.17-e5ead86
    • Storage backend in use: dir

Issue description

Booting a VM on an Arm64 system with the standard Ubuntu images is broken on the latest 5.17. I was able to reproduce this on 5.15 and 5.16. First version not affected is 5.14 (from 5.14/candidate as there is no 5.14/stable for some reason). VMs fail with a Synchronous Exception right after the VM starts, when it tries to boot from disk:

BdsDxe: loading Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)


Synchronous Exception at 0x000000007C318000


Synchronous Exception at 0x000000007C318000

Interestingly, the issue does not occur when UEFI is doing a network boot via MAAS. It also does not occur on x86.

Steps to reproduce

  1. Install LXD and initialize it: snap install lxd --channel=5.17/stable ; lxd init --auto
  2. Launch a VM and attach to the console: lxc launch ubuntu:j j0 --vm -c security.secureboot=false
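For scripted reproduction, step 2 can be wrapped in a small helper. This is a minimal Python sketch that assumes the LXD snap is installed and `lxc` is on PATH; `build_launch_cmd` and `reproduce` are hypothetical helpers, and the only flags used (`--vm`, `-c key=value`, `--console`) are ones that appear elsewhere in this thread. The step 2 command does not itself pass `--console`, so the helper adds it by default to actually attach to the console.

```python
import shlex
import subprocess
from typing import Dict, List


def build_launch_cmd(image: str, name: str, config: Dict[str, str],
                     console: bool = True) -> List[str]:
    """Assemble an `lxc launch` command line for a VM instance."""
    cmd = ["lxc", "launch", image, name, "--vm"]
    for key, value in config.items():
        cmd += ["-c", f"{key}={value}"]  # one -c flag per instance config key
    if console:
        cmd.append("--console")  # attach to the VM console once it starts
    return cmd


def reproduce(image: str = "ubuntu:j", name: str = "j0") -> int:
    """Launch the reproducer VM (hypothetical helper; needs a host with LXD)."""
    cmd = build_launch_cmd(image, name, {"security.secureboot": "false"})
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```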

Information to attach

  • Any relevant kernel output (dmesg)
  • Container log (lxc info NAME --show-log)
  • Container configuration (lxc config show NAME --expanded)
  • Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
  • Output of the client with --debug
  • Output of the daemon with --debug (alternatively output of lxc monitor while reproducing the issue)
@tomponline tomponline added the Bug Confirmed to be a bug label Sep 4, 2023
@tomponline tomponline added this to the lxd-5.18 milestone Sep 4, 2023
@IsaacJT

IsaacJT commented Sep 4, 2023

I'm also seeing this on two arm64 systems with LXD 5.17-e5ead86 on Ubuntu Core 22 with the 5.15 kernel. The storage backend is btrfs on one system and zfs on the other.

5.14/candidate is also the latest version that works for me.

@mihalicyn
Member

First version not affected is 5.14 (from 5.14/candidate as there is no 5.14/stable for some reason).

Thanks for experimenting with that, Simon!

I've checked the snap for 5.14-7072c7b, and it looks like the issue is caused by the update to edk2-stable202305.

Similar issues:
pftf/RPi4#235

Discussion in edk2 lists:
https://edk2.groups.io/g/devel/topic/99631663#106181

Possible root cause (issue in shim):
microsoft/mu_silicon_arm_tiano#124 (comment)
rhboot/shim@c7b3051

I'm not sure (yet) what we want to do about this. I can manually remove the EFI_MEMORY_ATTRIBUTE_PROTOCOL feature from our edk2 firmware (upstream does not currently provide any option to disable it). Alternatively, we can ask our friends at Canonical who work on the Ubuntu Cloud images to ship a shim version with rhboot/shim@c7b3051 applied.

@tomponline
Member

Thanks @mihalicyn. Do you know whether this is something that may be addressed in the forthcoming edk2 release, or is it still being discussed?

@mihalicyn
Member

mihalicyn commented Sep 6, 2023

Unfortunately, the discussion at https://edk2.groups.io/g/devel/topic/99631663#106181 has not been active since June 2023.
I'll try to ping our friends from edk2 and ask about this in tianocore/edk2#4560 (comment).

Upd:
I've also reached out to @julian-klode to ask whether there are plans to backport rhboot/shim@c7b3051 to Ubuntu's shim package.

Upd2:

@julian-klode said that this issue will be fixed in Ubuntu's shim within the next few months.

@mihalicyn
Member

mihalicyn commented Sep 7, 2023

Ok. Everything is clear now.

  1. We need to wait a few months for shim to be updated.
  2. edk2 won't add any workaround ([TEST] ArmPkg: Add Pcd to disable EFI_MEMORY_ATTRIBUTE_PROTOCOL tianocore/edk2#4560 (comment)).

=> I'll prepare a workaround for our edk2 myself.

mihalicyn added a commit to mihalicyn/lxd-pkg-snap that referenced this issue Sep 7, 2023
Unfortunately, we have to disable the EFI memory attributes
protocol, which was introduced in
tianocore/edk2@1c4dfad
(starting from edk2-stable202305), as it crashes the Secure Boot shim.

There is a fix for shim that addresses this issue, but it will take a few
months until the fix lands in the various Linux distros, and we can't
make our users wait for it.
Fix for shim:
rhboot/shim@c7b3051

canonical/lxd#12211

Signed-off-by: Alexander Mikhalitsyn <[email protected]>
@IsaacJT

IsaacJT commented Sep 11, 2023

Thank you! This is now working with latest/edge

@tomponline
Member

I'll include this in the mid-release dependency updates for LXD 5.17 in latest/stable.

@IsaacJT

IsaacJT commented Sep 13, 2023

I apologise: I made a mistake while testing (snap refresh --edge lxd only switches to edge within the current track, not to latest/edge; I should have known that), and this is not fixed in latest/edge: git-f9db8d5 2023-09-13 (25621). I'm still seeing the same issue:

itrue@odroid:~$ sudo lxc launch ubuntu:jammy/arm64 --vm kiwi  -c limits.memory=3GB -c limits.cpu=4 -d root,size=30GiB  --console
Creating kiwi
Starting kiwi
To detach from the console, press: <ctrl>+a q
BdsDxe: loading Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)


Synchronous Exception at 0x00000000E9029E10


Synchronous Exception at 0x00000000E9029E10
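The channel mix-up described above can be modeled in a few lines. This is a simplified Python sketch, not snapd's actual logic: it assumes channels are written track/risk[/branch] and that a risk-only request such as --edge keeps the currently tracked track. `track_of` and `resolve_channel` are hypothetical names.

```python
from typing import Tuple

RISKS: Tuple[str, ...] = ("stable", "candidate", "beta", "edge")


def track_of(channel: str) -> str:
    """Return the track of a channel; a bare risk implies the "latest" track."""
    first = channel.split("/")[0]
    return "latest" if first in RISKS else first


def resolve_channel(tracking: str, requested: str) -> str:
    """Simplified model of which channel a refresh ends up tracking.

    A request that starts with a risk (which is what --edge amounts to)
    keeps the current track; a request that names a track switches to it.
    """
    if requested.split("/")[0] in RISKS:
        return f"{track_of(tracking)}/{requested}"
    return requested
```

In this model, moving from a versioned track such as 5.17 to latest/edge requires naming the track explicitly, e.g. snap refresh lxd --channel=latest/edge.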


@mihalicyn
Member

Hi @IsaacJT

That's interesting. Then we need extra confirmation from Simon Fels about this.
cc @morphis

@morphis
Contributor Author

morphis commented Sep 14, 2023

@IsaacJT it looks like you are missing -c security.secureboot=false on your launch command line. Using

lxc launch ubuntu:jammy/arm64 --vm kiwi -c limits.memory=3GB -c limits.cpu=4 -c security.secureboot=false -d root,size=30GiB --console

on an arm64 system running the LXD snap at

root@c0:~# snap list lxd
Name  Version      Rev    Tracking     Publisher   Notes
lxd   git-288dac4  25625  latest/edge  canonical✓  -

this boots the VM just fine without any errors. If you miss the -c security.secureboot=false option you will run into the Synchronous Exception and that is expected (not necessarily the best user experience though).

@mihalicyn
Member

mihalicyn commented Sep 14, 2023

Simon, huge thanks for checking!

If you miss the -c security.secureboot=false option you will run into the Synchronous Exception and that is expected (not necessarily the best user experience though).

So disabling security.secureboot has been a requirement for a long time? That's weird. I believe it should work in all modes.

@IsaacJT reached out to me privately and showed that even without secure boot it does not work for him:

$ sudo lxc launch images:fedora/38 --vm  -c limits.memory=3GB -c limits.cpu=4 -d root,size=30GiB  -c security.secureboot=false --console
Creating the instance
Instance name is: driving-bream
Starting driving-bream
To detach from the console, press: <ctrl>+a q
BdsDxe: loading Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
error: ../../grub-core/fs/fshelp.c:257:file `/EFI/fedora/grubenv' not found.
  Booting `Fedora Linux (6.4.14-200.fc38.aarch64) 38 (Container Image)'



Synchronous Exception at 0x00000000EB5AECEC


Synchronous Exception at 0x00000000EB5AECEC

(I additionally asked him to check a Fedora image.)

@IsaacJT

IsaacJT commented Sep 14, 2023

That's interesting: it does work with -c security.secureboot=false on Ubuntu images; however, as @mihalicyn mentioned, it's still broken on other images.

Not working with secure boot enabled is definitely a regression, though: it works on 5.14/candidate without any issues.

@IsaacJT

IsaacJT commented Sep 14, 2023

itrue@odroid:~$ sudo lxc launch ubuntu:jammy --vm -c security.secureboot=false --console
Creating the instance
Instance name is: sharing-pigeon
Starting sharing-pigeon
To detach from the console, press: <ctrl>+a q
BdsDxe: loading Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services...
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x411fd050]
[    0.000000] Linux version 5.15.0-83-generic (buildd@bos02-arm64-001) (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #92-Ubuntu SMP Mon Aug 14 09:34:05 UTC 2023 (Ubuntu 5.15.0-83.92-generic 5.15.116)
[    0.000000] efi: EFI v2.70 by EDK II
[    0.000000] efi: SMBIOS 3.0=0x7fed0000 MEMATTR=0x7eb51018 ACPI 2.0=0x7c420018 MOKvar=0x7fdb0000 RNG=0x7c42e718 MEMRESERVE=0x7c23cf18
[    0.000000] random: crng init done
...
itrue@odroid:~$ sudo lxc launch images:fedora/38 --vm -c security.secureboot=false --console
Creating the instance
Instance name is: whole-guinea
Starting whole-guinea
To detach from the console, press: <ctrl>+a q
BdsDxe: loading Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
error: ../../grub-core/fs/fshelp.c:257:file `/EFI/fedora/grubenv' not found.
  Booting `Fedora Linux (6.4.14-200.fc38.aarch64) 38 (Container Image)'



Synchronous Exception at 0x00000000788AECEC


Synchronous Exception at 0x00000000788AECEC

itrue@odroid:~$ sudo lxc launch images:opensuse/15.5 --vm -c security.secureboot=false --console
Creating the instance
Instance name is: pretty-teal
Starting pretty-teal
To detach from the console, press: <ctrl>+a q
BdsDxe: loading Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
Welcome to GRUB!
  Booting `openSUSE Leap 15.5'

Loading Linux 5.14.21-150500.55.19-default ...
Loading initial ramdisk ...


Synchronous Exception at 0x000000006C217504


Synchronous Exception at 0x000000006C217504

Interestingly, Debian sid also works:

itrue@odroid:~$ sudo lxc launch images:debian/sid --vm -c security.secureboot=false --console
Creating the instance
Instance name is: united-thrush
Starting united-thrush
To detach from the console, press: <ctrl>+a q
BdsDxe: loading Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
  Booting `Debian GNU/Linux'

Loading Linux 6.4.0-4-arm64 ...
Loading initial ramdisk ...
[   11.137361] platform regulatory.0: firmware: failed to load regulatory.db (-2)
[   11.167006] firmware_class: See https://wiki.debian.org/Firmware for information about missing firmware
[   11.229209] platform regulatory.0: firmware: failed to load regulatory.db (-2)

Debian GNU/Linux trixie/sid united-thrush ttyAMA0

united-thrush login:

And also with secure boot:

itrue@odroid:~$ sudo lxc launch images:debian/sid --vm -c security.secureboot=true --console
Creating the instance
Instance name is: humorous-bullfrog
Starting humorous-bullfrog
To detach from the console, press: <ctrl>+a q
BdsDxe: loading Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
error: prohibited by secure boot policy.
  Booting `Debian GNU/Linux'

Loading Linux 6.4.0-4-arm64 ...
Loading initial ramdisk ...
[    9.629883] platform regulatory.0: firmware: failed to load regulatory.db (-2)
[    9.633284] firmware_class: See https://wiki.debian.org/Firmware for information about missing firmware
[    9.699572] platform regulatory.0: firmware: failed to load regulatory.db (-2)


@IsaacJT

IsaacJT commented Sep 14, 2023

Secure boot working on 5.14/candidate:

itrue@odroid:~$ sudo snap refresh --channel 5.14/candidate lxd
2023-09-14T12:41:27Z INFO Waiting for "snap.lxd.daemon.service" to stop.
lxd (5.14/candidate) 5.14-7072c7b from Canonical✓ refreshed
itrue@odroid:~$ sudo snap restart lxd
2023-09-14T12:42:05Z INFO Waiting for "snap.lxd.daemon.service" to stop.
Restarted.
itrue@odroid:~$ sudo lxc launch ubuntu:jammy --vm -c security.secureboot=true --console
Creating the instance
Instance name is: enormous-gar
Starting enormous-gar
To detach from the console, press: <ctrl>+a q
BdsDxe: loading Boot0002 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0002 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
EFI stub: Booting Linux Kernel...
EFI stub: ERROR: FIRMWARE BUG: kernel image not aligned on 64k boundary
EFI stub: UEFI Secure Boot is enabled.
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services...
EFI stub: UEFI Secure Boot is enabled.
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x411fd050]
[    0.000000] Linux version 5.15.0-83-generic (buildd@bos02-arm64-001) (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #92-Ubuntu SMP Mon Aug 14 09:34:05 UTC 2023 (Ubuntu 5.15.0-83.92-generic 5.15.116)
[    0.000000] efi: EFI v2.70 by EDK II
[    0.000000] efi: SMBIOS 3.0=0x7bed0000 MEMATTR=0x7a710018 ACPI 2.0=0x78420018 MOKvar=0x7bdb0000 RNG=0x7842e718 MEMRESERVE=0x78326018
[    0.000000] random: crng init done
[    0.000000] secureboot: Secure boot enabled

Fedora 38 working fine on 5.14/candidate too (with secureboot disabled as the image doesn't support it):

itrue@odroid:~$ sudo lxc launch images:fedora/38 --vm -c security.secureboot=false --console
Creating the instance
Instance name is: welcome-gecko
Starting welcome-gecko
To detach from the console, press: <ctrl>+a q
BdsDxe: loading Boot0002 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0002 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
error: ../../grub-core/fs/fshelp.c:257:file `/EFI/fedora/grubenv' not found.
  Booting `Fedora Linux (6.4.14-200.fc38.aarch64) 38 (Container Image)'

EFI stub: Decompressing Linux Kernel...
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services...
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x411fd050]
[    0.000000] Linux version 6.4.14-200.fc38.aarch64 (mockbuild@0c2f46b11b0f429d98b2fbbd14888c0e) (gcc (GCC) 13.2.1 20230728 (Red Hat 13.2.1-1), GNU ld version 2.39-9.fc38) #1 SMP PREEMPT_DYNAMIC Sat Sep 2 16:28:14 UTC 2023
[    0.000000] efi: EFI v2.7 by EDK II
[    0.000000] efi: SMBIOS 3.0=0x7bed0000 MEMATTR=0x7a710518 ACPI 2.0=0x78420018 MOKvar=0x7bdb0000 RNG=0x7842e718 MEMRESERVE=0x7834a818
[    0.000000] random: crng init done

@mihalicyn
Member

OK, this means that disabling EFI_MEMORY_ATTRIBUTE_PROTOCOL works around the shim issue present in Ubuntu (fixed upstream by rhboot/shim@c7b3051).

But we have something else in edk2 that is broken.

Unfortunately, I can't bisect the issue as I don't have an ARM machine at hand (I probably need to buy one). I'll try to analyze what changed in edk2 that could affect ARM VMs this way.

@mihalicyn mihalicyn reopened this Sep 14, 2023
@mihalicyn
Member

Ok. Iteration number 2.

@tomponline
Member

Could @morphis get you access to an ARM machine temporarily? :)

@mihalicyn
Member

Could @morphis get you access to an ARM machine temporarily? :)

Simon said that we have an internal system called scalingstack that provides ARM machines or VMs.

@stgraber
Contributor

scalingstack is an OpenStack cloud, so you'd get VMs out of it, which won't help you, as most Arm systems don't support nested virtualization.

@stgraber
Contributor

ssh [email protected] -J [email protected]

@mihalicyn
Member

Secure boot looks broken starting from edk2-stable202211. edk2-stable202208 works perfectly well.

Tested with ubuntu:jammy on a Raspberry Pi 4 inside an LXD VM (huge thanks to @stgraber for providing it):

edk2-stable202208
===========================================

BdsDxe: loading Boot0008 "ubuntu" from HD(15,GPT,0FA01584-C9CE-4C03-A58A-DD3EC06D24A4,0x800,0x31801)/\EFI\ubuntu\shimaa64.efi
BdsDxe: starting Boot0008 "ubuntu" from HD(15,GPT,0FA01584-C9CE-4C03-A58A-DD3EC06D24A4,0x800,0x31801)/\EFI\ubuntu\shimaa64.efi
EFI stub: Booting Linux Kernel...
EFI stub: ERROR: FIRMWARE BUG: kernel image not aligned on 64k boundary
EFI stub: UEFI Secure Boot is enabled.
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services...
EFI stub: UEFI Secure Boot is enabled.
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
[    0.000000] Linux version 5.15.0-83-generic (buildd@bos02-arm64-001) (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #92-Ubuntu SMP Mon Aug 14 09:34:05 UTC 2023 (Ubuntu 5.15.0-83.92-generic 5.15.116)
[    0.000000] efi: EFI v2.70 by EDK II
[    0.000000] efi: SMBIOS 3.0=0x7bed0000 MEMATTR=0x7a7bc018 ACPI 2.0=0x78420018 MOKvar=0x7bdc0000 RNG=0x7842e718 MEMRESERVE=0x78348218 
[    0.000000] random: crng init done
[    0.000000] secureboot: Secure boot enabled
[    0.000000] Kernel is locked down from EFI Secure Boot mode; see man kernel_lockdown.7
[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x0000000078420018 000024 (v02 BOCHS )

edk2-stable202211
===========================================

BdsDxe: loading Boot0008 "ubuntu" from HD(15,GPT,0FA01584-C9CE-4C03-A58A-DD3EC06D24A4,0x800,0x31801)/\EFI\ubuntu\shimaa64.efi
BdsDxe: starting Boot0008 "ubuntu" from HD(15,GPT,0FA01584-C9CE-4C03-A58A-DD3EC06D24A4,0x800,0x31801)/\EFI\ubuntu\shimaa64.efi


Synchronous Exception at 0x000000007643CFD0


Synchronous Exception at 0x000000007643CFD0


Continuing investigation.
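Narrowing the range from edk2-stable202208 (good) to edk2-stable202211 (bad) down to a single commit is a bisection, which git bisect automates. The Python sketch below shows the underlying binary search. `first_bad` is a hypothetical helper, and `is_bad` stands in for "build edk2 at this commit, boot a freshly created VM, and check the console for a Synchronous Exception". It assumes a single regression in the range, i.e. every commit after the first bad one is also bad; since this thread ultimately involved two separate edk2 changes, each bisection should track one failure signature at a time.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search an ordered commit list for the first bad commit.

    Preconditions: commits[0] is known good, commits[-1] is known bad,
    and badness is monotonic (good...good, bad...bad).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):  # e.g. rebuild firmware and boot-test here
            hi = mid
        else:
            lo = mid
    return commits[hi]
```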

@mihalicyn
Member

Bad commit is tianocore/edk2@2997ae3:

commit 2997ae38739756ecba9b0de19e86032ebc689ef9
Author: Ard Biesheuvel <[email protected]>
Date:   Tue Aug 2 11:48:04 2022 +0200

    ArmVirtPkg: make EFI_LOADER_DATA non-executable
    
    When the memory protections were implemented and enabled on ArmVirtQemu
    5+ years ago, we had to work around the fact that GRUB at the time
    expected EFI_LOADER_DATA to be executable, as that is the memory type it
    allocates when loading its modules.
    
    This has been fixed in GRUB in August 2017, so by now, we should be able
    to tighten this, and remove execute permissions from EFI_LOADER_DATA
    allocations.
    
    Signed-off-by: Ard Biesheuvel <[email protected]>

I'll try experimenting with a combination of the recent edk2 and a revert of this commit.
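Conceptually, "make EFI_LOADER_DATA non-executable" flips one bit in the firmware's NX policy. The Python sketch below assumes a policy modeled on EDK2's PcdDxeNxMemoryProtectionPolicy, a 64-bit mask in which bit N marks EFI memory type N as non-executable. The helper names and mask values are illustrative assumptions, not ArmVirtQemu's actual configuration; only the memory type numbering comes from the UEFI specification.

```python
from enum import IntEnum


class EfiMemoryType(IntEnum):
    """The first EFI_MEMORY_TYPE values, as numbered in the UEFI specification."""
    EfiReservedMemoryType = 0
    EfiLoaderCode = 1
    EfiLoaderData = 2
    EfiBootServicesCode = 3
    EfiBootServicesData = 4
    EfiRuntimeServicesCode = 5
    EfiRuntimeServicesData = 6
    EfiConventionalMemory = 7


def is_nx(policy: int, mem_type: EfiMemoryType) -> bool:
    """True if bit `mem_type` of the policy mask marks that type non-executable."""
    return bool((policy >> mem_type) & 1)


def make_loader_data_nx(policy: int) -> int:
    """Set the EfiLoaderData bit, the effect described by the commit title."""
    return policy | (1 << EfiMemoryType.EfiLoaderData)


def can_execute_from(policy: int, mem_type: EfiMemoryType) -> bool:
    # A loader that places code in a buffer of this type and jumps to it
    # (as older GRUB did with EFI_LOADER_DATA) faults if the type is NX.
    return not is_nx(policy, mem_type)


# Hypothetical mask: only BootServicesData and ConventionalMemory are NX.
OLD_POLICY = (1 << EfiMemoryType.EfiBootServicesData) | (1 << EfiMemoryType.EfiConventionalMemory)
NEW_POLICY = make_loader_data_nx(OLD_POLICY)
```

In this model, code loaded into EFI_LOADER_DATA stays executable under OLD_POLICY but faults under NEW_POLICY, while EFI_LOADER_CODE is unaffected.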

@mihalicyn
Member

Upd: I can confirm that edk2-stable202305 works fine with tianocore/edk2@2997ae3 reverted.

@tomponline
Member

Hurrah! Does that mean we can re-enable the other feature that was turned off before or do they both need to be reverted?

@mihalicyn
Member

Hurrah! Does that mean we can re-enable the other feature that was turned off before or do they both need to be reverted?

Yes, because reverting this disables NX protection completely (on the firmware side). I haven't dug deeply into the code, but it works without reverting EFI_MEMORY_ATTRIBUTE_PROTOCOL.

mihalicyn added a commit to mihalicyn/lxd-pkg-snap that referenced this issue Sep 15, 2023
Revert ("ArmVirtPkg: make EFI_LOADER_DATA non-executable") from edk2:
tianocore/edk2@2997ae3
This commit breaks Secure Boot completely and also affects non-Secure Boot systems.

Old shim, GRUB 2, and Linux kernel versions are not compatible
with this feature, so it effectively breaks almost everything on arm64.

Fixes canonical/lxd#12211

Signed-off-by: Alexander Mikhalitsyn <[email protected]>
mihalicyn added a commit to mihalicyn/lxd-pkg-snap that referenced this issue Sep 19, 2023
Unfortunately, we have to disable the EFI memory attributes
protocol, which was introduced in
tianocore/edk2@1c4dfad
(starting from edk2-stable202305), as it crashes the Secure Boot shim.

There is a fix for shim that addresses this issue, but it will take a few
months until the fix lands in the various Linux distros, and we can't
make our users wait for it.
Fix for shim:
rhboot/shim@c7b3051

This commit was reverted as part of ("edk2: disable NX protection feature"),
but that was a mistake. I made a systematic error when testing
edk2 with ("edk2: disable EFI memory attributes protocol") reverted and
concluded that it worked. It doesn't. Most likely I just forgot to rebuild edk2,
or something similar...

To be clear, the EFI_MEMORY_ATTRIBUTES protocol and "ArmVirtPkg: make EFI_LOADER_DATA non-executable"
are both about setting NX flags on some pages on arm64. Both commits led
to regressions, but at *different* stages of the boot process.

I. "ArmVirtPkg: make EFI_LOADER_DATA non-executable" makes the boot process
fail with a Synchronous Exception *after* efi-shim/grub2 have finished their work:
================
BdsDxe: loading Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
Welcome to GRUB!
  Booting `openSUSE Leap 15.5'

Loading Linux 5.14.21-150500.55.19-default ...
Loading initial ramdisk ...

Synchronous Exception at 0x000000006C217504
================

II. EFI_MEMORY_ATTRIBUTES ("ArmPkg/CpuDxe: Implement EFI memory attributes protocol")
makes shim (!) fail with a Synchronous Exception like this:
================
BdsDxe: loading Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)

Synchronous Exception at 0x000000007C318000

Synchronous Exception at 0x000000007C318000
================

Fixes canonical/lxd#12211

Signed-off-by: Alexander Mikhalitsyn <[email protected]>
mihalicyn added a commit to mihalicyn/lxd-pkg-snap that referenced this issue Sep 19, 2023
Unfortunately, we have to disable the EFI memory attributes
protocol, which was introduced in
tianocore/edk2@1c4dfad
(starting from edk2-stable202305), as it crashes the Secure Boot shim.

There is a fix for shim that addresses this issue, but it will take a few
months until the fix lands in the various Linux distros, and we can't
make our users wait for it.
Fix for shim:
rhboot/shim@c7b3051

This commit was reverted as part of ("edk2: disable NX protection feature"),
but that was a mistake. I made a systematic error when testing
edk2 with ("edk2: disable EFI memory attributes protocol") reverted and
concluded that it worked. It doesn't, and I have found out *why* I made this mistake.

To be clear, the EFI_MEMORY_ATTRIBUTES protocol and "ArmVirtPkg: make EFI_LOADER_DATA non-executable"
are both about setting NX flags on some pages on arm64. Both commits led
to regressions, but at *different* stages of the boot process.

I. "ArmVirtPkg: make EFI_LOADER_DATA non-executable" makes the boot process
fail with a Synchronous Exception *after* efi-shim/grub2 have finished their work:
================
BdsDxe: loading Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
Welcome to GRUB!
  Booting `openSUSE Leap 15.5'

Loading Linux 5.14.21-150500.55.19-default ...
Loading initial ramdisk ...

Synchronous Exception at 0x000000006C217504
================

II. EFI_MEMORY_ATTRIBUTES ("ArmPkg/CpuDxe: Implement EFI memory attributes protocol")
makes shim (!) fail with a Synchronous Exception like this:
================
BdsDxe: loading Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)

Synchronous Exception at 0x000000007C318000

Synchronous Exception at 0x000000007C318000
================

Now, about *how* I made this mistake during testing:
$ lxc launch ubuntu:jammy jammy-secboot1 --vm -c security.secureboot=true --console
Synchronous Exception
$ lxc stop jammy-secboot1 --force
$ ./replace_firmware.sh
$ lxc start jammy-secboot1 --console
Everything works!
$ lxc stop jammy-secboot1 --force
$ ./revert_firmware.sh
$ lxc start jammy-secboot1 --console
Everything is still working!

The catch here is that the Synchronous Exception in shim
happens only with a clean NVRAM! Once a VM has booted successfully,
it keeps booting successfully even after an upgrade to the new firmware.
(This applies only to the EFI_MEMORY_ATTRIBUTE protocol issue!)

Fixes canonical/lxd#12211

Signed-off-by: Alexander Mikhalitsyn <[email protected]>
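The testing pitfall above can be reproduced in a toy model. This Python sketch is purely illustrative: `ToyVM`, `check_with_reused_vm` and `check_with_fresh_vm` are hypothetical, and the only behavior encoded is the commit message's observation that the shim crash happens only while NVRAM is still clean.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyVM:
    """A VM reduced to its NVRAM contents (hypothetical model)."""
    nvram: Dict[str, str] = field(default_factory=dict)

    def boot(self, firmware_has_bug: bool) -> bool:
        # Assumption from the commit message: the shim crash only occurs
        # when NVRAM is clean, i.e. no boot has ever succeeded.
        if firmware_has_bug and not self.nvram:
            return False  # Synchronous Exception in shim
        self.nvram["boot_entry"] = "shim"  # a successful boot leaves state behind
        return True


def check_with_reused_vm(firmware_has_bug: bool) -> bool:
    """Mirror the sequence above: crash, boot with patched firmware, revert."""
    vm = ToyVM()
    vm.boot(firmware_has_bug=True)     # original firmware: crashes
    vm.boot(firmware_has_bug=False)    # patched firmware: boots, writes NVRAM
    return vm.boot(firmware_has_bug)   # firmware under test sees non-clean NVRAM


def check_with_fresh_vm(firmware_has_bug: bool) -> bool:
    """Judge each firmware build on a VM with clean NVRAM."""
    return ToyVM().boot(firmware_has_bug)
```

The reused-VM check reports the buggy firmware as working, while the fresh-VM check catches it. In LXD terms, the fresh-VM variant presumably corresponds to launching a new instance per firmware build instead of stopping and restarting an existing one.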
@mihalicyn
Member

Dear colleagues,

As of LXD snap revision 25674, both Secure Boot and non-Secure Boot VMs should work perfectly on arm64 machines.
It would be great if you could check this in your environments and confirm.

There were two problems (and two edk2 commits were reverted) that led to the same observable behavior (a Synchronous Exception). Both problems are connected with the NX flag, but the failures happen at different stages of the boot process: one in shim, the other during early Linux kernel boot.

https://launchpad.net/~canonical-lxd/+snap/lxd-latest-edge/+build/2235019

cc @IsaacJT @morphis
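When triaging reports like the ones in this thread, the stage at which the Synchronous Exception appears is what distinguishes the two problems. Below is a heuristic Python sketch, assuming the console output has already been captured to a string (for example from lxc launch ... --console). `classify_boot` and the marker list are hypothetical and derived only from the logs quoted here. Per the comment above, a "shim" result corresponds to the EFI_MEMORY_ATTRIBUTE_PROTOCOL problem and a "post-shim" result to the "make EFI_LOADER_DATA non-executable" change.

```python
import re
from typing import Optional, Tuple

EXCEPTION_RE = re.compile(r"Synchronous Exception at (0x[0-9A-Fa-f]+)")

# Console lines that only appear once shim has handed over to GRUB or the
# kernel's EFI stub (taken from the logs in this thread; not exhaustive).
POST_SHIM_MARKERS = ("Welcome to GRUB!", "Booting `", "Loading Linux", "EFI stub:")


def classify_boot(console: str) -> Tuple[str, Optional[str]]:
    """Return (stage, fault_address) for a captured VM console log.

    stage is "shim" when the exception precedes any GRUB/EFI-stub output,
    "post-shim" when it follows such output, "booted" when the kernel
    started without an exception, and "unknown" otherwise.
    """
    match = EXCEPTION_RE.search(console)
    if match is None:
        if "Booting Linux on physical CPU" in console:
            return "booted", None
        return "unknown", None
    before = console[: match.start()]
    if any(marker in before for marker in POST_SHIM_MARKERS):
        return "post-shim", match.group(1)
    return "shim", match.group(1)
```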

tomponline pushed a commit to canonical/lxd-pkg-snap that referenced this issue Sep 21, 2023
tomponline pushed a commit to canonical/lxd-pkg-snap that referenced this issue Sep 21, 2023
tomponline pushed a commit to canonical/lxd-pkg-snap that referenced this issue Sep 21, 2023
@IsaacJT

IsaacJT commented Sep 22, 2023

Thanks for your hard work! I can confirm that this is now working with rev 5.18-762f582 (25755)

itrue@odroid:~$ sudo lxc launch ubuntu:jammy --vm --console
Creating the instance
Instance name is: on-bass
Starting on-bass
To detach from the console, press: <ctrl>+a q
BdsDxe: loading Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
EFI stub: Booting Linux Kernel...
EFI stub: ERROR: FIRMWARE BUG: kernel image not aligned on 64k boundary
EFI stub: UEFI Secure Boot is enabled.
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services...
EFI stub: UEFI Secure Boot is enabled.
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x411fd050]
[    0.000000] Linux version 5.15.0-83-generic (buildd@bos02-arm64-001) (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #92-Ubuntu SMP Mon Aug 14 09:34:05 UTC 2023 (Ubuntu 5.15.0-83.92-generic 5.15.116)
[    0.000000] efi: EFI v2.70 by EDK II
[    0.000000] efi: SMBIOS 3.0=0x7fed0000 MEMATTR=0x7eb5e698 ACPI 2.0=0x7c420018 MOKvar=0x7fdb0000 RNG=0x7c42e718 MEMRESERVE=0x7c318298
