Re: x86_64 32-bit EFI mixed mode boot broken

From: Ard Biesheuvel
Date: Fri Mar 22 2024 - 08:52:29 EST


On Fri, 22 Mar 2024 at 01:06, Clayton Craft <clayton@xxxxxxxxxxxxx> wrote:
>
> On Thu, 21 Mar 2024 23:48:09 +0100 Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> > > v6.8 fails for me, and presumably so does everything back to v6.2. v6.1 is able
> > > to boot OK on these platforms with mixed mode, and it looks like there are a lot
> > > of changes from 6.1..6.2 for EFI/mixed mode booting.
> >
> > v6.1 just received some EFI related backports, so please check the
> > latest v6.1.y as well.
>
> I just gave v6.1.82 a try, and it fails to boot for me. That seems to be a
> regression from the 6.1.0 that I tested previously.
>
> > I usually test on 32-bit OVMF built with LOAD_X64_ON_IA32_ENABLE,
> > which allows the use of the compat entry point. This is different from
> > the EFI handover protocol, and I am not sure which one you are using.
>
> I should have mentioned this previously, here's the EFI-related kconfig that I
> am using. If there's anything missing then please let me know:
>
> CONFIG_EFI=y
> CONFIG_EFI_EARLYCON=y
> CONFIG_EFI_ESRT=y
> # CONFIG_EFI_HANDOVER_PROTOCOL is not set
> CONFIG_EFI_MIXED=y
> CONFIG_EFI_RUNTIME_WRAPPERS=y
> CONFIG_EFI_STUB=y
> CONFIG_EFI_VARS_PSTORE=m
> CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE=y
>
> Note that the EFI handover protocol support is disabled, I was under the
> impression that it's not required for mixed mode.
>

That depends on the bootloader. One of the changes around that time is
the introduction of this Kconfig symbol: before that, the EFI handover
protocol was always supported but now it can be compiled out. So the
safe choice is to enable it.

However, while looking more deeply into this, I noticed that we are
running quite low own stack space. Mixed mode is different because it
calls into the boot services using the decompressor's boot stack,
rather than using the one that was provided by firmware at entry.
(Note that the UEFI spec mandates 128k of stack space)

In my case, I bisected the regression to

commit 5c4feadb0011983bbc4587bc61056c7b379d9969 (HEAD)
Author: Ard Biesheuvel <ardb@xxxxxxxxxx>
Date: Mon Aug 7 18:27:16 2023 +0200

x86/decompressor: Move global symbol references to C code

which moves the boot stack into a different memory region. Formerly,
we'd end up at the far end of the heap when overrunning the stack but
now, we end up crashing. Of course, overwriting the heap can cause
problems of its own, so we'll need to bump this in any case.

Could you give this a try please?


--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -38,7 +38,7 @@
#endif

#ifdef CONFIG_X86_64
-# define BOOT_STACK_SIZE 0x4000
+# define BOOT_STACK_SIZE 0x10000

/*
* Used by decompressor's startup_32() to allocate page tables for identity