Bug: kexec on Lenovo ThinkPad T480 disables EFI mode
From: ns
Date: Fri Oct 28 2022 - 09:11:55 EST
Greetings,
I've been hitting a bug on my Lenovo ThinkPad T480 where kexecing will
cause EFI mode (if that's the right term for it) to be unconditionally
disabled, even when not using the --noefi option to kexec.
What I mean by "EFI mode" being disabled, more than just EFI runtime
services, is that basically nothing about the system's EFI is visible
post-kexec. Normally you have a message like this in dmesg when the
system is booted in EFI mode:
[ 0.000000] efi: EFI v2.70 by EDK II
[ 0.000000] efi: SMBIOS=0x7f98a000 ACPI=0x7fb7e000 ACPI
2.0=0x7fb7e014 MEMATTR=0x7ec63018
(obviously not the real firmware of the machine I'm talking about, but I
can also send that if it would be of any help)
No such message pops up in my dmesg as a result of this bug, & this
causes some fallout like being unable to find the system's DMI
information:
<6>[ 0.000000] DMI not present or invalid.
The efivarfs module also fails to load with -ENODEV.
I've tried also booting with efi=runtime explicitly but it doesn't
change anything. The kernel still does not print the name of the EFI
firmware, DMI is still missing, & efivarfs still fails to load.
I've been using the kexec_load syscall for all these tests, if it's
important.
Also, to make it very clear, all this only ever happens post-kexec. When
booting straight from UEFI (with the EFI stub), all the aforementioned
stuff that fails works perfectly fine (i.e. name of firmware is printed,
DMI is properly found, & efivarfs loads & mounts just fine).
This is reproducible with a vanilla 6.1-rc2 kernel. I've been trying to
bisect it, but it seems like it goes pretty far back. I've got vanilla
mainline kernel builds dating back to 5.17 that have the exact same
issue. It might be worth noting that during this testing, I made sure
the version of the kernel being kexeced & the kernel kexecing were the
same version. It may not have been a problem in older kernels, but that
would be difficult to test for me (a pretty important driver for this
machine was only merged during v5.17-rc4). So it may not have been a
regression & just a hidden problem since time immemorial.
I am willing to test any patches I may get to further debug or fix
this issue, preferably based on the current state of torvalds/linux.git.
I can build & test kernels quite a few times per day.
I can also send any important materials (kernel config, dmesg, firmware
information, so on & so forth) on request. I'll also just mention I'm
using kexec-tools 2.0.24 upfront, if it matters.
Regards,