Re: [PATCH v2 1/3] efi/x86: skip efi_arch_mem_reserve() in case of kexec.

From: Kalra, Ashish
Date: Sun Mar 24 2024 - 18:33:04 EST


Hello,

On 3/18/2024 11:00 PM, Dave Young wrote:
Hi,

Added Ard to Cc.

On 03/18/24 at 07:02am, Ashish Kalra wrote:
From: Ashish Kalra <ashish.kalra@xxxxxxx>

For the kexec use case, we need to use and stick to the EFI memmap passed
from the first kernel via boot-params/setup data; hence,
skip efi_arch_mem_reserve() during kexec.

Additionally, during SNP guest kexec testing we discovered that the EFI memmap
is corrupted during chained kexec. kexec_enter_virtual_mode() during
late init will remap the efi_memmap physical pages allocated in
efi_arch_mem_reserve() via memblock and then subsequently cause random
EFI memmap corruption once memblock is freed/torn down.

Signed-off-by: Ashish Kalra <ashish.kalra@xxxxxxx>
---
arch/x86/platform/efi/quirks.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index f0cc00032751..d4562d074371 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -258,6 +258,16 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
int num_entries;
void *new;
+ /*
+ * For kexec use case, we need to use the EFI memmap passed from the first
+ * kernel via setup data, so we need to skip this.
+ * Additionally kexec_enter_virtual_mode() during late init will remap
+ * the efi_memmap physical pages allocated here via memblock & then
+ * subsequently cause random EFI memmap corruption once memblock is freed.
Can you elaborate a bit on the corruption? Is it reproducible without
SNP?

This is only reproducible on SNP.

This is the call stack for the above function:

[    0.313377]  efi_arch_mem_reserve+0x64/0x220
[    0.314060]  ? memblock_add_range+0x2a0/0x2e0
[    0.314763]  efi_mem_reserve+0x36/0x60
[    0.315360]  efi_bgrt_init+0x17d/0x1a0
[    0.315959]  ? __pfx_acpi_parse_bgrt+0x10/0x10
[    0.316711]  acpi_parse_bgrt+0x12/0x20
[    0.317310]  acpi_table_parse+0x77/0xd0
[    0.317922]  acpi_boot_init+0x362/0x630
[    0.318535]  setup_arch+0xa4e/0xf90
[    0.319091]  start_kernel+0x68/0xa70
[    0.319664]  x86_64_start_reservations+0x1c/0x30
[    0.320431]  x86_64_start_kernel+0xbf/0x110
[    0.321099]  secondary_startup_64_no_verify+0x179/0x17b

This function, efi_arch_mem_reserve(), calls efi_memmap_alloc(), which in turn calls __efi_memmap_alloc_early(), which does memblock_phys_alloc(). It later calls efi_memmap_install(), which does early_memremap() of the EFI memmap into this memblock-allocated physical memory. So the EFI memmap now gets remapped into the memblock-allocated memory.

Later, kexec_enter_virtual_mode() calls efi_memmap_init_late(), which memremap()s the EFI memmap located in the memblock-allocated physical range above.

When memblock is later freed during late init, this memblock-allocated physical range gets freed and reallocated, which eventually overwrites and corrupts the EFI memmap and leads to a crash on the subsequent kexec boot.

+ */
+ if (efi_setup)
+ return;
+
How about checking the md attribute instead of checking efi_setup?
Personally I feel it is a bit better; something like below:

I based the above on the following code checking for kexec boot:

void __init efi_enter_virtual_mode(void)
{
	...

	if (efi_setup)
		kexec_enter_virtual_mode();
	else
		__efi_enter_virtual_mode();

	...
}

But I have tested the code you shared below, which checks the md attribute, and it works, so I can resend my v2 patch based on it.

Thanks, Ashish


diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index f0cc00032751..699332b075bb 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -255,15 +255,24 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
struct efi_memory_map_data data = { 0 };
struct efi_mem_range mr;
efi_memory_desc_t md;
- int num_entries;
+ int num_entries, ret;
void *new;
- if (efi_mem_desc_lookup(addr, &md) ||
- md.type != EFI_BOOT_SERVICES_DATA) {
+ ret = efi_mem_desc_lookup(addr, &md);
+ if (ret) {
pr_err("Failed to lookup EFI memory descriptor for %pa\n", &addr);
return;
}
+ if (md.type != EFI_BOOT_SERVICES_DATA) {
+ pr_err("Skip reserving non EFI Boot Service Data memory for %pa\n", &addr);
+ return;
+ }
+
+ /* Kexec copied the efi memmap from the 1st kernel, thus skip the case. */
+ if (md.attribute & EFI_MEMORY_RUNTIME)
+ return;
+
if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {
pr_err("Region spans EFI memory descriptors, %pa\n", &addr);
return;