Re: [PATCH v7 1/3] efi/x86: Fix EFI memory map corruption with kexec

From: Kalra, Ashish
Date: Mon Jun 03 2024 - 12:56:24 EST


On 6/3/2024 10:29 AM, Mike Rapoport wrote:

On Mon, Jun 03, 2024 at 09:01:49AM -0500, Kalra, Ashish wrote:
On 6/3/2024 8:39 AM, Mike Rapoport wrote:

On Mon, Jun 03, 2024 at 08:06:56AM -0500, Kalra, Ashish wrote:
On 6/3/2024 3:56 AM, Borislav Petkov wrote:

efi_arch_mem_reserve() calls efi_memmap_alloc() to allocate memory for
EFI memory map and due to early allocation it uses memblock allocation.

Later during boot, efi_enter_virtual_mode() calls kexec_enter_virtual_mode()
in case of a kexec-ed kernel boot.

This function kexec_enter_virtual_mode() installs the new EFI memory map by
calling efi_memmap_init_late() which remaps the efi_memmap physically allocated
in efi_arch_mem_reserve(), but this remapping is still using memblock allocation.

Subsequently, when memblock is freed later in boot flow, this remapped
efi_memmap will have random corruption (similar to a use-after-free scenario).

The corrupted EFI memory map is then passed to the next kexec-ed kernel
which causes a panic when trying to use the corrupted EFI memory map.
This sounds fishy: memblock allocated memory is not freed later in the
boot - it remains reserved. Only free memory is freed from memblock to
the buddy allocator.

Or is the problem that memblock-allocated memory cannot be memremapped
because *raisins*?
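
The memblock behaviour described above can be illustrated with a toy
userspace model. This is only a sketch, not kernel code: the names
toy_memblock, toy_memblock_alloc() and toy_memblock_free_all() are
hypothetical, and the real memblock tracks address ranges in region
arrays rather than one flag per page. The only point it shows is that
reserved pages are skipped when free memory is handed to the buddy
allocator, so a memblock allocation by itself is not freed later in boot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 16

/* Toy stand-in for memblock: one flag per page (assumption, not the kernel layout). */
struct toy_memblock {
    bool reserved[NPAGES];  /* pages handed out by the early allocator */
    bool in_buddy[NPAGES];  /* pages released to the toy page allocator */
};

/*
 * Early allocation: find the first run of npages unreserved pages and
 * mark it reserved. Returns the first page index, or -1 if nothing fits.
 */
static int toy_memblock_alloc(struct toy_memblock *mb, size_t npages)
{
    assert(mb != NULL);

    for (size_t start = 0; start + npages <= NPAGES; start++) {
        size_t i = 0;

        while (i < npages && !mb->reserved[start + i])
            i++;

        if (i == npages) {
            for (i = 0; i < npages; i++)
                mb->reserved[start + i] = true;
            return (int)start;
        }
    }
    return -1;
}

/*
 * Late-boot hand-over: only pages that are NOT reserved are released.
 * Returns the number of pages given to the toy buddy allocator.
 */
static size_t toy_memblock_free_all(struct toy_memblock *mb)
{
    size_t released = 0;

    assert(mb != NULL);

    for (size_t i = 0; i < NPAGES; i++) {
        if (!mb->reserved[i]) {
            mb->in_buddy[i] = true;
            released++;
        }
    }
    return released;
}
```

In this model, a two-page allocation made before toy_memblock_free_all()
stays reserved and never shows up in in_buddy[], which is why a
use-after-free explanation for the corruption looks doubtful.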
This is what seems to be happening:

efi_arch_mem_reserve() calls efi_memmap_alloc() to allocate memory for
EFI memory map and due to early allocation it uses memblock allocation.

And later efi_enter_virtual_mode() calls kexec_enter_virtual_mode()
in case of a kexec-ed kernel boot.

This function kexec_enter_virtual_mode() installs the new EFI memory map by
calling efi_memmap_init_late() which does memremap() on memblock-allocated memory.
Does the issue happen only with SNP?
This is observed under SNP as efi_arch_mem_reserve() is only being called
with SNP enabled and then efi_arch_mem_reserve() allocates EFI memory map
using memblock.
I don't see how efi_arch_mem_reserve() is only called with SNP. What did I
miss?

This is the call stack for efi_arch_mem_reserve():

[ 0.310010] efi_arch_mem_reserve+0xb1/0x220
[ 0.311382] efi_mem_reserve+0x36/0x60
[ 0.311973] efi_bgrt_init+0x17d/0x1a0
[ 0.313265] acpi_parse_bgrt+0x12/0x20
[ 0.313858] acpi_table_parse+0x77/0xd0
[ 0.314463] acpi_boot_init+0x362/0x630
[ 0.315069] setup_arch+0xa88/0xf80
[ 0.315629] start_kernel+0x68/0xa90
[ 0.316194] x86_64_start_reservations+0x1c/0x30
[ 0.316921] x86_64_start_kernel+0xbf/0x110
[ 0.317582] common_startup_64+0x13e/0x141

So it is probably being invoked specifically on AMD platforms?

If we skip efi_arch_mem_reserve() (which should probably be anyway skipped
for kexec case), then for kexec boot, EFI memmap is memremapped in the same
virtual address as the first kernel and not the allocated memblock address.
Maybe we should skip efi_arch_mem_reserve() for kexec case, but I think we
still need to understand what's causing memory corruption.

When efi_arch_mem_reserve() allocates memory for the EFI memory map using
memblock, and kexec_enter_virtual_mode() later does a memremap() on this
memblock-allocated memory, I subsequently see EFI memory map corruption.
So are there any issues with doing memremap() on memblock-allocated memory?

Thanks, Ashish

I didn't really dig, but my theory would be that it has something to do
with arch_memremap_can_ram_remap() in arch/x86/mm/ioremap.c
Thanks, Ashish