Re: [PATCH v2 4/4] efi: Fix handling of multiple efi_fake_mem= entries

From: Dave Young
Date: Tue Dec 31 2019 - 23:52:23 EST


Hi Dan,
On 12/31/19 at 02:04pm, Dan Williams wrote:
> Dave noticed that when specifying multiple efi_fake_mem= entries only
> the last entry was successfully being reflected in the efi memory map.
> This is due to the fact that the efi_memmap_insert() is being called
> multiple times, but on successive invocations the insertion should be
> applied to the last new memmap rather than the original map at
> efi_fake_memmap() entry.
>
> Rework efi_fake_memmap() to install the new memory map after each
> efi_fake_mem= entry is parsed.
>
> This also fixes an issue in efi_fake_memmap() that caused it to litter
> emtpy entries into the end of the efi memory map. The empty entry causes
> efi_memmap_insert() to attempt more memmap splits / copies than
> efi_memmap_split_count() accounted for when sizing the new map.
>
> BUG: unable to handle page fault for address: ffffffffff281000
> [..]
> RIP: 0010:efi_memmap_insert+0x11d/0x191
> [..]
> Call Trace:
> ? bgrt_init+0xbe/0xbe
> ? efi_arch_mem_reserve+0x1cb/0x228
> ? acpi_parse_bgrt+0xa/0xd
> ? acpi_table_parse+0x86/0xb8
> ? acpi_boot_init+0x494/0x4e3
> ? acpi_parse_x2apic+0x87/0x87
> ? setup_acpi_sci+0xa2/0xa2
> ? setup_arch+0x8db/0x9e1
> ? start_kernel+0x6a/0x547
> ? secondary_startup_64+0xb6/0xc0
>
> Commit af1648984828 "x86/efi: Update e820 with reserved EFI boot
> services data to fix kexec breakage" is listed in Fixes: since it
> introduces more occurrences where efi_memmap_insert() is invoked after
> an efi_fake_mem= configuration has been parsed. Previously the side
> effects of vestigial empty entries were benign, but with commit
> af1648984828 that follow-on efi_memmap_insert() invocation triggers the
> above crash signature.
>
> Fixes: 0f96a99dab36 ("efi: Add 'efi_fake_mem' boot option")
> Fixes: af1648984828 ("x86/efi: Update e820 with reserved EFI boot services...")
> Link: https://lore.kernel.org/r/20191231014630.GA24942@xxxxxxxxxxxxxxxxxxxxxxxxxx
> Reported-by: Dave Young <dyoung@xxxxxxxxxx>
> Cc: Taku Izumi <izumi.taku@xxxxxxxxxxxxxx>
> Cc: Michael Weiser <michael@xxxxxxxxxxxxxxxxxxx>
> Cc: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> ---
> drivers/firmware/efi/fake_mem.c | 32 +++++++++++++++++---------------
> drivers/firmware/efi/memmap.c | 2 +-
> include/linux/efi.h | 2 ++
> 3 files changed, 20 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
> index 7e53e5520548..68d752d8af21 100644
> --- a/drivers/firmware/efi/fake_mem.c
> +++ b/drivers/firmware/efi/fake_mem.c
> @@ -34,26 +34,17 @@ static int __init cmp_fake_mem(const void *x1, const void *x2)
> return 0;
> }
>
> -void __init efi_fake_memmap(void)
> +static void __init efi_fake_range(struct efi_mem_range *efi_range)
> {
> int new_nr_map = efi.memmap.nr_map;
> efi_memory_desc_t *md;
> phys_addr_t new_memmap_phy;
> unsigned long flags = 0;
> void *new_memmap;
> - int i;
> -
> - if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
> - return;
>
> /* count up the number of EFI memory descriptor */
> - for (i = 0; i < nr_fake_mem; i++) {
> - for_each_efi_memory_desc(md) {
> - struct range *r = &efi_fake_mems[i].range;
> -
> - new_nr_map += efi_memmap_split_count(md, r);
> - }
> - }
> + for_each_efi_memory_desc(md)
> + new_nr_map += efi_memmap_split_count(md, &efi_range->range);

I have another concern here :(

THe efi_memmap_split_count mean to only split for a specific md, and you
can see arch/x86/platform/efi/quirks.c about the use:
if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {
pr_err("Region spans EFI memory descriptors, %pa\n", &addr);
return;
}

Any memory region to be inserted but spans different md will be
rejected. So the memmap insert logic seems does not support the
spanned ranges. I did not find a case two contiguous same type ranges
eg. two "Conventional memory", if have they should have been merged.

So maybe just use same way as the quirks.c here to find the valid md first
then get the split count?

Otherwise I tested the series bootup test passed.

BTW, another issue about fakemem, currently it only works with normal
physical boot, in case of kexec reboot the kernel only aware of EFI
runtime memory ranges, we do not pass other types in memmap. But maybe
we can live with it considering fake mem is only for debugging purpose.

Thanks
Dave