Re: [PATCH v2 1/4] efi/x86: Remove EfiMemoryMappedIO from E820 map

From: Hans de Goede
Date: Fri Dec 09 2022 - 03:07:55 EST


Hi,

One comment (a logging bug in the patch) below:

On 12/8/22 20:03, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
>
> Firmware can use EfiMemoryMappedIO to request that MMIO regions be mapped
> by the OS so they can be accessed by EFI runtime services, but should have
> no other significance to the OS (UEFI r2.10, sec 7.2). However, most
> bootloaders and EFI stubs convert EfiMemoryMappedIO regions to
> E820_TYPE_RESERVED entries, which prevent Linux from allocating space from
> them (see remove_e820_regions()).
>
> Some platforms use EfiMemoryMappedIO entries for PCI MMCONFIG space and PCI
> host bridge windows, which means Linux can't allocate BAR space for
> hot-added devices.
>
> Remove large EfiMemoryMappedIO regions from the E820 map to avoid this
> problem.
>
> Leave small (< 256KB) EfiMemoryMappedIO regions alone because on some
> platforms, these describe non-window space that's included in host bridge
> _CRS. If we assign that space to PCI devices, they don't work. On the
> Lenovo X1 Carbon, this leads to suspend/resume failures.
>
> The previous solution to the problem of allocating BARs in these regions
> was to add pci_crs_quirks[] entries to disable E820 checking for these
> machines (see d341838d776a ("x86/PCI: Disable E820 reserved region clipping
> via quirks")):
>
>   Acer   DMI_PRODUCT_NAME     Spin SP513-54N
>   Clevo  DMI_BOARD_NAME       X170KM-G
>   Lenovo DMI_PRODUCT_VERSION  *IIL*
>
> Florent reported the BAR allocation issue on the Clevo NL4XLU. We could
> add another quirk for the NL4XLU, but I hope this generic change can solve
> it for many machines without having to add quirks.
>
> This change has been tested on Clevo X170KM-G (Konrad) and Lenovo Ideapad
> Slim 3 (Matt) and solves the problem even when overriding the existing
> quirks by booting with "pci=use_e820".
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216565 Clevo NL4XLU
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206459#c78 Clevo X170KM-G
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=1868899 Ideapad Slim 3
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2029207 X1 Carbon
> Reported-by: Florent DELAHAYE <kernelorg@xxxxxxxxx>
> Tested-by: Konrad J Hambrick <kjhambrick@xxxxxxxxx>
> Tested-by: Matt Hansen <2lprbe78@xxxxxxxx>
> Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> Cc: Hans de Goede <hdegoede@xxxxxxxxxx>
> ---
> arch/x86/platform/efi/efi.c | 46 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 46 insertions(+)
>
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index ebc98a68c400..dee1852e95cd 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -303,6 +303,50 @@ static void __init efi_clean_memmap(void)
> }
> }
>
> +/*
> + * Firmware can use EfiMemoryMappedIO to request that MMIO regions be
> + * mapped by the OS so they can be accessed by EFI runtime services, but
> + * should have no other significance to the OS (UEFI r2.10, sec 7.2).
> + * However, most bootloaders and EFI stubs convert EfiMemoryMappedIO
> + * regions to E820_TYPE_RESERVED entries, which prevent Linux from
> + * allocating space from them (see remove_e820_regions()).
> + *
> + * Some platforms use EfiMemoryMappedIO entries for PCI MMCONFIG space and
> + * PCI host bridge windows, which means Linux can't allocate BAR space for
> + * hot-added devices.
> + *
> + * Remove large EfiMemoryMappedIO regions from the E820 map to avoid this
> + * problem.
> + *
> + * Retain small EfiMemoryMappedIO regions because on some platforms, these
> + * describe non-window space that's included in host bridge _CRS. If we
> + * assign that space to PCI devices, they don't work.
> + */
> +static void __init efi_remove_e820_mmio(void)
> +{
> +	efi_memory_desc_t *md;
> +	u64 size, start, end;
> +	int i = 0;
> +
> +	for_each_efi_memory_desc(md) {
> +		if (md->type == EFI_MEMORY_MAPPED_IO) {
> +			size = md->num_pages << EFI_PAGE_SHIFT;
> +			if (size >= 256*1024) {
> +				start = md->phys_addr;
> +				end = start + size - 1;
> +				pr_info("Remove mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluMB) from e820 map\n",
> +					i, start, end, size >> 20);
> +				e820__range_remove(start, size,
> +						   E820_TYPE_RESERVED, 1);
> +			} else {
> +				pr_info("Not removing mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluKB) from e820 map\n",
> +					i, start, end, size >> 10);

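The logging in this else branch reuses the start and end values from the previous region, the one that was actually removed, because start and end are only assigned in the if branch. If a small region came before any large one, they would even be used uninitialized.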

For example, Matt's latest log from
https://bugzilla.redhat.com/show_bug.cgi?id=1868899
contains:

[ 0.000000] e820: remove [mem 0xfc800000-0xfe7fffff] reserved
[ 0.000000] efi: Not removing mem46: MMIO range=[0xfc800000-0xfe7fffff] (4KB) from e820 map
[ 0.000000] efi: Not removing mem47: MMIO range=[0xfc800000-0xfe7fffff] (32KB) from e820 map
[ 0.000000] efi: Not removing mem49: MMIO range=[0xfc800000-0xfe7fffff] (8KB) from e820 map
[ 0.000000] efi: Not removing mem50: MMIO range=[0xfc800000-0xfe7fffff] (4KB) from e820 map

Notice how all the "Not removing ..." lines log the same range as
the actually removed map entry above them.
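One way to fix this would be to compute start and end from the current descriptor before the size check, so both branches log the right range. The standalone user-space sketch below illustrates the idea; struct mmio_desc, MMIO_KEEP_THRESHOLD and describe_mmio() are hypothetical stand-ins for efi_memory_desc_t, the 256*1024 literal and the pr_info() calls in the patch, so this is an illustration, not a drop-in replacement:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define EFI_PAGE_SHIFT		12
/* Hypothetical name for the patch's 256KB cutoff. */
#define MMIO_KEEP_THRESHOLD	(256 * 1024)

/* Hypothetical, simplified stand-in for efi_memory_desc_t. */
struct mmio_desc {
	uint64_t phys_addr;
	uint64_t num_pages;
};

/*
 * Hypothetical helper: format the log line for one EfiMemoryMappedIO
 * descriptor into buf. start and end are computed from *this* descriptor
 * before the size check, so neither branch can print a stale (or
 * uninitialized) range. Returns 1 if the region would be removed from
 * the E820 map, 0 if it would be kept.
 */
static int describe_mmio(const struct mmio_desc *md, unsigned int i,
			 char *buf, size_t len)
{
	uint64_t size = md->num_pages << EFI_PAGE_SHIFT;
	uint64_t start = md->phys_addr;
	uint64_t end = start + size - 1;

	if (size >= MMIO_KEEP_THRESHOLD) {
		snprintf(buf, len,
			 "Remove mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluMB) from e820 map",
			 i, (unsigned long long)start, (unsigned long long)end,
			 (unsigned long long)(size >> 20));
		return 1;
	}

	snprintf(buf, len,
		 "Not removing mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluKB) from e820 map",
		 i, (unsigned long long)start, (unsigned long long)end,
		 (unsigned long long)(size >> 10));
	return 0;
}
```

With a one-page descriptor at a made-up address of 0xfed00000 and index 46, this yields "Not removing mem46: MMIO range=[0xfed00000-0xfed00fff] (4KB) from e820 map" instead of repeating the range of the last removed region. In the patch itself, the equivalent change would simply be to move the start/end assignments above the size check.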

Regards,

Hans

> +			}
> +		}
> +		i++;
> +	}
> +}
> +
> void __init efi_print_memmap(void)
> {
> efi_memory_desc_t *md;
> @@ -474,6 +518,8 @@ void __init efi_init(void)
> set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
> efi_clean_memmap();
>
> + efi_remove_e820_mmio();
> +
> if (efi_enabled(EFI_DBG))
> efi_print_memmap();
> }