Re: Bug report: the extended PCI config space is missed with 6.2-rc2
From: Giovanni Cabiddu
Date: Mon Jan 09 2023 - 07:27:28 EST
[+cc Herbert, linux-crypto ML as it affects QAT]
On Thu, Jan 05, 2023 at 04:32:57PM -0600, Bjorn Helgaas wrote:
> [+cc Tony, Dan]
>
> On Wed, Jan 04, 2023 at 09:39:56AM -0500, Liang, Kan wrote:
> > Hi Bjorn,
> >
> > Happy new year!
> >
> > We found some PCI issues with the latest 6.2-rc2.
> >
> > - With lspci -xxxx, the extended PCI config space of all PCI
> > devices is missing with the latest 6.2-rc2. The system we used had 932
> > PCI devices, at least 800 of which have extended space as seen when booted
> > into a 5.15 kernel. But none of them appeared in 6.2-rc2.
> > - The drivers which rely on the information in the extended PCI config
> > space don't work anymore. We have confirmed that the perf uncore driver
> > (uncore performance monitoring) and Intel VSEC driver (telemetry) don't
> > work in 6.2-rc2. There could be more drivers which are impacted.
> >
> > After a bisect, we found the regression is caused by the below commit
> > 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map").
> > After reverting the commit, the issues are gone.

Commit 07eab0901ede also affects all the QAT drivers: they fail during
probe when they read data from the extended PCI config space.

Herbert, FYI, that commit is in your cryptodev-2.6 tree.

I tried the patch below and it seems to resolve the problem on my
system (S2600WFQ with Skylake).

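For anyone who wants to check a given device without a full lspci dump:
the size of the device's sysfs config file is 256 bytes when only
conventional config space is accessible and 4096 bytes when extended
config space is accessible. A minimal user-space sketch in C, assuming a
mounted sysfs; the helper names, the enum and the example device address
are hypothetical and not part of any kernel or pciutils API:

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical classification of a sysfs "config" file size. */
enum cfg_kind { CFG_UNKNOWN, CFG_CONVENTIONAL, CFG_EXTENDED };

/* 4096 bytes: extended config space visible; 256 bytes: conventional only. */
static enum cfg_kind cfg_space_kind(off_t size)
{
	if (size >= 4096)
		return CFG_EXTENDED;
	if (size >= 256)
		return CFG_CONVENTIONAL;
	return CFG_UNKNOWN;
}

/*
 * bdf is a device address such as "0000:3d:00.0" (example value only).
 * Returns CFG_UNKNOWN if the sysfs file cannot be stat'ed.
 */
static enum cfg_kind cfg_space_kind_for(const char *bdf)
{
	char path[128];
	struct stat st;

	snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/config", bdf);
	if (stat(path, &st) != 0)
		return CFG_UNKNOWN;
	return cfg_space_kind(st.st_size);
}
```

On a kernel showing the regression, a device that reported CFG_EXTENDED
under 5.15 would be expected to report CFG_CONVENTIONAL instead.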
> Can you try this patch (based on v6.2-rc1):
>
>
> commit 89a0067217b0 ("x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space")
> parent 1b929c02afd3
> Author: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> Date: Thu Jan 5 16:02:58 2023 -0600
>
> x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
>
> Normally we reject ECAM space unless it is reported as reserved in the E820
> table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This
> means extended config space (offsets 0x100-0xfff) may not be accessible.
>
> Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
> mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
> normally converted to an E820 entry by a bootloader or EFI stub.
>
> 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
> E820 entries that correspond to EfiMemoryMappedIO regions because some
> other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
> E820 entries prevent Linux from allocating BAR space for hot-added devices.
>
> Allow use of ECAM for extended config space when the region is covered by
> an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
> _CRS.
>
> Fixes: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
> Link: https://lore.kernel.org/r/ac2693d8-8ba3-72e0-5b66-b3ae008d539d@xxxxxxxxxxxxxxx
>
> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
> index 758cbfe55daa..4adc587a4c94 100644
> --- a/arch/x86/pci/mmconfig-shared.c
> +++ b/arch/x86/pci/mmconfig-shared.c
> @@ -12,6 +12,7 @@
> */
>
> #include <linux/acpi.h>
> +#include <linux/efi.h>
> #include <linux/pci.h>
> #include <linux/init.h>
> #include <linux/bitmap.h>
> @@ -442,6 +443,25 @@ static bool is_acpi_reserved(u64 start, u64 end, enum e820_type not_used)
> return mcfg_res.flags;
> }
>
> +static bool is_efi_reserved(u64 start, u64 end, enum e820_type not_used)
> +{
> + efi_memory_desc_t *md;
> + u64 size, mmio_start, mmio_end;
> +
> + for_each_efi_memory_desc(md) {
> + if (md->type == EFI_MEMORY_MAPPED_IO) {
> + size = md->num_pages << EFI_PAGE_SHIFT;
> + mmio_start = md->phys_addr;
> + mmio_end = mmio_start + size - 1;
> +
> + if (mmio_start <= start && end <= mmio_end)
> + return true;
> + }
> + }
> +
> + return false;
> +}
> +
> typedef bool (*check_reserved_t)(u64 start, u64 end, enum e820_type type);
>
> static bool __ref is_mmconf_reserved(check_reserved_t is_reserved,
> @@ -452,7 +472,7 @@ static bool __ref is_mmconf_reserved(check_reserved_t is_reserved,
> u64 size = resource_size(&cfg->res);
> u64 old_size = size;
> int num_buses;
> - char *method = with_e820 ? "E820" : "ACPI motherboard resources";
> + char *method = with_e820 ? "E820" : "ACPI motherboard resources or EFI";
>
> while (!is_reserved(addr, addr + size, E820_TYPE_RESERVED)) {
> size >>= 1;
> @@ -502,15 +522,17 @@ pci_mmcfg_check_reserved(struct device *dev, struct pci_mmcfg_region *cfg, int e
> if (!early && !acpi_disabled) {
> if (is_mmconf_reserved(is_acpi_reserved, cfg, dev, 0))
> return true;
> + if (is_mmconf_reserved(is_efi_reserved, cfg, dev, 0))
> + return true;
>
> if (dev)
> dev_info(dev, FW_INFO
> - "MMCONFIG at %pR not reserved in "
> + "MMCONFIG at %pR not reserved in EFI or "
> "ACPI motherboard resources\n",
> &cfg->res);
> else
> pr_info(FW_INFO PREFIX
> - "MMCONFIG at %pR not reserved in "
> + "MMCONFIG at %pR not reserved in EFI or "
> "ACPI motherboard resources\n",
> &cfg->res);
> }
Tested-by: Giovanni Cabiddu <giovanni.cabiddu@xxxxxxxxx>
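
The containment test in is_efi_reserved() can be tried outside the kernel
with a small stand-alone sketch. The struct, the constant and the function
name below are hypothetical stand-ins for efi_memory_desc_t,
EFI_PAGE_SHIFT and the patch's helper; only the arithmetic follows the
patch, and the zero-size guard is an addition of the sketch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* stands in for EFI_PAGE_SHIFT (4 KiB pages) */

/* Hypothetical stand-in for an EfiMemoryMappedIO descriptor. */
struct mmio_desc {
	uint64_t phys_addr;
	uint64_t num_pages;
};

/*
 * Return true if [start, end] lies entirely inside one descriptor.
 * Both ends are treated as inclusive here, matching the
 * "mmio_start + size - 1" computation in the patch.
 */
static bool covered_by_mmio(const struct mmio_desc *md, size_t n,
			    uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t size = md[i].num_pages << SKETCH_PAGE_SHIFT;
		uint64_t mmio_start = md[i].phys_addr;
		uint64_t mmio_end;

		if (size == 0)	/* not in the patch: avoid wrap-around below */
			continue;
		mmio_end = mmio_start + size - 1;

		if (mmio_start <= start && end <= mmio_end)
			return true;
	}
	return false;
}
```

With a 256 MiB region at 0x80000000 (0x10000 pages), the range
0x80000000-0x8fffffff is covered, while an end of 0x90000000 is not. Since
is_mmconf_reserved() calls is_reserved(addr, addr + size, ...), it may be
worth double-checking whether the end passed in is meant to be inclusive
or exclusive.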
Regards,
--
Giovanni