Re: Bug report: the extended PCI config space is missed with 6.2-rc2
From: Bjorn Helgaas
Date: Wed Jan 04 2023 - 10:46:34 EST
On Wed, Jan 04, 2023 at 08:50:32AM -0600, Bjorn Helgaas wrote:
> On Wed, Jan 04, 2023 at 09:39:56AM -0500, Liang, Kan wrote:
> > Hi Bjorn,
> >
> > Happy new year!
> >
> > We found some PCI issues with the latest 6.2-rc2.
> >
> > - Using lspci -xxxx, the extended PCI config space of all PCI
> > devices is missing with the latest 6.2-rc2. The system we used had 932
> > PCI devices, at least 800 of which have extended space as seen when booted
> > into a 5.15 kernel. But none of them appeared in 6.2-rc2.
> > - The drivers which rely on the information in the extended PCI config
> > space don't work anymore. We have confirmed that the perf uncore driver
> > (uncore performance monitoring) and Intel VSEC driver (telemetry) don't
> > work in 6.2-rc2. There could be more drivers which are impacted.
> >
> > After a bisect, we found the regression is caused by the below commit
> > 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map").
> > After reverting the commit, the issues are gone.
> >
> > Could you please take a look at the issues?
>
> Certainly. Can you capture the complete dmesg log, please?

Thanks! Comparing v5.19 and v6.2-rc2, I see these:
--- v5.19
+++ v6.2-rc2
+efi: Remove mem458: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map
+e820: remove [mem 0x80000000-0x8fffffff] reserved
-PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
+PCI: not using MMCONFIG
+PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
+[Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
+PCI: not using MMCONFIG
system 00:01: [mem 0xff000000-0xffffffff] has been reserved
system 00:02: [mem 0xfd000000-0xfd69ffff] could not be reserved
system 00:02: [mem 0xfd6c0000-0xfd6cffff] has been reserved
system 00:02: [mem 0xfd6f0000-0xfdffffff] has been reserved
system 00:02: [mem 0xfe000000-0xfe01ffff] could not be reserved
system 00:02: [mem 0xfe200000-0xfe7fffff] has been reserved
system 00:02: [mem 0xff000000-0xffffffff] has been reserved

I think this is a firmware defect. MCFG says the ECAM space is at
[mem 0x80000000-0x8fffffff]. Per the PCI Firmware Spec, r3.3, Note 2
of Table 4-2, this space should be reserved by a motherboard resource,
i.e., a PNP0C02 device (which would appear as "system 00:01" or
similar above) with _CRS that includes [mem 0x80000000-0x8fffffff].
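To make the missing reservation concrete, here is a rough Python sketch (not kernel code). It computes the ECAM size for buses 00-ff and checks whether any of the "system 00:0x" ranges visible in the dmesg excerpt above contains the MCFG window. The helper names `ecam_size` and `is_covered` are made up for illustration, and the single-range containment test is a simplifying assumption rather than a description of the kernel's actual logic.

```python
# Illustrative sketch only; helper names are hypothetical, not kernel APIs.
ECAM_BYTES_PER_FUNCTION = 4096  # ECAM gives each function 4 KiB of config space


def ecam_size(start_bus, end_bus):
    """Bytes of ECAM space for an inclusive bus range (32 devices x 8 functions per bus)."""
    buses = end_bus - start_bus + 1
    return buses * 32 * 8 * ECAM_BYTES_PER_FUNCTION


def is_covered(window, reserved):
    """True if one reserved (start, end) range fully contains the window; ends are inclusive."""
    w_start, w_end = window
    return any(r_start <= w_start and w_end <= r_end for r_start, r_end in reserved)


# ECAM window reported by MCFG for domain 0000 [bus 00-ff]
ecam = (0x80000000, 0x8FFFFFFF)

# Only the "system 00:01" / "system 00:02" ranges visible in the dmesg excerpt
motherboard_ranges = [
    (0xFF000000, 0xFFFFFFFF),
    (0xFD000000, 0xFD69FFFF),
    (0xFD6C0000, 0xFD6CFFFF),
    (0xFD6F0000, 0xFDFFFFFF),
    (0xFE000000, 0xFE01FFFF),
    (0xFE200000, 0xFE7FFFFF),
]

print(hex(ecam_size(0x00, 0xFF)))            # size of the window for 256 buses
print(is_covered(ecam, motherboard_ranges))  # is the ECAM window reserved by any of them?
```

The computed size matches the 256MB window in the log, and none of the listed ranges contains it, which is consistent with the "not reserved in ACPI motherboard resources" message.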

This firmware supplies an EfiMemoryMappedIO region
[0x80000000-0x8fffffff] for the ECAM space (this could be confirmed by
adding "efi=debug"), and the bootloader or EFI stub converted that to
an E820 entry that Linux consumes.

On v5.19, Linux treated that EfiMemoryMappedIO region as a reservation
of the ECAM space, but starting with v6.2-rc1, Linux removes
EfiMemoryMappedIO regions from E820.
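The before/after behavior can be modeled with a small Python sketch (again, not kernel code). The `Region` type, the `from_efi_mmio` flag, and both helper functions are hypothetical names for illustration, and the RAM entry is made up. A real E820 map does not carry such a flag, so the sketch only shows why the E820-based reservation check stops succeeding once the entry is dropped.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int
    end: int                     # inclusive
    kind: str                    # e.g. "usable" or "reserved"
    from_efi_mmio: bool = False  # hypothetical marker: entry came from EfiMemoryMappedIO


def drop_efi_mmio(e820):
    """Model of the v6.2-rc1 behavior: entries derived from EfiMemoryMappedIO are removed."""
    return [r for r in e820 if not r.from_efi_mmio]


def reserved_in_e820(window, e820):
    """True if a single reserved entry fully contains the window; ends are inclusive."""
    w_start, w_end = window
    return any(
        r.kind == "reserved" and r.start <= w_start and w_end <= r.end
        for r in e820
    )


ecam = (0x80000000, 0x8FFFFFFF)

e820_v5_19 = [
    Region(0x00100000, 0x7FFFFFFF, "usable"),  # made-up RAM entry for illustration
    Region(0x80000000, 0x8FFFFFFF, "reserved", from_efi_mmio=True),
]
e820_v6_2 = drop_efi_mmio(e820_v5_19)

print(reserved_in_e820(ecam, e820_v5_19))  # True
print(reserved_in_e820(ecam, e820_v6_2))   # False
```

With the entry present, the window looks reserved; with it removed and no covering PNP0C02 resource, nothing reserves the ECAM space, which lines up with the "not using MMCONFIG" messages in the v6.2-rc2 log.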

My understanding is that EfiMemoryMappedIO tells the OS to map the
area for use by runtime services, but is not intended to prevent the
OS from using the area. Some platforms use EfiMemoryMappedIO for PCI
host bridge apertures, and of course the OS needs to use those.

If your firmware folks disagree and think Linux should be able to
figure this out differently, I would love to have a conversation about
how to do this.

Bjorn