Re: Bug report: the extended PCI config space is missed with 6.2-rc2

From: Bjorn Helgaas
Date: Thu Jan 05 2023 - 16:35:51 EST


On Thu, Jan 05, 2023 at 01:20:36PM -0800, Dan Williams wrote:
> Bjorn Helgaas wrote:
> > On Thu, Jan 05, 2023 at 11:44:28AM -0800, Dan Williams wrote:
> > > Bjorn Helgaas wrote:
> >
> > > > Apparently the only mention of [mem 0x80000000-0x8fffffff] in the
> > > > firmware/kernel interface is as an EfiMemoryMappedIO region.
> > > >
> > > > I think this is a firmware bug, but obviously we're going to have to
> > > > figure out a way around it.
> > >
> > > Definitely an ambiguity / conflict, but not sure it is a bug when you
> > > look at it from the perspective of how an EFI runtime service would
> > > use ECAM/MMCONFIG space.
> >
> > I think it's perfectly fine for firmware to advertise ECAM space as an
> > EfiMemoryMappedIO region via EFI GetMemoryMap() because it certainly
> > makes sense that EFI runtime services would use config space.
> >
> > My understanding is that the OS should learn about device address
> > space via ACPI _CRS, not GetMemoryMap(). The MCFG spec (PCI Firmware
> > Spec, r3.3, sec 4.1.2) requires ECAM space to be reserved via a
> > PNP0C02 motherboard device _CRS.
> >
> > So what I think *is* a bug is that this firmware doesn't report the
> > ECAM space via PNP0C02 _CRS.
> >
> > If somebody thinks the lack of this reservation is not a bug, I would
> > love to hear ideas about how Linux *should* be handling this. There
> > are many variations on how firmware does things like this, and it's
> > been a nightmare trying to figure out something that works with all of
> > them.
>
> I am trying to get a statement from a BIOS person, but in the meantime I
> am confused by this lead-in sentence of Note 2 in "PCI Firmware Spec
> v3.2 Table 4-2: MCFG Table to Support Enhanced Configuration Space
> Access":
>
> If the operating system does not natively comprehend reserving the MMCFG
> region, the MMCFG region must be reserved by firmware. The address range
> reported in the MCFG table or by _CBA method (see Section 4.1.3) must be
> reserved by declaring a motherboard resource...
>
> Which seems to say it is ok for the OS to treat MMCFG space as reserved
> by default. It certainly fails the Robustness Principle for the BIOS to
> *assume* that the OS can natively comprehend that reservation, but it
> seems Linux is in its rights to make that assumption.

I read "OS natively comprehends MMCFG space" as meaning "the OS has
device-specific knowledge of the PCI host bridge and the associated
MMCFG space." But in that case, the OS wouldn't need MCFG at all, so
maybe I'm not reading it right.

There must have been some reason for that sentence, e.g., some system
that didn't or couldn't report MMCFG via PNP0C02 _CRS, but it sure
makes a mess of what could have been a simple "range must be reserved"
statement.

> > > Would it be enough to add this clarification in "EFI 2.9 Table 7-6
> > > Memory Type Usage after ExitBootServices()"?
> > >
> > > s/This memory is not used by the OS./This memory is not used by the OS,
> > > unless ACPI declares it for another purpose./
> >
> > I guess the idea is that MCFG is a form of "ACPI declaring it"? I
> > don't have an explicit citation for it, but I infer from [1] that ACPI
> > static tables are second-class citizens and not intended as a way of
> > reserving address space because that would lead to problems booting
> > old OSes on firmware that provides new tables unknown to the OS.
>
> Ah, true, certainly for new stuff, but what about MCFG specifically?
> What harm is there in assuming that MMCONFIG intersecting with
> EfiMemoryMappedIO shall be treated as reserved for MMCONFIG usage?

Probably none, and I think that's what we'll have to do. Ugh.
Another random special-case rule.
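For illustration only, here is a rough sketch in C of the rule being discussed. The preferred check is that a PNP0C02 _CRS resource covers the MCFG range. The fallback special case accepts a range that firmware reports only as EfiMemoryMappedIO. This is not the kernel's implementation. The structs, the enum, and every function name below are hypothetical simplifications. The sketch also takes the stricter reading of "intersecting": the whole MCFG range must lie inside a single region.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range; end is inclusive, like [mem 0x80000000-0x8fffffff]. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical, simplified memory types (NOT the real EFI type values). */
enum mem_type { MEM_CONVENTIONAL, MEM_MMIO };

/* Hypothetical simplified GetMemoryMap() descriptor. */
struct efi_region {
	struct range r;
	enum mem_type type;
};

static bool range_contains(const struct range *outer, const struct range *inner)
{
	return outer->start <= inner->start && inner->end <= outer->end;
}

/* Preferred rule: some PNP0C02 _CRS resource covers the whole ECAM range. */
static bool ecam_reserved_by_crs(const struct range *ecam,
				 const struct range *crs, size_t ncrs)
{
	for (size_t i = 0; i < ncrs; i++)
		if (range_contains(&crs[i], ecam))
			return true;
	return false;
}

/* Fallback special case: the ECAM range lies inside one EfiMemoryMappedIO region. */
static bool ecam_reserved_by_efi_mmio(const struct range *ecam,
				      const struct efi_region *map, size_t nmap)
{
	for (size_t i = 0; i < nmap; i++)
		if (map[i].type == MEM_MMIO && range_contains(&map[i].r, ecam))
			return true;
	return false;
}

/* Accept the MCFG range if either rule says it is reserved. */
static bool ecam_usable(const struct range *ecam,
			const struct range *crs, size_t ncrs,
			const struct efi_region *map, size_t nmap)
{
	return ecam_reserved_by_crs(ecam, crs, ncrs) ||
	       ecam_reserved_by_efi_mmio(ecam, map, nmap);
}
```

Consider the firmware in this report, with no PNP0C02 reservation. There, ecam_usable() for [mem 0x80000000-0x8fffffff] returns true only through the EfiMemoryMappedIO fallback. Real firmware may split one ECAM window across several adjacent descriptors. This sketch does not handle that case.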

> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/PCI/acpi-info.rst?id=v6.1#n32