Re: Bug report: the extended PCI config space is missed with 6.2-rc2

From: Dan Williams
Date: Thu Jan 05 2023 - 16:43:38 EST


Bjorn Helgaas wrote:
> On Thu, Jan 05, 2023 at 01:20:36PM -0800, Dan Williams wrote:
> > Bjorn Helgaas wrote:
> > > On Thu, Jan 05, 2023 at 11:44:28AM -0800, Dan Williams wrote:
> > > > Bjorn Helgaas wrote:
> > >
> > > > > Apparently the only mention of [mem 0x80000000-0x8fffffff] in the
> > > > > firmware/kernel interface is as an EfiMemoryMappedIO region.
> > > > >
> > > > > I think this is a firmware bug, but obviously we're going to have to
> > > > > figure out a way around it.
> > > >
> > > > Definitely an ambiguity / conflict, but not sure it is a bug when you
> > > > look at it from the perspective of how would an EFI runtime service use
> > > > ECAM/MMCONFIG space?
> > >
> > > I think it's perfectly fine for firmware to advertise ECAM space as an
> > > EfiMemoryMappedIO region via EFI GetMemoryMap() because it certainly
> > > makes sense that EFI runtime services would use config space.
> > >
> > > My understanding is that the OS should learn about device address
> > > space via ACPI _CRS, not GetMemoryMap(). The MCFG spec (PCI Firmware
> > > Spec, r3.3, sec 4.1.2) requires ECAM space to be reserved via a
> > > PNP0C02 motherboard device _CRS.
> > >
> > > So what I think *is* a bug is that this firmware doesn't report the
> > > ECAM space via PNP0C02 _CRS.
> > >
> > > If somebody thinks the lack of this reservation is not a bug, I would
> > > love to hear ideas about how Linux *should* be handling this. There
> > > are many variations on how firmware does things like this, and it's
> > > been a nightmare trying to figure out something that works with all of
> > > them.
> >
> > I am trying to get a statement from a BIOS person, but in the meantime I
> > am confused by this lead-in sentence of Note 2 in "PCI Firmware Spec
> > v3.2 Table 4-2: MCFG Table to Support Enhanced Configuration Space
> > Access":
> >
> > If the operating system does not natively comprehend reserving the MMCFG
> > region, the MMCFG region must be reserved by firmware. The address range
> > reported in the MCFG table or by _CBA method (see Section 4.1.3) must be
> > reserved by declaring a motherboard resource...
> >
> > Which seems to say it is ok for the OS to treat MMCFG space as reserved
> > by default. It certainly fails the Robustness Principle for the BIOS to
> > *assume* that the OS can natively comprehend that reservation, but it
> > seems Linux is in its rights to make that assumption.
>
> I read "OS natively comprehends MMCFG space" as meaning "the OS has
> device-specific knowledge of the PCI host bridge and the associated
> MMCFG space." But in that case, the OS wouldn't need MCFG at all, so
> maybe I'm not reading it right.
>
> There must have been some reason for that sentence, e.g., some system
> that didn't or couldn't report MMCFG via PNP0C02 _CBA, but it sure
> makes a mess of what could have been a simple "range must be reserved"
> statement.
>
> > > > Would it be enough to add this clarification in "EFI 2.9 Table 7-6
> > > > Memory Type Usage after ExitBootServices()"?
> > > >
> > > > s/This memory is not used by the OS./This memory is not used by the OS,
> > > > unless ACPI declares it for another purpose./
> > >
> > > I guess the idea is that MCFG is a form of "ACPI declaring it"? I
> > > don't have an explicit citation for it, but I infer at [1] that ACPI
> > > static tables are second-class citizens and not intended as a way of
> > > reserving address space because that would lead to problems booting
> > > old OSes on firmware that provides new tables unknown to the OS.
> >
> > Ah, true, certainly for new stuff, but what about MCFG specifically?
> > What harm is there in assuming that MMCONFIG intersecting with
> > EfiMemoryMappedIO shall be treated as reserved for MMCONFIG usage?
>
> Probably none, and I think that's what we'll have to do. Ugh.
> Another random special-case rule.
>
> > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/PCI/acpi-info.rst?id=v6.1#n32

I am still holding out that a BIOS developer can either say "whoops,
populating MMCONFIG in _CRS was overlooked", or point out "if you take
the derivative of the PCI spec, multiply it by the inverse of the EFI
spec and then take the cross-product with the ACPI spec then the memory
type comes out as implicitly reserved".
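
To make the special-case rule discussed above concrete, here is a minimal
sketch of the decision as a small Python model. It is not kernel code and
not a proposed patch. The names Region, overlaps() and
ecam_treated_as_reserved() are hypothetical, and the example assumes the
MCFG range is the same [mem 0x80000000-0x8fffffff] range that shows up as
EfiMemoryMappedIO.

```python
from dataclasses import dataclass

# Label for the EFI memory type; a plain string, not the numeric EFI value.
EFI_MEMORY_MAPPED_IO = "EfiMemoryMappedIO"


@dataclass(frozen=True)
class Region:
    """Hypothetical address range; start and end are inclusive."""
    start: int
    end: int
    kind: str = ""


def overlaps(a: Region, b: Region) -> bool:
    """True if the two inclusive ranges share at least one address."""
    return a.start <= b.end and b.start <= a.end


def ecam_treated_as_reserved(ecam, efi_map, pnp0c02_crs):
    """Model of the rule: accept the ECAM range if PNP0C02 _CRS covers it,
    or (the special case) if it intersects an EfiMemoryMappedIO region."""
    # Case the PCI Firmware Spec asks for: a motherboard resource
    # fully contains the MCFG range.
    if any(r.start <= ecam.start and ecam.end <= r.end for r in pnp0c02_crs):
        return True
    # Special case from this thread: an intersecting EfiMemoryMappedIO
    # entry is treated as reserving the range for MMCONFIG.
    return any(
        r.kind == EFI_MEMORY_MAPPED_IO and overlaps(ecam, r) for r in efi_map
    )


# Assumed layout for the reported system: no PNP0C02 reservation, but the
# range is present in the EFI memory map as EfiMemoryMappedIO.
ecam = Region(0x80000000, 0x8FFFFFFF)
efi_map = [Region(0x80000000, 0x8FFFFFFF, EFI_MEMORY_MAPPED_IO)]
print(ecam_treated_as_reserved(ecam, efi_map, pnp0c02_crs=[]))
```

With neither a _CRS reservation nor an intersecting EfiMemoryMappedIO
entry, the function returns False, which corresponds to the case where the
extended config space goes missing.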