[+cc Len, Robert in case I'm missing something about static tables
reserving address space]
On Tue, Dec 05, 2023 at 12:28:44PM -0600, Mario Limonciello wrote:
On 12/5/2023 11:31, Bjorn Helgaas wrote:
On Tue, Dec 05, 2023 at 11:00:31AM -0600, Mario Limonciello wrote:
On 12/5/2023 10:17, Bjorn Helgaas wrote:
On Tue, Dec 05, 2023 at 09:48:45AM -0600, Mario Limonciello wrote:
commit 7752d5cfe3d1 ("x86: validate against acpi motherboard
resources") introduced checks to ensure that the region described
by the MCFG table is also covered by a memory reservation, so that
a buggy BIOS cannot introduce conflicts.
Over time, other types of reservation checks have been added for
ACPI PNP resources and the EFI MMIO memory type. The PCI firmware
spec, however, says that these checks are only required when the
operating system doesn't comprehend the firmware region:
```
If the operating system does not natively comprehend
reserving the MMCFG region, the MMCFG region must be
reserved by firmware. The address range reported in the MCFG
table or by _CBA method (see Section 4.1.3) must be reserved
by declaring a motherboard resource. For most systems, the
motherboard resource would appear at the root of the ACPI
namespace (under \_SB) in a node with a _HID of EISAID
(PNP0C02), and the resources in this case should not be
claimed in the root PCI bus’s _CRS. The resources can
optionally be returned in Int15 E820h or EFIGetMemoryMap as
reserved memory but must always be reported through ACPI as
a motherboard resource.
```
My understanding is that native comprehension would mean Linux
knows how to discover and/or configure the MMCFG base address
and size in the hardware and that Linux would then reserve
that region so it's not used for anything else.
Linux doesn't have that, at least for x86. It relies on the
MCFG table to discover the MMCFG region, and it relies on
PNP0C02 _CRS to reserve it.
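To make the discover/reserve split concrete, here is a rough sketch of the
kind of check being discussed: compute the range an MCFG entry describes
(1 MB of ECAM per bus) and see whether a PNP0C02 _CRS range covers it. This
is not the kernel's code; the function names and all addresses are made up
for illustration.

```python
# Illustrative sketch only: hypothetical names and made-up addresses.
ECAM_BYTES_PER_BUS = 1 << 20  # 32 devices * 8 functions * 4 KiB config space


def ecam_region(base, start_bus, end_bus):
    """Half-open [start, end) range covered by one MCFG entry.

    The MCFG base address corresponds to bus 0 of the segment, so the
    range for start_bus..end_bus is offset by start_bus megabytes.
    """
    start = base + start_bus * ECAM_BYTES_PER_BUS
    end = base + (end_bus + 1) * ECAM_BYTES_PER_BUS
    return start, end


def is_reserved(region, reservations):
    """True if one reservation fully contains the region."""
    start, end = region
    return any(r_start <= start and end <= r_end
               for r_start, r_end in reservations)


# Made-up MCFG entry: base 0xE0000000, buses 0x00-0xFF.
mcfg = ecam_region(0xE0000000, 0x00, 0xFF)

# Made-up PNP0C02 _CRS contents for two firmware variants.
good_crs = [(0xE0000000, 0xF0000000)]
buggy_crs = [(0xFED00000, 0xFED01000)]  # ECAM range missing from _CRS

print(is_reserved(mcfg, good_crs))
print(is_reserved(mcfg, buggy_crs))
```

The MCFG entry tells the OS where the region is; only the _CRS list decides
whether it counts as reserved.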
Using MCFG to discover it matches the PCI firmware spec, but as I
point out above, the decision to reserve this region doesn't
require PNP0C01/PNP0C02 _CRS.
Can you explain this reasoning a little more? I claim Linux does
not natively comprehend reserving the MMCFG region, but it sounds
like you don't agree? I think "native" comprehension would mean
Linux would not need the MCFG table.
After rereading our thread and the spec, I think you're right that
Linux doesn't natively comprehend reserving this region, particularly
because of the stance you have on "static table" vs _CRS.
["My stance" refers to this:
Static tables like MCFG, HPET, ECDT, etc., are *not* mechanisms for
reserving address space. The static tables are for things the OS
needs to know early in boot, before it can parse the ACPI namespace.
If a new table is defined, an old OS needs to operate correctly even
though it ignores the table. _CRS allows that because it is generic
and understood by the old OS; a static table does not.
from https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/PCI/acpi-info.rst?id=v6.6#n32]
I don't think this is just my stance. The ACPI spec could be clearer
in terms of requiring PNP0C02 devices, not static tables, to reserve
address space, but I think that requirement is a logical consequence
of the ACPI design.
It's a goal of ACPI that an OS we release today should run on a
platform released tomorrow. If the new platform uses a static table
to reserve address space used by some new hardware, today's OS doesn't
know about it and could place another device on top of it.
Using _CRS of an ACPI device to reserve the new hardware address space
is different because it works even with today's OS. Today's OS can't
*operate* tomorrow's hardware, but at least it won't create address
conflicts with it.
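A toy model of that argument, in case it helps. Everything here is
hypothetical: "NEWT" stands in for some future static table, the allocator
is a simple first-fit that ignores alignment, and the addresses are made up.
The only point is that an OS which predates the table can still honor a
_CRS range.

```python
# Toy model only: "NEWT" is a hypothetical future static table.

def overlaps(a, b):
    """True if half-open ranges a and b intersect."""
    return a[0] < b[1] and b[0] < a[1]


def allocate(window, size, reserved):
    """First-fit, bottom-up allocation that skips reserved ranges."""
    addr = window[0]
    for r_start, r_end in sorted(reserved):
        if addr + size <= r_start:
            break  # the gap below this reservation is big enough
        addr = max(addr, r_end)
    if addr + size > window[1]:
        raise MemoryError("window exhausted")
    return (addr, addr + size)


def old_os_reserved(platform):
    """Today's OS understands _CRS but cannot parse tomorrow's table."""
    return list(platform["crs"])


window = (0x1000, 0x9000)
new_hw = (0x1000, 0x2000)  # address space used by tomorrow's hardware

table_only = {"crs": [], "static_tables": {"NEWT": new_hw}}
with_crs = {"crs": [new_hw], "static_tables": {"NEWT": new_hw}}

bar_a = allocate(window, 0x1000, old_os_reserved(table_only))
bar_b = allocate(window, 0x1000, old_os_reserved(with_crs))

print(overlaps(bar_a, new_hw))  # conflict: the reservation was invisible
print(overlaps(bar_b, new_hw))  # no conflict: _CRS was honored
```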
I just don't want to throw the vendor under the bus as it could have
been caught "sooner" and fixed by BIOS adding _CRS.
The MCFG requirement for PNP0C02 _CRS has been in the PCI Firmware
spec since r3.0 in 2005. I'm surprised that vendors still get this
wrong.
Vendors definitely have an interest in making shipping OSes
boot unchanged on new hardware.
Knowing that Windows works without it, though, I feel this is still
something we should look at fixing from an upstream perspective, which
is what prompted my patch and this discussion.
The fact that Windows works doesn't mean the firmware is correct.
Linux assigns PCI BARs from the bottom up, and ECAM is often at the
bottom of a host bridge aperture.
Windows assigns PCI BARs from the top down, so even without a _CRS
reservation for the ECAM space, Windows is much less likely to put
something on top of it.
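A simplified illustration of the difference, assuming a made-up aperture
with ECAM at the bottom and no _CRS reservation for it. Neither function is
the real Linux or Windows allocator; they only model the direction of
assignment.

```python
# Simplified model: made-up addresses, no alignment or resource-tree logic.

def overlaps(a, b):
    """True if half-open ranges a and b intersect."""
    return a[0] < b[1] and b[0] < a[1]


def assign(aperture, sizes, top_down):
    """Place BARs back to back from one end of the aperture."""
    lo, hi = aperture
    placed = []
    for size in sizes:
        if top_down:
            hi -= size
            placed.append((hi, hi + size))
        else:
            placed.append((lo, lo + size))
            lo += size
    return placed


aperture = (0xE0000000, 0x100000000)  # host bridge window (made up)
ecam = (0xE0000000, 0xF0000000)       # unreserved ECAM at the bottom
bar_sizes = [0x1000000, 0x100000]     # 16 MB and 1 MB BARs

bottom_up = assign(aperture, bar_sizes, top_down=False)
top_down = assign(aperture, bar_sizes, top_down=True)

print([overlaps(bar, ecam) for bar in bottom_up])  # BARs land on ECAM
print([overlaps(bar, ecam) for bar in top_down])   # BARs stay clear of it
```

With enough BARs, the top-down case would eventually reach the ECAM range
as well, which is why it is "much less likely" rather than impossible.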
Bjorn