Re: [RFC PATCH 1/1] PCI: Add Extended Tag + MRRS quirk for Xeon 6

From: Dan Williams
Date: Wed Mar 05 2025 - 15:39:02 EST


Bjorn Helgaas wrote:
> On Tue, Mar 04, 2025 at 03:51:08PM +0200, Ilpo Järvinen wrote:
> > Disallow Extended Tags and Max Read Request Size (MRRS) larger than
> > 128B for devices under Xeon 6 Root Ports if the Root Port is bifurcated
> > to x2. Also, 10-Bit Tag Requester should be disallowed for device
> > underneath these Root Ports but there is currently no 10-Bit Tag
> > support in the kernel.
> >
> > The normal path that writes MRRS is through
> > pcie_bus_configure_settings() -> pcie_bus_configure_set() ->
> > pcie_write_mrrs() and contains a few early returns that are based on
> > the value of pcie_bus_config. Overriding such checks with the host
> > bridge flag check on each level seems messy. Thus, simply ensure MRRS
> > is always written in pci_configure_device() if a device requiring the
> > quirk is detected.
>
> This is kind of weird. It's apparently not an erratum in the sense
> that something doesn't *work*, just something for "optimized PCIe
> performance"?

That is another way of saying that large requests, surprisingly,
perform worse than small requests.

> What are we supposed to do with this? Add similar quirks for every
> random PCI controller? Scratching my head about what this means for
> the future.

Ideally when the platform knows about these corner cases the BIOS
deploys the setting and the OS knows to leave it alone.

> What bad things happen if we *don't* do this? Is this something we
> could/should rely on BIOS to configure for us?

Reduced performance, and yes, only the BIOS has a chance to know about
these niche corner cases ahead of time. The problem, as always, is
knowing when to step in and change what look to be default values, and
when those values are deliberate choices by platform firmware that
knows a one-off detail.

So I agree with you that while this quirk meets the letter of this
specific recommendation, it portends a future of a steady stream of odd
host PCI controller quirks. Is there a path to empower platform firmware
to convey, "don't touch this value for 'reasons'"?

This reminds me of your observation about _HPX.
http://lore.kernel.org/20240715214529.GA447149@bhelgaas

I.e. potentially a path for Linux to double check whether what it thinks
is a good value is countermanded by an _HPX record. Maybe that is overkill
and a more tightly scoped, "don't touch root port PCIe performance
settings" flag variable in ACPI would suffice? So I see this quirk as a
conversation starter that can be applied or held out until the
conversation resolves.
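
To make the "don't touch" idea concrete, here is a rough userspace
sketch, not kernel code. It models the Max_Read_Request_Size field of
the PCIe Device Control register (bits 14:12, encoded as 128B << n) and
gates the OS write on a firmware indication. The fw_preserve flag and
os_tune_mrrs() are hypothetical names for illustration only; no such
ACPI flag or kernel helper exists today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Device Control register: Max_Read_Request_Size lives in bits 14:12. */
#define DEVCTL_READRQ_MASK  0x7000u
#define DEVCTL_READRQ_SHIFT 12

/* Decode the 3-bit field: 0 -> 128B, 1 -> 256B, ... 5 -> 4096B. */
static unsigned int mrrs_field_to_bytes(unsigned int field)
{
	assert(field <= 5);	/* 6 and 7 are reserved encodings */
	return 128u << field;
}

/* Read MRRS in bytes out of a raw Device Control value. */
static unsigned int devctl_get_mrrs(uint16_t devctl)
{
	return mrrs_field_to_bytes((devctl & DEVCTL_READRQ_MASK) >>
				   DEVCTL_READRQ_SHIFT);
}

/* Replace the MRRS field; bytes must be a power of two in 128..4096. */
static uint16_t devctl_set_mrrs(uint16_t devctl, unsigned int bytes)
{
	unsigned int field = 0;

	assert(bytes >= 128 && bytes <= 4096 && (bytes & (bytes - 1)) == 0);
	while ((128u << field) < bytes)
		field++;
	return (uint16_t)((devctl & ~DEVCTL_READRQ_MASK) |
			  (field << DEVCTL_READRQ_SHIFT));
}

/*
 * HYPOTHETICAL: fw_preserve stands in for a firmware-provided
 * "don't touch Root Port performance settings" indication.
 */
static uint16_t os_tune_mrrs(uint16_t devctl, unsigned int os_bytes,
			     bool fw_preserve)
{
	if (fw_preserve)
		return devctl;	/* trust the value firmware deployed */
	return devctl_set_mrrs(devctl, os_bytes);
}
```

With something like this, a 128B MRRS deployed by the BIOS would survive
OS tuning whenever firmware sets the flag, and the OS would remain free
to pick its own value otherwise.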