Re: [PATCH 1/1] cxl/acpi.c: Add buggy BIOS hint for CXL ACPI lookup failure

From: Bjorn Helgaas
Date: Mon Apr 29 2024 - 11:31:48 EST


On Sun, Apr 28, 2024 at 10:57:13PM -0700, PJ Waskiewicz wrote:
> On Tue, 2024-04-09 at 08:22 -0500, Bjorn Helgaas wrote:
> > On Sun, Apr 07, 2024 at 02:05:26PM -0700, ppwaskie@xxxxxxxxxx wrote:
> > > From: PJ Waskiewicz <ppwaskie@xxxxxxxxxx>
> > >
> > > Currently, Type 3 CXL devices (CXL.mem) can train using host CXL
> > > drivers on Emerald Rapids systems.  However, on some production
> > > systems from some vendors, a buggy BIOS exists that improperly
> > > populates the ACPI => PCI mappings.
> >
> > Can you be more specific about what this ACPI => PCI mapping is?
> > If you already know what the problem is, I'm sure this is obvious,
> > but otherwise it's not.
>
> Apologies for the delay in response. Things got a bit busy with travel
> and whatnot...
>
> On one of these particular hosts, in /sys/bus/acpi/devices/ACPI0016:00,
> for example, the UID would be something like CX01. It isn't an u64 at
> all, and there's no atoi() or other conversions that would match what
> the UID should be.
>
> In my case, /sys/bus/acpi/devices/ACPI0016:02/ is my CXL device in
> question. The UID that is presented from enumeration was CX02.
> However, if I scour the CEDT manually, the UID of my particular CXL
> device is really UID 49.
>
> So, if I went from the PCI/CXL device side, and called something along
> the lines of to_cxl_host_bridge() and tried to go from the pci_dev to
> the acpi_handle, I'd get CX02 back. Then trying to use that to call
> acpi_table_parse_cedt() would fail.
>
> The BIOS fix from the vendor corrected the UID enumeration on the ACPI
> side. This allowed things to properly line up when traversing through
> the kernel APIs and parsing the ACPI tables.

IIUC, _HID ACPI0016 indicates a CXL host bridge. ACPI r6.5, sec
6.5.11, says "The _UID object is required in order to allow OSPM to
match entries in the CEDT to devices present in the ACPI namespace."
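
To make that matching concrete, here is a minimal userspace sketch (not
kernel code) of the lookup as I understand it. Everything in it is an
assumption for illustration: struct chbs_entry is a simplified stand-in
for a CEDT CHBS entry, parse_uid() and find_chbs() are hypothetical
helpers, and the table contents are made up apart from the UID 49
mentioned above. It only shows why a missing or non-numeric _UID such as
"CX02" can never select a CEDT entry.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a CEDT CHBS entry. */
struct chbs_entry {
	uint32_t uid;   /* UID the CEDT records for the host bridge */
	uint64_t base;  /* register base; values below are made up */
};

/* Made-up example table; only UID 49 comes from the thread. */
static const struct chbs_entry example_cedt[] = {
	{ .uid = 7,  .base = 0x1000 },
	{ .uid = 49, .base = 0x2000 },
};

/*
 * Parse a _UID given as a string. Returns 0 and stores the value on
 * success, -1 if the string is missing or is not a plain decimal
 * number that fits in 32 bits.
 */
static int parse_uid(const char *s, uint32_t *uid)
{
	char *end;
	unsigned long long v;

	if (!s || *s < '0' || *s > '9')
		return -1;

	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno || *end != '\0' || v > UINT32_MAX)
		return -1;

	*uid = (uint32_t)v;
	return 0;
}

/* Return the table entry whose UID matches uid_str, or NULL if none. */
static const struct chbs_entry *find_chbs(const struct chbs_entry *tbl,
					  size_t n, const char *uid_str)
{
	uint32_t uid;

	assert(tbl != NULL || n == 0);

	if (parse_uid(uid_str, &uid))
		return NULL;	/* missing or non-numeric _UID */

	for (size_t i = 0; i < n; i++)
		if (tbl[i].uid == uid)
			return &tbl[i];

	return NULL;
}
```

With this table, find_chbs(example_cedt, 2, "49") returns the second
entry, while passing "CX02" or NULL returns NULL, which is the lookup
failure the patch wants to flag.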

I don't see anything about a requirement to map an ACPI0016 device to
a PCI device. At least in the non-CXL world, there *is* no way to map
a PNP0A08 device to a PCI device because a host bridge is not a PCI
device itself (it has an unspecified non-PCI primary interface and a
PCI secondary interface).

So from the patch and the ACPI/CXL specs, it looks like the problem
doesn't involve PCI at all; it just looks like an ACPI0016 object is
required to contain a _UID, and on this buggy BIOS it doesn't.

My question was just prompted by the "ACPI => PCI mapping" in the
commit log. Since PCI doesn't seem involved, maybe just drop that
reference?

It's just a buggy BIOS that doesn't supply _UID for an ACPI0016
object, so you can't locate the corresponding CEDT entry, right?

Bjorn