Re: [PATCH] cxl/acpi: Defer probe when ACPI0016 PCI root bridge is not ready

From: Chen Pei

Date: Fri May 15 2026 - 09:57:31 EST


On Thu, 14 May 2026 15:31:11 +0800, Richard Cheng wrote:

> > On some platforms (e.g., RISC-V and ARM64) that use the generic
> > pci_acpi_scan_root() implementation, cxl_acpi_probe may run before
> > acpi_pci_root driver has bound to ACPI0016 (CXL host bridge) devices.
> > In this case, acpi_pci_find_root() returns NULL, causing
> > to_cxl_host_bridge() to skip the device silently. This results in
> > incomplete CXL port enumeration on first boot.
> >
> > Fix this by detecting the case where an ACPI0016 device exists but its
> > PCI root bridge is not yet ready, and returning -EPROBE_DEFER to trigger
> > a deferred probe retry.
> >
> > Signed-off-by: Chen Pei <cp0613@xxxxxxxxxxxxxxxxx>
> > ---
> > drivers/cxl/acpi.c | 26 ++++++++++++++++++++++++--
> > 1 file changed, 24 insertions(+), 2 deletions(-)
> >
>
> Hi Chen Pei,
>
> Thanks for the patch.
> I have a few questions and suggestions regarding your changes.
>
> First of all, I would like to know in which scenario you encountered the bug.
> Any specific CONFIG options and devices? What's the error log?
>
> It would be nice if you can attach it for us.

Hi Richard,

Thanks for the review.

I'm currently working on bringing up CXL support on the RISC-V QEMU
virt platform with ACPI (EDK2 UEFI firmware). This is still in the
early debugging/enabling stage.

During testing, I found that cxl_acpi (ACPI0017) probes before
acpi_pci_root has bound to the ACPI0016 (CXL host bridge) device.
RISC-V uses the generic pci_acpi_scan_root() implementation, where
the probe ordering of acpi_pci_root relative to cxl_acpi is not
guaranteed.

On x86, acpi_pci_root uses subsys_initcall and binds very early,
so this race does not manifest there.

This is a silent failure (no explicit error log), which makes it
particularly hard to diagnose. When cxl_acpi probes before
acpi_pci_root has bound the ACPI0016 device:
1. acpi_pci_find_root() returns NULL in to_cxl_host_bridge()
2. to_cxl_host_bridge() returns NULL
3. Both add_host_bridge_dport() and add_host_bridge_uport() return 0
   (not an error), silently skipping the host bridge
4. cxl_acpi_probe() returns success, but the CXL port topology is
   incomplete: no dports or uports are registered
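
To make the sequence above concrete, here is a small userspace model
of the two behaviours. It is only a sketch, not kernel code: the
model_* names and the struct are made up for illustration, and
EPROBE_DEFER is defined by hand because it is a kernel-internal errno
value that userspace <errno.h> does not provide.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define EPROBE_DEFER 517 /* kernel-internal value, absent from userspace <errno.h> */

/* Hypothetical stand-in for an ACPI device; not the kernel's struct acpi_device. */
struct model_acpi_dev {
	const char *hid;     /* hardware ID, e.g. "ACPI0016" */
	bool pci_root_bound; /* true once acpi_pci_root has bound to the device */
};

/* Models acpi_pci_find_root(): NULL until the PCI root bridge is bound. */
static const struct model_acpi_dev *
model_find_root(const struct model_acpi_dev *adev)
{
	return adev->pci_root_bound ? adev : NULL;
}

/* Models the current behaviour: an unbound host bridge is skipped with 0. */
static int model_add_dport_current(const struct model_acpi_dev *adev,
				   int *nr_dports)
{
	if (!model_find_root(adev) || strcmp(adev->hid, "ACPI0016") != 0)
		return 0; /* silent skip, the caller sees success */

	(*nr_dports)++;
	return 0;
}

/* Models the patched behaviour: an unbound ACPI0016 device defers the probe. */
static int model_add_dport_patched(const struct model_acpi_dev *adev,
				   int *nr_dports)
{
	if (strcmp(adev->hid, "ACPI0016") == 0 && !model_find_root(adev))
		return -EPROBE_DEFER;

	return model_add_dport_current(adev, nr_dports);
}
```

With pci_root_bound still false, the "current" variant returns 0 and
registers nothing, while the "patched" variant returns -EPROBE_DEFER
so that the caller can retry later.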

The observable result after boot:
    # memdev is visible but decoder hierarchy is missing
    $ cxl list -M
    [
      {
        "memdev":"mem0",
        ...
      }
    ]

    # No decoders or ports registered for the host bridge
    $ cxl list -BDP
    []

    # Workaround: manually unbind/bind triggers re-probe after
    # acpi_pci_root is ready, and CXL topology enumerates correctly
    $ echo ACPI0017:00 > /sys/bus/platform/drivers/cxl_acpi/unbind
    $ echo ACPI0017:00 > /sys/bus/platform/drivers/cxl_acpi/bind

    # After re-probe, full topology is available and CXL memory
    # can be enabled/onlined successfully
    $ cxl enable-memdev mem0
    $ cxl create-region -m -t ram -d decoder0.0 -w 1 mem0 -s 4G
    $ daxctl online-memory dax0.0

> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index 127537628817..9952d0cff903 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -631,8 +631,21 @@ static int add_host_bridge_dport(struct device *match, void *arg)
> > struct acpi_pci_root *pci_root;
> > struct cxl_port *root_port = arg;
> > struct device *host = root_port->dev.parent;
> > - struct acpi_device *hb = to_cxl_host_bridge(host, match);
> > + struct acpi_device *adev = to_acpi_device(match);
> > + struct acpi_device *hb;
> >
> > + /*
> > + * If this is an ACPI0016 device but acpi_pci_find_root() hasn't
> > + * found the PCI root yet (driver not probed), defer the probe
> > + * to allow acpi_pci_root to bind first.
> > + */
> > + if (strcmp(acpi_device_hid(adev), "ACPI0016") == 0 &&
> > + !acpi_pci_find_root(adev->handle)) {
> > + dev_dbg(host, "deferring probe, ACPI0016 PCI root not ready\n");
> > + return -EPROBE_DEFER;
> > + }
>
> What about strncmp() here since we already know we're comparing against "ACPI0016" ?
> At the same time, why not just use "acpi_dev_hid_match()" ? it's widely used across
> numerous files.

Good point, thanks. I will switch to acpi_dev_hid_match() in v2.

> > +
> > + hb = to_cxl_host_bridge(host, match);
> > if (!hb)
> > return 0;
> >
> > @@ -688,7 +701,8 @@ static int add_host_bridge_uport(struct device *match, void *arg)
> > {
> > struct cxl_port *root_port = arg;
> > struct device *host = root_port->dev.parent;
> > - struct acpi_device *hb = to_cxl_host_bridge(host, match);
> > + struct acpi_device *adev = to_acpi_device(match);
> > + struct acpi_device *hb;
> > struct acpi_pci_root *pci_root;
> > struct cxl_dport *dport;
> > struct cxl_port *port;
> > @@ -697,6 +711,14 @@ static int add_host_bridge_uport(struct device *match, void *arg)
> > resource_size_t component_reg_phys;
> > int rc;
> >
> > + /* Same deferral check as in add_host_bridge_dport() */
> > + if (strcmp(acpi_device_hid(adev), "ACPI0016") == 0 &&
> > + !acpi_pci_find_root(adev->handle)) {
> > + dev_dbg(host, "deferring probe, ACPI0016 PCI root not ready\n");
> > + return -EPROBE_DEFER;
> > + }
> > +
> > + hb = to_cxl_host_bridge(host, match);
> > if (!hb)
> > return 0;
> >
> > --
> > 2.50.1
> >
> >
>
> These 2 checks are basically the same, can we put it in a static inline helper or
> a macro if possible? something like the following might be better
>
> ```
> static int cxl_acpi_defer_host_bridge(struct device *host,
> struct acpi_device *adev)
> {
> if (acpi_dev_hid_match(adev, "ACPI0016") &&
> !acpi_pci_find_root(adev->handle)) {
> dev_dbg(host, "deferring probe, ACPI0016 PCI root not ready\n");
> return -EPROBE_DEFER;
> }
> return 0;
> }
> ```
> and use it in your code like
>
> ```
> int rc = cxl_acpi_defer_host_bridge(host, adev);
> if (rc)
> return rc;
> ```

Agreed, will extract it into a helper in v2.

> Last but not least, have you run the kselftests of CXL ? some mock bridges
> are platform devices, not ACPI devices, you are using "to_acpi_device(match)", this
> is not a safe runtime check when "match" is a platform_device, the code will read the memory
> layout wrongly.

Good catch, I haven't run the CXL kselftests. You're right that the
mock bridges are platform devices and unconditionally calling
to_acpi_device(match) would be unsafe there.

I'll fix this in v2 by adding a dev_is_platform() guard in the
helper, so it only applies to real ACPI devices. I'll also run
the CXL kselftests to validate before sending v2.
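
As a rough illustration of the check ordering I have in mind, here is
a userspace model (again not kernel code, and not the final v2; all
model_* names, the bus enum and the struct layouts are hypothetical).
The bus comparison stands in for dev_is_platform(), and it has to come
before the container_of()-style cast that to_acpi_device() performs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define EPROBE_DEFER 517 /* kernel-internal value, absent from userspace <errno.h> */

/* Userspace equivalent of the kernel's container_of(). */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

enum model_bus { MODEL_BUS_ACPI, MODEL_BUS_PLATFORM };

/* Hypothetical stand-in for struct device. */
struct model_device {
	enum model_bus bus;
};

/* Stand-in for an ACPI device embedding a struct device. */
struct model_acpi_device {
	const char *hid;
	bool pci_root_bound;
	struct model_device dev;
};

/* Stand-in for a mock host bridge: same embedded device, different layout. */
struct model_platform_device {
	int id;
	struct model_device dev;
};

/* Returns -EPROBE_DEFER only for a real ACPI0016 device without a bound root. */
static int model_defer_host_bridge(struct model_device *match)
{
	struct model_acpi_device *adev;

	/* Guard first: a platform device must never be cast to an ACPI device. */
	if (match->bus == MODEL_BUS_PLATFORM)
		return 0;

	adev = model_container_of(match, struct model_acpi_device, dev);
	if (strcmp(adev->hid, "ACPI0016") == 0 && !adev->pci_root_bound)
		return -EPROBE_DEFER;

	return 0;
}
```

A mock platform bridge falls through with 0 before any ACPI field is
touched, so the existing to_cxl_host_bridge() path still decides what
to do with it.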

Thanks,
Pei