Re: [PATCH v2] cxl/core/port: defer endpoint probes when ACPI likely hasn't finished
From: Gregory Price
Date: Tue Oct 08 2024 - 21:01:32 EST
On Tue, Oct 08, 2024 at 03:37:45PM -0700, Dan Williams wrote:
> Gregory Price wrote:
> > On Fri, Oct 04, 2024 at 05:08:03PM -0700, Dan Williams wrote:
> > > Gregory Price wrote:
> > > > In cxl_acpi_probe, we add dports and uports to host bridges iteratively:
> > > > - bus_for_each_dev(adev->dev.bus, NULL, root_port, add_host_bridge_dport);
> > > > - bus_for_each_dev(adev->dev.bus, NULL, root_port, add_host_bridge_uport);
> > > >
> > > > Simultaneously, as ports are probed, memdev endpoints can also be
> > > > probed. This creates a race condition, where an endpoint can perceive
> > > > its path to the root being broken in devm_cxl_enumerate_ports.
> > > >
> > > > The memdev/endpoint probe will see a heirarchy that may look like:
> > > > mem1
> > > > parent => 0000:c1:00.0
> > > > parent => 0000:c0:01.1
> > > > parent->parent => NULL
> > > >
> > > > This results in find_cxl_port() returning NULL (since the port hasn't
> > > > been associated with the host bridge yet), and add_port_attach_ep
> > > > fails because the grandparent's grandparent is NULL.
> > > >
> > > > When the latter condition is detected, the comments suggest:
> > > > /*
> > > > * The iteration reached the topology root without finding the
> > > > * CXL-root 'cxl_port' on a previous iteration, fail for now to
> > > > * be re-probed after platform driver attaches.
> > > > */
> > > >
> > > > This case results in an -ENXIO; however, a re-probe never occurs. Change
> > > > this return condition to -EPROBE_DEFER to explicitly cause a reprobe.
> > >
> > > Ok, thanks for the additional debug. Like we chatted on the CXL Discord
> > > I think this is potentially pointing to a bug in bus_rescan_devices()
> > > where it checks dev->driver without holding the lock.
> > >
> > > Can you give this fix a try to see if it also resolves the issue?
> > > Effectively, cxl_bus_rescan() is always needed in case the cxl_acpi
> > > driver loads waaaay after deferred probing has given up, and if this
> > > works then EPROBE_DEFER can remain limited to cases where it is
> > > absolutely known that no other device_attach() kick is coming to save
> > > the day.
> > >
> >
> > Funny enough, not only did it not work, but now i get neither endpoint lol
> >
> > $ ls /sys/bus/cxl/devices/
> > decoder0.0 decoder1.0 decoder2.0 decoder3.1 mem0 port1 port3 root0
> > decoder0.1 decoder1.1 decoder3.0 decoder4.0 mem1 port2 port4
> >
> > w/ reprobe patch
> >
> > # ls /sys/bus/cxl/devices
> > decoder0.0 decoder1.0 decoder2.0 decoder3.1 decoder5.0 decoder6.0 endpoint5 mem0 port1 port3 root0
> > decoder0.1 decoder1.1 decoder3.0 decoder4.0 decoder5.1 decoder6.1 endpoint6 mem1 port2 port4
>
> Such a violent result is interesting! While I would have preferred an
> "all fixed!" version of "interesting", making something fail reliably and
> completely is at least indication that the starting point was more
> fragile than expected.
>
> Now, I tried to get cxl_test to fail by probing memdevs asynchronously
> alongside the ACPI root driver. That did reveal a use-after-free bug
> when out-of-order shutdown causes some cleanup to be skipped, will send
> a separate fixup for that, but it failed to reproduce the bug you are
> seeing.
>
> The incremental fix here, that applies on top of the device_attach()
> fixup, is to make sure that all cxl_port instances registered by
> cxl_acpi_probe() are active before cxl_bus_rescan() runs. That can only
> be guaranteed when the cxl_port driver is pre-loaded.
>
:< unfortunately the result is... the same but also different?
the worst kind of result
# ls /sys/bus/cxl/devices/
decoder0.0 decoder0.2 decoder2.0 decoder3.1 mem0 port1 port3 root0
decoder0.1 decoder1.0 decoder3.0 decoder4.0 mem1 port2 port4
apparently sometimes we get decoder0.2 and sometimes we get decoder1.1
after a reboot we get it the other way
# ls /sys/bus/cxl/devices/
decoder0.0 decoder0.2 decoder2.0 decoder3.1 mem0 port1 port3 root0
decoder0.1 decoder1.0 decoder3.0 decoder4.0 mem1 port2 port4
in my experimental kernel where everything "works" I get something like
# ls /sys/bus/cxl/devices/
dax_region0 decoder0.0 decoder1.0 decoder3.1 decoder5.1 endpoint5 mem1 port3 region1
dax_region1 decoder0.1 decoder2.0 decoder4.0 decoder6.0 endpoint6 port1 port4 region2
dax_region2 decoder0.2 decoder3.0 decoder5.0 decoder6.1 mem0 port2 region0 root0
which does not have decoder1.1 - but i haven't confirmed whether this is
consistent or not. I don't know what causes 0.N vs 1.M, maybe you do?
but in the first result (past email) you can see the reprobe patch also generated
decoder1.1 instead of decoder0.2
probably not related, more fun side quests!
Regardless, the additional patch did not resolve the problem :<
> If you are running from an initial ram disk make sure cxl_port.ko is
> included there...
>
> -- 8< --
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 82b78e331d8e..432b7cfd12a8 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -924,6 +924,13 @@ static void __exit cxl_acpi_exit(void)
>
> /* load before dax_hmem sees 'Soft Reserved' CXL ranges */
> subsys_initcall(cxl_acpi_init);
> +
> +/*
> + * Arrange for host-bridge ports to be active synchronous with
> + * cxl_acpi_probe() exit.
> + */
> +MODULE_SOFTDEP("pre: cxl_port");
> +
> module_exit(cxl_acpi_exit);
> MODULE_DESCRIPTION("CXL ACPI: Platform Support");
> MODULE_LICENSE("GPL v2");