Re: [PATCH v7 08/11] cxl: Coordinate sibling functions for CXL reset
From: Alex Williamson
Date: Fri Jun 26 2026 - 18:09:10 EST

On Tue, 23 Jun 2026 16:00:23 -0700
"Dan Williams (nvidia)" <djbw@xxxxxxxxxx> wrote:
> Srirangan Madhavan wrote:
> > CXL Device Reset affects all CXL.cache and CXL.mem functions in the reset
> > scope. Lock same-scope siblings with pci_dev_trylock(), save/disable them,
> > drain pending transactions, and hold IOMMU reset blocks until recovery.
> >
> > Also include mem-capable siblings in HDM range validation and CPU cache
> > invalidation. Cache-only siblings are quiesced, but skipped for HDM range
> > handling.
>
> PCI reset locking and ordering is already a source of some burden
> without adding this new sibling model to consider.
>
> Is there evidence that multi-function CXL devices, where most of the
> functions are non-CXL, are going to be a common occurrence?
>
> In other words if CXL reset borrowed the bus reset locking model:
>
> 	if (pci_bus_trylock(bus)) {
> 		pci_bus_save_and_disable_locked(bus);
> 		might_sleep();
> 		rc = cxl_request_and_flush_hdm(bus);
> 		if (rc == 0) {
> 			rc = cxl_reset_execute(pdev);
> 			cxl_release_and_flush_hdm(bus);
> 		}
> 		pci_bus_restore_locked(bus);
> 		pci_bus_unlock(bus);
> 	}
>
> The cost is disturbing some non-CXL functions, the benefit is reusing an
> existing reset order / locking model.

I'd say further that this exceeds the boundaries of what
pci_reset_function(), or the @reset sysfs attribute per pci_dev, is
scoped to do. pci_reset_function() must limit the scope to the
pci_dev (and in this case the CXL state associated with only that
pci_dev). See for instance how bus and slot use cases through
pci_reset_function() are limited to non-multifunction devices.

For multiple functions, the precedent is something more like
pci_reset_bus(), where the caller is responsible for coordinating the
set of affected devices. The locking is still complicated, but at
least it's managed in vfio-pci-core, with a variant driver that
actually owns the device, rather than pci-core.

Also note that there's currently no mechanism for performing a
multi-function scoped reset through sysfs (excluding raw access to the
parent bridge that bypasses all save/restore mechanics). I'd suggest
that cxl_reset can only be available as a function scoped reset when
only function 0 supports cxl.mem or cxl.cache, but that may also lead
to the question of whether the reset sysfs attribute should be exposed
at all if it only resets the cxl.io state, for example via FLR. Thanks,
Alex
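
[Editor's sketch, not part of the original message.] The condition
suggested above, that cxl_reset be offered as a function-scoped reset only
when function 0 is the sole function using cxl.mem or cxl.cache, could be
modelled roughly as below. This is a userspace illustration under stated
assumptions: struct fn_model and cxl_reset_is_function_scoped() are
hypothetical names invented here, not kernel interfaces, and a real
implementation would have to read each sibling's CXL capabilities rather
than take booleans.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one PCI function; not a kernel structure. */
struct fn_model {
	unsigned int func;	/* PCI function number, 0-7 */
	bool cxl_cache;		/* function uses CXL.cache (assumed input) */
	bool cxl_mem;		/* function uses CXL.mem (assumed input) */
};

/*
 * Return true when a function-scoped cxl_reset could plausibly be offered:
 * function 0 uses CXL.cache or CXL.mem, and no other function does, so the
 * reset would not disturb CXL state belonging to a sibling.
 */
bool cxl_reset_is_function_scoped(const struct fn_model *fns, size_t n)
{
	bool fn0_has_cxl = false;

	for (size_t i = 0; i < n; i++) {
		bool has_cxl = fns[i].cxl_cache || fns[i].cxl_mem;

		if (fns[i].func != 0 && has_cxl)
			return false;	/* a sibling's CXL state is in scope */
		if (fns[i].func == 0 && has_cxl)
			fn0_has_cxl = true;
	}

	return fn0_has_cxl;
}
```

Under this model, a device whose function 1 also uses cxl.cache would not
get a function-scoped cxl_reset at all, leaving any wider reset to a caller
that coordinates the whole set of affected functions.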