Re: [PATCH v7 15/17] cxl/pci: Add support to assign and clear pci_driver::cxl_err_handlers

From: Dan Williams
Date: Thu Feb 13 2025 - 21:30:08 EST


Terry Bowman wrote:
> pci_driver::cxl_err_handlers are not currently assigned handler callbacks.
> The handlers can't be set in the pci_driver static definition because the
> CXL PCIe Port devices are bound to the portdrv driver which is not CXL
> driver aware.
>
> Add cxl_assign_port_error_handlers() in the cxl_core module. This
> function will assign the default handlers for a CXL PCIe Port device.
>
> When the CXL Port (cxl_port or cxl_dport) is destroyed the device's
> pci_driver::cxl_err_handlers must be set to NULL indicating they should no
> longer be used.
>
> Create cxl_clear_port_error_handlers() and register it to be called
> when the CXL Port device (cxl_port or cxl_dport) is destroyed.

This is another complication that naturally goes away with
cxl_error_handlers are instances that get attached to 'struct
cxl_driver' instances rather tha 'struct pci_driver' instances.

>
> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> Reviewed-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> Reviewed-by: Gregory Price <gourry@xxxxxxxxxx>
> ---
> drivers/cxl/core/pci.c | 59 ++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 57 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index f154dcf6dfda..03ae21a944e0 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -860,8 +860,39 @@ static pci_ers_result_t cxl_port_error_detected(struct pci_dev *pdev)
> return __cxl_handle_ras(dev, &pdev->dev, ras_base);
> }
>
> +static const struct cxl_error_handlers cxl_port_error_handlers = {
> + .error_detected = cxl_port_error_detected,
> + .cor_error_detected = cxl_port_cor_error_detected,
> +};
> +
> +static void cxl_assign_port_error_handlers(struct pci_dev *pdev)
> +{
> + struct pci_driver *pdrv;
> +
> + if (!pdev || !pdev->driver || !get_device(&pdev->dev))
> + return;
> +
> + pdrv = pdev->driver;
> + pdrv->cxl_err_handler = &cxl_port_error_handlers;

Nothing is holding the @pdev device_lock(), so @pdev->driver may go NULL
immediately after reading it.

Also, it is possible for a 'struct cxl_port' to exist even though its
uport_dev (pci_dev) is not attached to a driver. This would seem to
result in unpredictable behavior from one kernel to the next as the PCIe
portdrv situation evolves.

Lastly, I do not like the precedent of not being able to read a 'struct
pci_driver' template and be assured that it captures all possible error
handlers, or even worse, this unceremoniously overrides a PCI driver
that thinks it knows what the CXL error handlers should be.