Re: [RESEND v13 23/25] CXL/PCI: Introduce CXL uncorrectable protocol error recovery

From: Alison Schofield

Date: Tue Nov 11 2025 - 03:37:43 EST


On Tue, Nov 04, 2025 at 11:03:03AM -0600, Terry Bowman wrote:
> Implement cxl_do_recovery() to handle uncorrectable protocol
> errors (UCE), following the design of pcie_do_recovery(). Unlike PCIe,
> all CXL UCEs are treated as fatal and trigger a kernel panic to avoid
> potential CXL memory corruption.
>
> Add cxl_walk_port(), analogous to pci_walk_bridge(), to traverse the
> CXL topology from the error source through downstream CXL ports and
> endpoints.
>
> Introduce cxl_report_error_detected(), mirroring PCI's
> report_error_detected(), and implement device locking for the affected
> subtree. Endpoints require locking the PCI device (pdev->dev) and the
> CXL memdev (cxlmd->dev). CXL ports require locking the PCI
> device (pdev->dev) and the parent CXL port.
>
> The device locks should be taken early where possible. The initially
> reporting device is locked after the kfifo dequeue. Iterated devices
> are locked in cxl_report_error_detected(), except for the first
> device, which has already been locked.
>
> Export pci_aer_clear_fatal_status() for use when a UCE is not present.
>
> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
>
> ---
>
snip

> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 5bc144cde0ee..52c6f19564b6 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c

snip

> +static int cxl_report_error_detected(struct device *dev, void *data, struct pci_dev *err_pdev)
> +{
> +	bool need_lock = (dev != &err_pdev->dev);
> +	pci_ers_result_t vote, *result = data;
> +	struct pci_dev *pdev;
> +
> +	if (!dev || !dev_is_pci(dev))
> +		return 0;
> +	pdev = to_pci_dev(dev);
> +
> +	device_lock_if(&pdev->dev, need_lock);
> +	if (is_pcie_endpoint(pdev) && !cxl_pci_drv_bound(pdev)) {
> +		device_unlock_if(&pdev->dev, need_lock);
> +		return PCI_ERS_RESULT_NONE;

sparse warns:
drivers/cxl/core/ras.c:316:24: warning: incorrect type in return expression (different base types)
drivers/cxl/core/ras.c:316:24: expected int
drivers/cxl/core/ras.c:316:24: got restricted pci_ers_result_t
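
For illustration only, here is a minimal userspace sketch of one possible way to avoid the mismatch. It assumes the int return value is meant only as a continue/stop flag for the walk and that the vote belongs in *result; the patch may intend something different. struct mock_dev and report_error_detected_sketch() are hypothetical stand-ins, not kernel code, and overwriting *result instead of merging votes is a simplification.

```c
#include <stddef.h>

/* Userspace stand-ins for the kernel's sparse annotations. */
#ifdef __CHECKER__
#define __bitwise __attribute__((bitwise))
#define __force   __attribute__((force))
#else
#define __bitwise
#define __force
#endif

/* Mirrors the kernel typedef: a restricted type sparse will not mix with int. */
typedef unsigned int __bitwise pci_ers_result_t;
#define PCI_ERS_RESULT_NONE        ((__force pci_ers_result_t)1)
#define PCI_ERS_RESULT_CAN_RECOVER ((__force pci_ers_result_t)2)

/* Hypothetical device model standing in for struct pci_dev. */
struct mock_dev {
	int is_endpoint;
	int drv_bound;
};

/*
 * Assumption: the int return is 0 to continue the walk, non-zero to stop.
 * The vote is reported through *data, so no pci_ers_result_t value is
 * ever returned as an int.
 */
int report_error_detected_sketch(struct mock_dev *dev, void *data)
{
	pci_ers_result_t *result = data;

	if (!dev)
		return 0;

	if (dev->is_endpoint && !dev->drv_bound) {
		/* Simplified: overwrite rather than merge with earlier votes. */
		*result = PCI_ERS_RESULT_NONE;
		return 0;
	}

	return 0;
}
```

With this shape the early-exit path stores PCI_ERS_RESULT_NONE in the caller's vote and returns a plain int, which is the type sparse expects at that return statement.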