Re: [PATCH v7 17/17] cxl/pci: Handle CXL Endpoint and RCH Protocol Errors separately from PCIe errors
From: Dan Williams
Date: Thu Feb 13 2025 - 21:44:49 EST
Terry Bowman wrote:
> CXL Endpoint and Restricted CXL Host (RCH) Downstream Port Protocol Errors
> are currently treated as PCIe errors, which does not properly process CXL
> uncorrectable (UCE) errors. When a CXL device encounters an uncorrectable
> Protocol Error, the system should panic to prevent potential CXL memory
> corruption.
>
> Treat CXL Endpoint Protocol Errors as CXL errors. This requires updates in
> the CXL and AER drivers.
>
> Update the CXL Endpoint driver with a new declaration for struct
> cxl_error_handlers named cxl_ep_error_handlers. Move the existing CE and
> UCE handler assignments from cxl_error_handlers to the new
> cxl_ep_error_handlers. Remove the 'state' parameter from the UCE handler
> interface because it is not used in CXL recovery.
>
> Update the AER driver to associate CXL Protocol errors with CXL error
> handling. Change detection in handles_cxl_errors() from using
> pcie_is_cxl_port() to instead use pcie_is_cxl().

This all looks ok for what it is, but given the prior discussion about
cxl_error_handlers only running in the CXL domain, I think this will
result in the cxl_pci driver having even less to do.

The cxl_core will register default port error handlers that can panic on
notification. The cxl_pci driver's only job is then to respond to PCI
events and register CXL objects for the core to handle.
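
To make that split concrete, here is a rough userspace model, not kernel
code. Every name in it (model_cxl_port, model_core_register_port,
model_pci_probe, model_core_notify, the action enum) is a hypothetical
stand-in and not an existing cxl_core or cxl_pci interface. It only
assumes what is described above: the core attaches default handlers at
registration time, the handlers take no 'state' argument, and the
uncorrectable path ends in a panic decision. The panic is modeled as a
return value so the sketch can be run.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: outcome of handling one protocol error. */
enum model_err_action {
	MODEL_ERR_NONE,		/* nothing registered, nothing done */
	MODEL_ERR_LOGGED,	/* correctable error, counted and logged */
	MODEL_ERR_PANIC,	/* uncorrectable error, core would panic */
};

struct model_cxl_port;

/* Handlers take only the port; there is no PCI channel 'state' argument. */
struct model_cxl_error_handlers {
	enum model_err_action (*error_detected)(struct model_cxl_port *port);
	enum model_err_action (*cor_error_detected)(struct model_cxl_port *port);
};

struct model_cxl_port {
	const char *name;
	const struct model_cxl_error_handlers *handlers;
	unsigned int ce_count;
};

/* Core-owned default for uncorrectable errors: request a panic. */
static enum model_err_action model_core_uce(struct model_cxl_port *port)
{
	(void)port;
	return MODEL_ERR_PANIC;
}

/* Core-owned default for correctable errors: count it and carry on. */
static enum model_err_action model_core_ce(struct model_cxl_port *port)
{
	port->ce_count++;
	return MODEL_ERR_LOGGED;
}

static const struct model_cxl_error_handlers model_core_default_handlers = {
	.error_detected = model_core_uce,
	.cor_error_detected = model_core_ce,
};

/* Core side: registering a port attaches the default handlers. */
static void model_core_register_port(struct model_cxl_port *port,
				     const char *name)
{
	port->name = name;
	port->handlers = &model_core_default_handlers;
	port->ce_count = 0;
}

/* Driver side: the PCI driver only registers the object on probe. */
static void model_pci_probe(struct model_cxl_port *port)
{
	model_core_register_port(port, "endpoint0");
}

/* Core side: dispatch a notification to whatever handlers are attached. */
static enum model_err_action model_core_notify(struct model_cxl_port *port,
						bool uncorrectable)
{
	if (!port || !port->handlers)
		return MODEL_ERR_NONE;
	if (uncorrectable)
		return port->handlers->error_detected(port);
	return port->handlers->cor_error_detected(port);
}
```

In this model the driver never supplies handlers of its own, which is the
point: after model_pci_probe() the error policy lives entirely in the
core, and the driver has nothing left to do on the error path.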