Re: [PATCH v7 05/17] PCI/AER: Add CXL PCIe Port correctable error support in AER service driver
From: Dan Williams
Date: Tue Feb 11 2025 - 18:58:55 EST
Terry Bowman wrote:
> The AER service driver supports handling Downstream Port Protocol Errors in
> Restricted CXL Host (RCH) mode, also known as CXL 1.1. It needs the same
> functionality for CXL PCIe Ports operating in Virtual Hierarchy (VH)
> mode.[1]
>
> CXL and PCIe Protocol Error handling have different requirements that
> necessitate a separate handling path. The AER service driver may try to
> recover PCIe uncorrectable non-fatal errors (UCE). The same recovery is not
> suitable for CXL PCIe Port devices because of potential for system memory
> corruption. Instead, CXL Protocol Error handling must use a kernel panic
> in the case of a fatal or non-fatal UCE. The AER driver's PCIe Protocol
> Error handling does not panic the kernel in response to a UCE.
>
> Introduce a separate path for CXL Protocol Error handling in the AER
> service driver. This will allow CXL Protocol Errors to use CXL specific
> handling instead of PCIe handling. Add the CXL specific changes without
> affecting or adding functionality in the PCIe handling.
>
> Make this update alongside the existing Downstream Port RCH error handling
> logic, extending support to CXL PCIe Ports in VH mode.
>
> Remove is_internal_error(). is_internal_error() was used to determine if
> an AER error was a CXL error. Instead, now rely on pcie_is_cxl_port() to
> indicate the error is a CXL error.
Wait, pcie_is_cxl_port() in isolation is insufficient, right? In other
words, I would expect that when the response may escalate to panic(),
the code should be reasonably certain that this *is* a CXL error.
At a minimum that is:

    pcie_is_cxl_port() && is_internal_error()

...or am I missing something that makes it unlikely that a standard
PCIe error or other internal error type will be thrown by a
pcie_is_cxl_port() device?
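
To make the concern concrete, here is a rough userspace sketch of the
combined check. It is not the patch's code: the mock_* struct, enum, and
helpers are hypothetical stand-ins for aer_err_info, the AER severity
values, and pcie_is_cxl_port(), and the two status bits are copied from
include/uapi/linux/pci_regs.h only so the sketch builds on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Internal-error status bits, copied from pci_regs.h for this sketch. */
#define PCI_ERR_UNC_INTN     0x00400000u /* Uncorrectable Internal Error */
#define PCI_ERR_COR_INTERNAL 0x00004000u /* Corrected Internal Error */

/* Hypothetical stand-ins for the kernel's severity values and aer_err_info. */
enum mock_severity { MOCK_AER_CORRECTABLE, MOCK_AER_NONFATAL, MOCK_AER_FATAL };

struct mock_err_info {
	enum mock_severity severity;
	uint32_t status;   /* AER (un)correctable status register value */
	bool is_cxl_port;  /* stand-in for pcie_is_cxl_port(dev) */
};

/* Same idea as is_internal_error(): pick the bit that matches the severity. */
static bool mock_is_internal_error(const struct mock_err_info *info)
{
	if (info->severity == MOCK_AER_CORRECTABLE)
		return info->status & PCI_ERR_COR_INTERNAL;

	return info->status & PCI_ERR_UNC_INTN;
}

/*
 * Only treat the event as a CXL protocol error, and so a candidate for
 * the panic() path, when the port is CXL *and* the status says internal
 * error. Anything else would fall back to the normal PCIe handling.
 */
static bool mock_is_cxl_error(const struct mock_err_info *info)
{
	return info->is_cxl_port && mock_is_internal_error(info);
}
```

With a check of this shape, a CXL port that reports, say, a poisoned TLP
without the internal error bit set would stay on the PCIe recovery path
rather than escalating to panic().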