Re: [PATCH v5 05/16] PCI/AER: Add CXL PCIe Port correctable error support in AER service driver

From: Gregory Price
Date: Thu Feb 06 2025 - 13:37:45 EST


On Tue, Jan 07, 2025 at 08:38:41AM -0600, Terry Bowman wrote:
> The AER service driver supports handling Downstream Port Protocol Errors in
> Restricted CXL host (RCH) mode also known as CXL1.1. It needs the same
> functionality for CXL PCIe Ports operating in Virtual Hierarchy (VH)
> mode.[1]
>
> CXL and PCIe Protocol Error handling have different requirements that
> necessitate a separate handling path. The AER service driver may try to
> recover PCIe uncorrectable non-fatal errors (UCE). The same recovery is not
> suitable for CXL PCIe Port devices because of potential for system memory
> corruption. Instead, CXL Protocol Error handling must use a kernel panic
> in the case of a fatal or non-fatal UCE. The AER driver's PCIe Protocol
> Error handling does not panic the kernel in response to a UCE.
>

Naive question: is a panic actually required if the memory is a userland
resource?

The code in arch/x86/kernel/cpu/mce/core.c suggests we may not panic
if an uncorrectable error occurs in this fashion, but simply a SIGBUS.

Unless this is down the wrong pipe - in which case disregard.

I'm still digging through background on this patch set so I may be
barking up the wrong tree.

~Gregory