RE: [PATCH v2 1/3] cxl/pci: Fix appropriate checking for _OSC while handling CXL RAS registers

From: Dan Williams
Date: Mon Aug 07 2023 - 23:18:38 EST


Smita Koralahalli wrote:
> According to Section 9.17.2, Table 9-26 of CXL Specification [1], owner
> of AER should also own CXL Protocol Error Management as there is no
> explicit control of CXL Protocol error. And the CXL RAS Cap registers
> reported on Protocol errors should check for AER _OSC rather than CXL
> Memory Error Reporting Control _OSC.
>
> The CXL Memory Error Reporting Control _OSC specifically highlights
> handling Memory Error Logging and Signaling Enhancements. These kinds of
> errors are reported through a device's mailbox and can be managed
> independently from CXL Protocol Errors.
>
> This change fixes handling and reporting CXL Protocol Errors and RAS
> registers natively with native AER and FW-First CXL Memory Error Reporting
> Control.

I feel like this could be said more succinctly and with an indication of
what the end user should expect to see. Something like:

"cxl_pci fails to unmask CXL protocol errors when CXL memory error
reporting is not granted native control. Given that CXL memory error
reporting uses the event interface and protocol errors use AER, unmask
protocol errors based only on the native AER setting. Without this
change end user deployments will fail to report protocol errors in the
case where native memory error handling is not granted to Linux."

>
> [1] Compute Express Link (CXL) Specification, Revision 3.1, Aug 1 2022.
>
> Fixes: 248529edc86f ("cxl: add RAS status unmasking for CXL")
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx>
> ---
> v2:
> Added fixes tag.
> Included what the patch fixes in commit message.
> ---
> drivers/cxl/pci.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 1cb1494c28fe..2323169b6e5f 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -541,9 +541,9 @@ static int cxl_pci_ras_unmask(struct pci_dev *pdev)
>  		return 0;
>  	}
>
> -	/* BIOS has CXL error control */
> -	if (!host_bridge->native_cxl_error)
> -		return -ENXIO;
> +	/* BIOS has PCIe AER error control */
> +	if (!host_bridge->native_aer)
> +		return 0;

The error code does not matter here, and changing it makes the patch
noisier than it needs to be. So just leave it as:

return -ENXIO;
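
To make the intended behavior concrete, here is a rough standalone sketch
of the gate before and after the change, with the error code left alone.
This is not the kernel code: struct host_bridge_flags and the two
ras_unmask_gate_*() helpers are hypothetical stand-ins I made up for
illustration, and only the native_aer / native_cxl_error names come from
the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two _OSC-derived flags on the host bridge. */
struct host_bridge_flags {
	bool native_aer;       /* OS was granted PCIe AER control */
	bool native_cxl_error; /* OS was granted CXL memory error reporting control */
};

/* Gate before the patch: keyed on CXL memory error reporting control. */
static int ras_unmask_gate_old(const struct host_bridge_flags *hb)
{
	/* BIOS has CXL error control */
	if (!hb->native_cxl_error)
		return -ENXIO;
	return 0;
}

/* Gate after the patch: keyed on AER only, error code unchanged. */
static int ras_unmask_gate_new(const struct host_bridge_flags *hb)
{
	/* BIOS has PCIe AER error control */
	if (!hb->native_aer)
		return -ENXIO;
	return 0;
}
```

The case that matters for end users is native_aer set with
native_cxl_error clear (native AER, FW-First memory error reporting): the
old gate bails out and protocol errors stay masked, while the new gate
lets the unmask proceed.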

>
>  	rc = pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &cap);
>  	if (rc)
> --
> 2.17.1
>