Re: [PATCH v8 0/2] acpi/ghes, cper, cxl: Process CXL CPER Protocol errors

From: Dave Jiang
Date: Thu Mar 13 2025 - 15:58:42 EST

On 3/10/25 3:38 PM, Smita Koralahalli wrote:
> This patchset adds logging support for CXL CPER endpoint and port Protocol
> errors.
>
> Based on top of cxl-next.

Applied to cxl/next with the change suggested by Ming.


>
> Link to v7:
> https://lore.kernel.org/linux-cxl/20250226221157.149406-1-Smita.KoralahalliChannabasappa@xxxxxxx
>
> Changes in v7 -> v8:
> [Yazen]: Moved guard() after !pdev check.
> [Ira]: Included ras.o in test build file.
> [Alison]: Naming consistency: devname -> device, parent -> host.
>
> Changes in v6 -> v7:
> Reworked to move registration and protocol error handling into a new
> file inside CXL core (cxl/core/ras.c).
>
> Changes in v5 -> v6:
> [Dave, Jonathan, Ira]: Reviewed-by tags.
> [Dave]: Check for cxlds before assigning fe.
> Merge one of the patches (Port error trace logging) from Terry's Port
> error handling.
> Rename host -> parent.
>
> Changes in v4 -> v5:
> [Dave]: Reviewed-by tags.
> [Jonathan]: Remove blank line.
> [Jonathan, Ira]: Change CXL -> "CXL".
> [Ira]: Fix build error for CONFIG_ACPI_APEI_PCIEAER.
>
> Changes in v3 -> v4:
> [Ira]: Use memcpy() for RAS Cap struct.
> [Jonathan]: Commit description edits.
> [Jonathan]: Use separate work registration functions for protocol and
> component errors.
> [Jonathan, Ira]: Replace flags with separate functions for port and
> device errors.
> [Jonathan]: Use goto for register and unregister calls.
>
> Changes in v2 -> v3:
> [Dan]: Define a new workqueue for CXL CPER Protocol errors and avoid
> reusing existing workqueue which handles CXL CPER events.
> [Dan]: Update function and struct names.
> [Ira]: Don't define common function get_cxl_devstate().
> [Dan]: Use switch cases rather than defining array of structures.
> [Dan]: Pass the entire cxl_cper_prot_err struct for CXL subsystem.
> [Dan]: Use pr_err_ratelimited().
> [Dan]: Use AER_ severities directly. Don't define CXL_ severities.
> [Dan]: Limit either to Device ID or Agent Info check.
> [Dan]: Validate size of RAS field matches expectations.
>
> Changes in v1 -> v2:
> [Jonathan]: Refactor code for trace support. Rename get_cxl_dev()
> to get_cxl_devstate().
> [Jonathan]: Cleanups for get_cxl_devstate().
> [Alison, Jonathan]: Define array of structures for Device ID and Serial
> number comparison.
> [Dave]: p_err -> rec/p_rec.
> [Jonathan]: Remove pr_warn.
>
> Smita Koralahalli (2):
>   acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors
>   cxl/pci: Add trace logging for CXL PCIe Port RAS errors
>
>  drivers/acpi/apei/ghes.c  |  49 +++++++++++++++
>  drivers/cxl/core/Makefile |   1 +
>  drivers/cxl/core/core.h   |   3 +
>  drivers/cxl/core/port.c   |   7 +++
>  drivers/cxl/core/ras.c    | 123 ++++++++++++++++++++++++++++++++++++++
>  drivers/cxl/core/trace.h  |  47 +++++++++++++++
>  include/cxl/event.h       |  15 +++++
>  tools/testing/cxl/Kbuild  |   1 +
>  8 files changed, 246 insertions(+)
>  create mode 100644 drivers/cxl/core/ras.c
>