Re: [PATCH 0/2] efi/cper, cxl: Decode CXL Protocol Errors CPER

From: Smita Koralahalli
Date: Wed Oct 26 2022 - 15:31:43 EST


On 10/25/2022 5:11 PM, Dan Williams wrote:
> Smita Koralahalli wrote:
>> Hi Dan,
>>
>> On 10/21/2022 3:18 PM, Dan Williams wrote:
>>> Hi Smita,
>>>
>>> Smita Koralahalli wrote:
>>>> This series adds decoding for the CXL Protocol Errors Common Platform
>>>> Error Record.
>>>
>>> Be sure to copy Ard Biesheuvel <ardb@xxxxxxxxxx> (added) on
>>> drivers/firmware/efi/ patches.
>>>
>>> Along those lines, drivers/cxl/ developers have an idea of what is
>>> contained in the new CXL protocol error records and why Linux might want
>>> to decode them, but others outside drivers/cxl/ might not. It always
>>> helps to include a short summary of the benefit to end users as the
>>> motivation for applying a patch set.
>>
>> Sure, I will include that in v2.

>>>> Smita Koralahalli (2):
>>>>   efi/cper, cxl: Decode CXL Protocol Error Section
>>>>   efi/cper, cxl: Decode CXL Error Log
>>>>
>>>>  drivers/firmware/efi/Makefile   |   2 +-
>>>>  drivers/firmware/efi/cper.c     |   9 +++
>>>>  drivers/firmware/efi/cper_cxl.c | 108 ++++++++++++++++++++++++++++++++
>>>>  drivers/firmware/efi/cper_cxl.h |  58 +++++++++++++++++
>>>>  include/linux/cxl_err.h         |  21 +++++++
>>>>  5 files changed, 197 insertions(+), 1 deletion(-)
>>>
>>> I notice no updates for the trace events in ghes_do_proc(); is that next
>>> in your queue? It's OK for that to be a follow-on after v2.
>>
>> Sorry if I haven't understood this correctly. Are you referring to the
>> "handling" of CXL memory errors in ghes_do_proc(), or just to copying
>> CPER entries to tracepoints?
>
> Right now ghes_do_proc() will let the CXL CPER records fall through to
> log_non_standard_event(). Are you planning to add trace event decode
> there for CPER_SEC_CXL_PROT_ERR records?

Thanks! Yes, it's a good idea to add; I had not thought about it before.
I will send it as a separate patchset after v2.

With this CXL CPER trace event support and Ira's patchset, which traces
specific event record types via Get Event Record, I think we can start on
the userspace handling, probably in rasdaemon?


> I am not sure if the CXL CPER to trace record conversion belongs there,
> or somewhere closer to the trace_aer_event() invocations, since the CXL
> protocol errors are effectively an extension of PCI AER events.

Right, I will keep it simple in v1 and gather comments on the placement.

Thanks,
Smita