Re: [PATCH 1/3] efi/cper, cxl: Decode CXL Component Events General Media Event Record

From: Dan Williams
Date: Wed Oct 18 2023 - 14:48:32 EST


Ard Biesheuvel wrote:
> On Tue, 17 Oct 2023 at 22:52, Smita Koralahalli
> <Smita.KoralahalliChannabasappa@xxxxxxx> wrote:
> >
> > Hi Ira,
> >
> > On 10/12/2023 5:25 PM, Ira Weiny wrote:
> > > Smita Koralahalli wrote:
> > >> Add support for decoding CXL Component Events General Media Event Record
> > >> as defined in CXL rev 3.0 section 8.2.9.2.1.1.
> > >>
> > >> All the event records as defined in Event Record Identifier field of the
> > >> Common Event Record Format in CXL rev 3.0 section 8.2.9.2.1 follow the
> > >> CPER format for representing the hardware errors if reported by a
> > >> platform.
> > >>
> > >> According to the CPER format, each event record including the General
> > >> Media is logged as a CXL Component Event as defined in UEFI 2.10
> > >> Section N.2.14 and is identified by a UUID as defined by Event Record
> > >> Identifier field in Common Event Record Format of CXL rev 3.0 section
> > >> 8.2.9.2.1. CXL Component Event Log field in Component Events Section
> > >> corresponds to the component/event specified by the section type UUID.
> > >>
> > >> Add support for decoding CXL Component Events as defined in UEFI 2.10
> > >> Section N.2.14 and decoding Common Event Record as defined in CXL rev 3.0
> > >> section 8.2.9.2.1.
> > >>
> > >> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx>
> > >
> > > [snip]
> > >
> > >> +
> > >> +/*
> > >> + * Compute Express Link Common Event Record
> > >> + * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
> > >> + */
> > >> +struct common_event_record {
> > >> + u8 identifier[16];
> > >
> > > I interpreted the CPER structure as not having this identifier here.
> > >
> > > From Section N.2.14:
> > >
> > > "For the CXL Component Event Log: Refer to the Common Event Record field
> > > (Offset 16) of the Events Record Format for each CXL component."
> > >
> > > This implies that the data coming from the CPER record starts at length.
> >
> > Thanks for pointing this out. According to Spec, you are right. Our
> > records did show up the identifier. Hence, I added that field. We should
> > probably fix it from our end.
> >
> > Meanwhile, I'm taking a look at your patches.
> >
>
> Thanks
>
> Once you've compared notes, can you please let me know how to proceed?
> I will not consider Ira's patches or yours for merging before that.

Ard, I have a higher-level question. If these CPER records get routed to
the CXL subsystem to be emitted by code that is shared with the
native-driver record-retrieval mechanism, is there still motivation to
parse and emit them to the kernel log in the EFI code?

The difference with these CXL memory error records vs other records like
CPER_SEC_PLATFORM_MEM is that the record contains device-local addresses
that need translation to be of use to other system-software and that
translation needs topology information that the CXL subsystem has
enumerated.

So, my proposal is that the final form of this enabling would emit the
record for CXL to consume, but otherwise not emit it via printk().