Re: [PATCH v4 7/7] cxl/memdev: Register for and process CPER events

From: Ira Weiny
Date: Tue Dec 19 2023 - 12:17:28 EST


Ira Weiny wrote:
> Dan Williams wrote:
> > Smita Koralahalli wrote:
> > > On 12/15/2023 3:26 PM, Ira Weiny wrote:

[snip]

> > > I remember Dan pointing this out to me when I sent decoding for
> > > protocol errors, and it's still pending on me for protocol errors.
> >
> > Good point, so I think the responsibility to trace CXL events should
> > belong to ghes_do_proc() and ghes_print_estatus() can just ignore CXL
> > events.
> >
> > Notice how ghes_proc() sometimes skips ghes_print_estatus(), but
> > unconditionally emits a trace event in ghes_do_proc()? To me that means
> > that the cper_estatus_print() inside ghes_print_estatus() can just defer
> > to the ghes code to do the hookup to the trace code.
> >
> > For example, ras_userspace_consumers() was introduced to skip emitting
> > events to the kernel log when the trace event might be handled. My
> > assumption is that was for historical reasons, but since CXL events are
> > new, just never emit them to the kernel log and always require the trace
> > path.
> >
> > I am open to other thoughts here, but it seems like ghes_do_proc() is
> > where the callback needs to be triggered.
>
> I see.
>
> Ok. I'll create a pre-patch which moves the protocol errors first, then
> I'll put the events in ghes_do_proc() as well.
>

Apologies. I really wanted to do this work as a precursor patch, but I
see that there is not a trace point for the protocol errors yet. So as
not to slow the progress of this work, I'm going to skip moving the
protocol error handling right now.

Also, as part of this work I think moving the CXL-specific defines into
the common linux/cper.h is appropriate at this time.

Unless I hear otherwise I'm going to land the event stuff in that common
header and we can move the protocol error defines later.
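
For illustration only, the ghes_do_proc() hookup discussed above could
look roughly like the userspace sketch below: match the section type,
then hand the payload to a registered callback that would emit the trace
event. This is not the patch itself. Every name here (sec_guid,
example_register_cb(), example_do_proc(), example_trace_cb()) is
hypothetical, and the GUID value is a made-up placeholder, not a real
CXL event GUID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical 16-byte section type; stands in for a CPER section GUID. */
struct sec_guid {
	unsigned char b[16];
};

/* Placeholder value for illustration, NOT a real CXL event GUID. */
const struct sec_guid EXAMPLE_CXL_EVENT_GUID = {
	{ 0xde, 0xad, 0xbe, 0xef, 0x00, 0x01, 0x02, 0x03,
	  0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b }
};

/* Callback type a consumer (e.g. the CXL driver) would register. */
typedef void (*cxl_event_cb)(const void *payload, size_t len);

cxl_event_cb registered_cb;	/* NULL until a consumer registers */
int traced_events;		/* counts events the callback has seen */

/* Hypothetical registration hook. */
void example_register_cb(cxl_event_cb cb)
{
	assert(cb != NULL);
	registered_cb = cb;
}

/* Stand-in for the consumer that would emit the trace event. */
void example_trace_cb(const void *payload, size_t len)
{
	(void)payload;
	traced_events++;
	printf("trace: CXL event, %zu payload bytes\n", len);
}

/*
 * Stand-in for the per-section step in ghes_do_proc(): if the section
 * type matches and a callback is registered, pass the payload along.
 * Returns 1 if the callback was invoked, 0 otherwise.
 */
int example_do_proc(const struct sec_guid *type,
		    const void *payload, size_t len)
{
	if (memcmp(type, &EXAMPLE_CXL_EVENT_GUID, sizeof(*type)) != 0)
		return 0;	/* not a CXL event section */
	if (!registered_cb)
		return 0;	/* nobody is listening */
	registered_cb(payload, len);
	return 1;
}
```

Nothing in this sketch writes to a log on its own; a matching section
only reaches the registered callback, which mirrors the "always require
the trace path" idea from Dan's reply.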

Thanks again for all the testing,
Ira