Re: [PATCH v8 2/6] cxl/trace: Add TRACE support for CXL media-error records

From: Alison Schofield
Date: Mon Mar 13 2023 - 20:21:51 EST


On Mon, Mar 13, 2023 at 03:47:35PM -0700, Ira Weiny wrote:
> alison.schofield@ wrote:
> > From: Alison Schofield <alison.schofield@xxxxxxxxx>
> >
> > CXL devices may support the retrieval of a device poison list.
> > Add a new trace event that the CXL subsystem may use to log
> > the media-error records returned in the poison list.
> >
> > Log each media-error record as a trace event of type 'cxl_poison'.
> >
> > Signed-off-by: Alison Schofield <alison.schofield@xxxxxxxxx>
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> > ---
> > drivers/cxl/core/mbox.c | 4 +-
> > drivers/cxl/core/trace.h | 84 ++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 87 insertions(+), 1 deletion(-)

snip

> >
> > +#define cxl_poison_overflow(flags, time) \
> > + (flags & CXL_POISON_FLAG_OVERFLOW ? le64_to_cpu(time) : 0)
> > +
> > +TRACE_EVENT(cxl_poison,
> > +
> > + TP_PROTO(struct cxl_memdev *cxlmd, struct cxl_region *region,
> > + const struct cxl_poison_record *record,
> > + u8 flags, __le64 overflow_t),
>
> FWIW I made event overflow a separate trace event.
>
> Will this make all of the poisons in a single GetPoison command marked
> with overflow in the trace buffer?

Yes. Every record returned within a poison payload gets the same flags and
overflow_t reported in its trace event.
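
To make that concrete, here is a rough userspace sketch of the behavior,
not the actual patch code. All of the names below (poison_payload,
poison_event, emit_poison_events, POISON_FLAG_OVERFLOW and its bit value)
are stand-ins I made up for illustration, and the le64_to_cpu() conversion
is left out. The only point is that flags and the overflow timestamp come
from the payload header, so every record in that payload reports the same
values:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel definitions. */
#define POISON_FLAG_OVERFLOW 0x2	/* assumed bit value */

struct poison_record {
	uint64_t address;
	uint32_t length;
};

/* One payload: a single flags/timestamp header plus N records. */
struct poison_payload {
	uint8_t flags;
	uint64_t overflow_ts;
	size_t count;
	const struct poison_record *records;
};

/* What a single per-record trace event would carry in this sketch. */
struct poison_event {
	uint64_t address;
	uint32_t length;
	uint8_t flags;
	uint64_t overflow_ts;
};

/* Mirrors the idea of the macro: timestamp only when overflow is set. */
static uint64_t poison_overflow(uint8_t flags, uint64_t ts)
{
	return (flags & POISON_FLAG_OVERFLOW) ? ts : 0;
}

/*
 * Fill one event per record, up to max events. The payload-level flags
 * and overflow timestamp are copied into every event. Returns the number
 * of events written.
 */
static size_t emit_poison_events(const struct poison_payload *p,
				 struct poison_event *out, size_t max)
{
	size_t n, i;

	assert(p && out);
	n = p->count < max ? p->count : max;

	for (i = 0; i < n; i++) {
		out[i].address = p->records[i].address;
		out[i].length = p->records[i].length;
		out[i].flags = p->flags;
		out[i].overflow_ts = poison_overflow(p->flags, p->overflow_ts);
	}
	return n;
}
```

So a payload with the overflow flag set and two records yields two events
that both carry the flag and the same timestamp. With the flag clear, both
report a timestamp of 0.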

I took a peek at what you did. (Perhaps we should have called that
cxl_event_overflow.) I don't think the poison reporting allows a
similar, single overflow trace event. The overflow flag means
the device has overflowed its poison list, so the list may be
incomplete. I think we repeat the overflow state on every cxl_poison
trace event until the overflow status goes away (Scan Media).

Alison


>
> Ira
>
> > +
snip

>
>