Re: [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command

From: Ira Weiny
Date: Fri Sep 09 2022 - 16:54:08 EST


On Thu, Sep 08, 2022 at 01:52:40PM +0100, Jonathan Cameron wrote:
>

[snip]

> > > > diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
> > > > new file mode 100644
> > > > index 000000000000..f4baeae66cf3
> > > > --- /dev/null
> > > > +++ b/include/trace/events/cxl-events.h
> > > > @@ -0,0 +1,127 @@
> > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > +#undef TRACE_SYSTEM
> > > > +#define TRACE_SYSTEM cxl_events
> > > > +
> > > > +#if !defined(_CXL_TRACE_EVENTS_H) || defined(TRACE_HEADER_MULTI_READ)
> > > > +#define _CXL_TRACE_EVENTS_H
> > > > +
> > > > +#include <linux/tracepoint.h>
> > > > +
> > > > +#define EVENT_LOGS \
> > > > + EM(CXL_EVENT_TYPE_INFO, "Info") \
> > > > + EM(CXL_EVENT_TYPE_WARN, "Warning") \
> > > > + EM(CXL_EVENT_TYPE_FAIL, "Failure") \
> > > > + EM(CXL_EVENT_TYPE_FATAL, "Fatal") \
> > > > + EMe(CXL_EVENT_TYPE_MAX, "<undefined>")
> > >
> > > Hmm. 4 is defined in CXL 3.0, but I'd assume we won't use tracepoints for
> > > dynamic capacity events so I guess it doesn't matter.
> >
> > I'm not sure why you would say that. I anticipate some user space daemon
> > requiring these events to set things up.
>
> Certainly a possible solution. I'd kind of expect a more handshake-based approach
> than a tracepoint. Guess we'll see :)

Yeah, I think we should wait and see.

>
>
> > >
> > > > + { CXL_EVENT_RECORD_FLAG_PERF_DEGRADED, "Performance Degraded" }, \
> > > > + { CXL_EVENT_RECORD_FLAG_HW_REPLACE, "Hardware Replacement Needed" } \
> > > > +)
> > > > +
> > > > +TRACE_EVENT(cxl_event,
> > > > +
> > > > + TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> > > > + struct cxl_event_record_raw *rec),
> > > > +
> > > > + TP_ARGS(dev_name, log, rec),
> > > > +
> > > > + TP_STRUCT__entry(
> > > > + __string(dev_name, dev_name)
> > > > + __field(int, log)
> > > > + __array(u8, id, UUID_SIZE)
> > > > + __field(u32, flags)
> > > > + __field(u16, handle)
> > > > + __field(u16, related_handle)
> > > > + __field(u64, timestamp)
> > > > + __array(u8, data, EVENT_RECORD_DATA_LENGTH)
> > > > + __field(u8, length)
> > >
> > > Do we want the maintenance operation class added in Table 8-42 from CXL 3.0?
> > > (only noticed because I happen to have that spec revision open rather than 2.0).
> >
> > Yes done.
> >
> > There is some discussion with Dan regarding not decoding anything and letting
> > user space take care of it all. I think this shows a valid reason Dan
> > suggested this.
>
> I like being able to print tracepoints without userspace tools.
> This also enforces structure and stability of interface which I like.

I tend to agree with you.

>
> Maybe a raw tracepoint or variable length trailing buffer to pass
> on what we don't understand?

I've already realized that we need to print all reserved fields for this
reason. If there is something the kernel does not understand, user space can
just figure it out on its own.
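
To illustrate the idea only, here is a rough userspace sketch, not code from
the patch. It assumes the common event record is 128 bytes with a 16-byte
reserved region at offset 0x20 (my reading of the CXL 2.0 layout; please check
the spec). The REC_* names and format_reserved() are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed layout of the common event record; verify against the spec. */
#define REC_LEN          128   /* whole record */
#define REC_OFF_RESERVED 0x20  /* follows UUID, length, flags, handles, timestamp */
#define REC_RESERVED_LEN 0x10

/*
 * Hypothetical helper: dump the reserved bytes of a raw record as hex so
 * that a consumer can decode fields this code knows nothing about.
 * Returns 0 on success, -1 if the record or the output buffer is too small.
 */
static int format_reserved(const uint8_t *rec, size_t rec_len,
			   char *out, size_t out_len)
{
	size_t pos = 0;

	assert(rec != NULL && out != NULL);

	/* Refuse short records and buffers that cannot hold the full dump. */
	if (rec_len < REC_LEN || out_len < 2 * REC_RESERVED_LEN + 1)
		return -1;

	/* Two hex digits per byte; snprintf() NUL-terminates each time. */
	for (size_t i = 0; i < REC_RESERVED_LEN; i++)
		pos += (size_t)snprintf(out + pos, out_len - pos, "%02x",
					(unsigned int)rec[REC_OFF_RESERVED + i]);
	return 0;
}
```

If a later spec revision assigns meaning to a byte in that region (the
maintenance operation class mentioned above looks like such a case), the value
still reaches user space unchanged. In the tracepoint itself I would expect
something like __print_hex() in TP_printk to do this job.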

Sound reasonable?

Ira

>
> Jonathan
>
>