Re: [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command

From: Dave Jiang
Date: Tue Sep 20 2022 - 16:23:37 EST



On 9/20/2022 8:49 AM, Jonathan Cameron wrote:
On Fri, 9 Sep 2022 13:53:55 -0700
Ira Weiny <ira.weiny@xxxxxxxxx> wrote:

On Thu, Sep 08, 2022 at 01:52:40PM +0100, Jonathan Cameron wrote:
[snip]

diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
new file mode 100644
index 000000000000..f4baeae66cf3
--- /dev/null
+++ b/include/trace/events/cxl-events.h
@@ -0,0 +1,127 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM cxl_events
+
+#if !defined(_CXL_TRACE_EVENTS_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _CXL_TRACE_EVENTS_H
+
+#include <linux/tracepoint.h>
+
+#define EVENT_LOGS \
+ EM(CXL_EVENT_TYPE_INFO, "Info") \
+ EM(CXL_EVENT_TYPE_WARN, "Warning") \
+ EM(CXL_EVENT_TYPE_FAIL, "Failure") \
+ EM(CXL_EVENT_TYPE_FATAL, "Fatal") \
+ EMe(CXL_EVENT_TYPE_MAX, "<undefined>")
Hmm. 4 is defined in CXL 3.0, but I'd assume we won't use tracepoints for
dynamic capacity events so I guess it doesn't matter.
I'm not sure why you would say that. I anticipate some user space daemon
requiring these events to set things up.
Certainly a possible solution. I'd kind of expect a more handshake-based approach
than a tracepoint. Guess we'll see :)
Yeah, I think we should wait and see.

+ { CXL_EVENT_RECORD_FLAG_PERF_DEGRADED, "Performance Degraded" }, \
+ { CXL_EVENT_RECORD_FLAG_HW_REPLACE, "Hardware Replacement Needed" } \
+)
+
+TRACE_EVENT(cxl_event,
+
+ TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
+ struct cxl_event_record_raw *rec),
+
+ TP_ARGS(dev_name, log, rec),
+
+ TP_STRUCT__entry(
+ __string(dev_name, dev_name)
+ __field(int, log)
+ __array(u8, id, UUID_SIZE)
+ __field(u32, flags)
+ __field(u16, handle)
+ __field(u16, related_handle)
+ __field(u64, timestamp)
+ __array(u8, data, EVENT_RECORD_DATA_LENGTH)
+ __field(u8, length)
Do we want the maintenance operation class added in Table 8-42 from CXL 3.0?
(only noticed because I happen to have that spec revision open rather than 2.0).
Yes done.

There is some discussion with Dan regarding not decoding anything and letting
user space take care of it all. I think this shows a valid reason Dan
suggested this.
I like being able to print tracepoints without userspace tools.
This also enforces structure and stability of interface which I like.
I tend to agree with you.

Maybe a raw tracepoint or variable length trailing buffer to pass
on what we don't understand?
I've already realized that we need to print all reserved fields for this
reason. If there is something the kernel does not understand user space can
just figure it out on its own.

Sound reasonable?
Hmm. Printing reserved fields would be unusual. Not sure what is done for similar
cases elsewhere, CPER records etc...

We could just print a raw array of the whole event as well as the decoded version, but
that means logging most of the fields twice...

Not nice either.

I'm a bit inclined to say we should maybe just ignore stuff we don't know about or
is there a version number we can use to decide between decoded vs decoded as much as
possible + raw log?

libtraceevent can pull the trace event data structure fields directly, so the raw data can be read straight from the kernel. What gets printed to the trace buffer can be decoded data that the kernel code constructs from those same fields. That way you have access to both.
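
A minimal userspace sketch of that split, not the patch's code: the entry keeps the raw flags field, and the printed text is decoded from it. The sketch_* names and the bit positions (4 and 5) are assumptions for illustration only. Bits the decoder does not know are appended as hex, so nothing is silently dropped.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bit positions, for illustration only. */
#define SKETCH_FLAG_PERF_DEGRADED (1u << 4)
#define SKETCH_FLAG_HW_REPLACE    (1u << 5)

/* Hypothetical stand-in for the raw fields stored in the trace entry. */
struct sketch_entry {
	uint32_t flags;
	uint16_t handle;
	uint16_t related_handle;
	uint64_t timestamp;
};

static const struct {
	uint32_t mask;
	const char *name;
} sketch_flag_names[] = {
	{ SKETCH_FLAG_PERF_DEGRADED, "Performance Degraded" },
	{ SKETCH_FLAG_HW_REPLACE, "Hardware Replacement Needed" },
};

/* Append s to buf, with a '|' separator if buf is not empty. */
static size_t sketch_append(char *buf, size_t len, size_t used, const char *s)
{
	int n = snprintf(buf + used, len - used, "%s%s", used ? "|" : "", s);

	if (n < 0)
		return used;
	if ((size_t)n >= len - used)
		return len - 1;	/* output was truncated */
	return used + (size_t)n;
}

/* Decode known flag bits by name; print any remaining bits as hex. */
static void sketch_decode_flags(uint32_t flags, char *buf, size_t len)
{
	uint32_t unknown = flags;
	size_t used = 0;
	size_t i;

	if (len == 0)
		return;
	buf[0] = '\0';

	for (i = 0; i < sizeof(sketch_flag_names) / sizeof(sketch_flag_names[0]); i++) {
		if (!(flags & sketch_flag_names[i].mask))
			continue;
		unknown &= ~sketch_flag_names[i].mask;
		used = sketch_append(buf, len, used, sketch_flag_names[i].name);
	}

	if (unknown) {
		char hex[16];

		snprintf(hex, sizeof(hex), "0x%x", (unsigned int)unknown);
		sketch_append(buf, len, used, hex);
	}
}
```

A consumer that wants the untouched value reads sketch_entry.flags directly. One that only wants readable output calls sketch_decode_flags() on the same field, so the data is stored once and decoded at print time.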


Jonathan

Ira

Jonathan