Re: [PATCH] sched_ext: Add trace point to track sched_ext core events

From: Changwoo Min
Date: Thu Feb 27 2025 - 05:25:38 EST


On 25. 2. 27. 17:19, Andrea Righi wrote:
On Thu, Feb 27, 2025 at 05:05:54PM +0900, Changwoo Min wrote:
Hi Andrea,

Thank you for the review!

On 25. 2. 27. 16:38, Andrea Righi wrote:
Hi Changwoo,

On Wed, Feb 26, 2025 at 11:33:27PM +0900, Changwoo Min wrote:
Add tracing support, which may be useful for debugging sched_ext schedulers
that trigger a certain event.

Signed-off-by: Changwoo Min <changwoo@xxxxxxxxxx>
---
include/trace/events/sched_ext.h | 21 +++++++++++++++++++++
kernel/sched/ext.c | 4 ++++
2 files changed, 25 insertions(+)

diff --git a/include/trace/events/sched_ext.h b/include/trace/events/sched_ext.h
index fe19da7315a9..88527b9316de 100644
--- a/include/trace/events/sched_ext.h
+++ b/include/trace/events/sched_ext.h
@@ -26,6 +26,27 @@ TRACE_EVENT(sched_ext_dump,
)
);
+TRACE_EVENT(sched_ext_add_event,
+ TP_PROTO(const char *name, int offset, __u64 added),
+ TP_ARGS(name, offset, added),
+
+ TP_STRUCT__entry(
+ __string(name, name)
+ __field( int, offset )
+ __field( __u64, added )
+ ),
+
+ TP_fast_assign(
+ __assign_str(name);
+ __entry->offset = offset;
+ __entry->added = added;
+ ),
+
+ TP_printk("name %s offset %d added %llu",
+ __get_str(name), __entry->offset, __entry->added
+ )
+);

Isn't the name enough to determine which event has been triggered? What is
the benefit of also reporting the offset within struct scx_event_stats?


@name and @offset carry duplicate information. However, I thought
having both would be more convenient for users, because each has
different pros and cons.

@offset is quick to compare and easy to use in BPF code, but the
offset of an event can change across kernel versions when new
events are added. @offset would be handy for writing a quick trace
hook for debugging.

On the other hand, @name won't change across kernel versions,
which is good. However, it requires more code to actually read
the string in BPF code (a __data_loc string field is a 32-bit
integer that encodes the string's length in the upper 16 bits and
its offset within the trace record in the lower 16 bits).
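To make the __data_loc encoding concrete, here is a minimal userspace C
sketch that decodes such a field from a raw trace record, assuming the
__string() layout described above (offset in the low 16 bits, length in
the high 16 bits). The helper names are hypothetical and not part of any
kernel or libbpf API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Low 16 bits: offset of the string from the start of the record. */
static inline uint16_t data_loc_offset(uint32_t data_loc)
{
	return data_loc & 0xffff;
}

/* High 16 bits: length of the string data (assumed to include the NUL). */
static inline uint16_t data_loc_len(uint32_t data_loc)
{
	return data_loc >> 16;
}

/*
 * Hypothetical helper: copy the string referenced by @data_loc out of a
 * raw record of @rec_size bytes into @buf. Returns 0 on success, or -1
 * if the encoded location falls outside the record or does not fit in
 * @buf (one byte is reserved so the result is always NUL-terminated).
 */
static int read_data_loc_str(const void *rec, size_t rec_size,
			     uint32_t data_loc, char *buf, size_t buf_size)
{
	size_t off = data_loc_offset(data_loc);
	size_t len = data_loc_len(data_loc);

	if (off + len > rec_size || len >= buf_size)
		return -1;
	memcpy(buf, (const char *)rec + off, len);
	buf[len] = '\0';
	return 0;
}
```

The same two bit operations are what the BPF example later in this thread
performs with "& 0xFFFF"; the length half is simply unused there because
bpf_probe_read_str() stops at the NUL.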

Does it make sense to you?


So, IMHO @offset would make sense if we guarantee that it won't change
across kernel versions. That's probably doable; we just need to make sure
that we always add new events at the bottom of scx_event_stats.

Keeping the offsets stable across versions is possible if we add new
events at the bottom. However, I am not sure that is what we want,
because we would lose the nice logical grouping of the events in
the scx_event_stats struct.
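A small illustration of that trade-off, using hypothetical structs rather
than the real scx_event_stats: appending a counter keeps every existing
offset, while inserting it next to a related counter for grouping shifts
the offsets that follow. The compile-time checks encode both outcomes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "v1" layout with two counters. */
struct demo_event_stats_v1 {
	int64_t ev_a;
	int64_t ev_b;
};

/* Hypothetical "v2": new counter appended at the bottom. */
struct demo_event_stats_v2_append {
	int64_t ev_a;
	int64_t ev_b;
	int64_t ev_new;
};

/* Hypothetical "v2": new counter placed next to ev_a for logical grouping. */
struct demo_event_stats_v2_grouped {
	int64_t ev_a;
	int64_t ev_new;
	int64_t ev_b;
};

/* Appending preserves the offset of every pre-existing counter... */
_Static_assert(offsetof(struct demo_event_stats_v1, ev_b) ==
	       offsetof(struct demo_event_stats_v2_append, ev_b),
	       "appending keeps old offsets");

/* ...while grouping moves ev_b, breaking anyone who keyed on its offset. */
_Static_assert(offsetof(struct demo_event_stats_v1, ev_b) !=
	       offsetof(struct demo_event_stats_v2_grouped, ev_b),
	       "grouping shifts later offsets");
```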

Otherwise, there's a risk of breaking potential users of this tracepoint
who may treat the offset as a portable ID.

Hmm... I agree. @offset would be too low-level and could be a
potential source of confusion.

Maybe we can call it @id or @event_id or similar and guarantee its
portability? What do you think?
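One way a portable @event_id could be decoupled from the struct layout is
an explicit, append-only enum plus a lookup table, so the counters can
stay grouped however we like. This is only a sketch of the idea; every
name below is hypothetical.

```c
#include <stddef.h>

/* Hypothetical stable IDs: values never change, new IDs are only appended. */
enum demo_event_id {
	DEMO_EV_ALPHA = 0,
	DEMO_EV_BETA = 1,
	DEMO_EV_GAMMA = 2,
	DEMO_EV_NR_IDS,
};

/* Hypothetical stats struct, ordered for readability rather than by ID. */
struct demo_event_stats {
	long long gamma;
	long long alpha;
	long long beta;
};

/* Maps each stable ID to its name and its current (unstable) offset. */
struct demo_event_desc {
	const char *name;
	size_t offset;
};

static const struct demo_event_desc demo_events[DEMO_EV_NR_IDS] = {
	[DEMO_EV_ALPHA] = { "ALPHA", offsetof(struct demo_event_stats, alpha) },
	[DEMO_EV_BETA]  = { "BETA",  offsetof(struct demo_event_stats, beta) },
	[DEMO_EV_GAMMA] = { "GAMMA", offsetof(struct demo_event_stats, gamma) },
};

/* Bump the counter for @id; a tracepoint would report @id, not the offset. */
static void demo_add_event(struct demo_event_stats *stats,
			   enum demo_event_id id, long long delta)
{
	long long *ctr = (long long *)((char *)stats + demo_events[id].offset);

	*ctr += delta;
}
```

With this arrangement only the enum values would become a stable
interface; regrouping fields in the stats struct changes only the table.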

Now I think dropping @offset would be better in the long run,
because we can keep scx_event_stats clean and avoid creating
a source of confusion. Regarding the ease of using @name, adding
a code example to the commit message should suffice, something
like this:

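#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

/* Layout of the sched_ext_add_event record as seen by a tracepoint program. */
struct tp_add_event {
	struct trace_entry ent;		/* common trace fields */
	u32 __data_loc_name;		/* low 16 bits: offset, high 16 bits: length */
	u64 delta;
};

SEC("tracepoint/sched_ext/sched_ext_add_event")
int tp_add_event(struct tp_add_event *ctx)
{
	char event_name[128];
	/* The name is stored inside the record, at the offset in __data_loc. */
	unsigned short offset = ctx->__data_loc_name & 0xFFFF;

	bpf_probe_read_str(event_name, sizeof(event_name), (char *)ctx + offset);

	bpf_printk("name %s delta %llu", event_name, ctx->delta);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";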

The downside of not having a numerical ID (@offset or @event_id)
is the cost of string comparison to distinguish event types. If
we assume that probing the event is rare, that should be okay.
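For a sense of that cost: filtering on @name in BPF needs a bounded loop
(so the verifier can prove termination) whose work grows with the name
length, whereas a numeric ID is a single integer comparison. Below is a
minimal sketch in plain C; the helper names are hypothetical, and the
128-byte bound is taken from event_name[] in the example. Newer kernels
also provide a bpf_strncmp() helper that can often replace such a loop.

```c
#include <stdbool.h>

/* Same bound as the event_name[] buffer in the BPF example. */
#define DEMO_EVENT_NAME_MAX 128

/*
 * Bounded string equality: at most DEMO_EVENT_NAME_MAX iterations, so a
 * BPF verifier could prove the loop terminates. Cost is O(name length).
 */
static bool demo_event_name_eq(const char *a, const char *b)
{
	for (int i = 0; i < DEMO_EVENT_NAME_MAX; i++) {
		if (a[i] != b[i])
			return false;
		if (a[i] == '\0')
			return true;
	}
	return false; /* hit the bound without a NUL: treat as no match */
}

/* With a numeric ID, the same filter is a single comparison. */
static bool demo_event_id_eq(int a, int b)
{
	return a == b;
}
```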

@Tejun, @Andrea -- What do you think? Should we provide
a portability-guaranteed @event_id after dropping @offset? Or
would a string-type event name be sufficient?

Regards,
Changwoo Min