Re: [PATCH 1/7] sched_ext: Implement event counter infrastructure and add an event

From: Changwoo Min
Date: Fri Jan 17 2025 - 19:27:40 EST


Hello,

On 25. 1. 18. 01:24, Tejun Heo wrote:
second approach is satisfactory regarding the BPF binary
compatibility by CO-RE. However, it is a little bit cumbersome to
iterate all the events, so I added the enums. I know the enums
are only usable in the kernel code due to the binary
compatibility, but I thought adding the enums rather than
bloating the scx_dump_state() code at the end would be better.
Does it make sense to you?

Let's just use the structs and open code dumping. We can pack the dump
output better that way too and I don't think BPF scheds would have much use
for enum iterations.

Currently, enums are not exposed to BPF schedulers. Also, BPF
schedulers should not rely on the enums because they are coupled
to a specific layout of the struct, so relying on them would cause
a BPF binary compatibility problem.

Could you explain more about the code dumping? I want to at least
print the event string for easier interpretation.

...
+/**
+ * scx_add_event - Increase an event counter for 'name' by 'cnt'
+ * @name: an event name defined in struct scx_event_stat
+ * @cnt: the number of times the event occurred
+ */
+#define scx_add_event(name, cnt) ({ \
+	struct scx_event_stat *__e; \
+	__e = get_cpu_ptr(&event_stats); \
+	WRITE_ONCE(__e->name, __e->name + (cnt)); \
+	put_cpu_ptr(&event_stats); \
+})
this_cpu_add()?

That's handier. I will change it.

Note that there's __this_cpu_add() too which can be used from sections of
code that already disables preemption. On x86, both variants cost the same
but on arm __this_cpu_add() is cheaper as it can skip preemption off/on.
Given that most of these counters are modified while holding rq lock, it may
make sense to provide __ version too.

Sounds good. I will add __scx_add_event() following the
convention of this_cpu_add() and __this_cpu_add().
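Under that plan, the revised pair might look like the following sketch (kernel-side fragment, not compiled here; the exact form in the eventual patch may differ):

```c
/*
 * Sketch only: increment a field of the per-CPU struct scx_event_stat.
 * this_cpu_add() handles preemption protection itself; the __ variant
 * assumes the caller already has preemption disabled, e.g. while
 * holding the rq lock, and can be cheaper on architectures like arm.
 */
#define scx_add_event(name, cnt) do {					\
	this_cpu_add(event_stats.name, (cnt));				\
} while (0)

#define __scx_add_event(name, cnt) do {					\
	__this_cpu_add(event_stats.name, (cnt));			\
} while (0)
```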


...
Also, I'm not sure this is a useful event to count. False ops_cpu_valid()
indicates that the returned CPU is not even possible and the scheduler is
ejected right away. What's more interesting is
kernel/sched/core.c::select_task_rq() tripping on !is_cpu_allowed() and
falling back using select_fallback_rq().

We can either hook into core.c or just compare the ops.select_cpu() picked
CPU against the CPU the task ends up on in enqueue_task_scx().

Modifying core.c will be more direct and straightforward. Also,
I think it would be better to separate this commit into two: one
for the infrastructure and another for the SELECT_CPU_FALLBACK
event, which touches core.c. I will move the necessary
infrastructure code into kernel/sched/sched.h, so we can use
scx_add_event() in core.c.

Hmm.... yeah, both approaches have pros and cons but I kinda like the idea
of restricting the events within ext.c and here detecting the fallback
condition is pretty trivial. I don't know.

Right, it will be possible to track the same thing without
touching core.c (that's too much) by replicating the conditions
from core.c in ext.c. I will dig into the code more to figure out
what the best approach is.

Regards,
Changwoo Min