Hello,
On Fri, Jan 17, 2025 at 12:15:37AM +0900, Changwoo Min wrote:
+/*
+ * Collection of event counters.
+ */
+struct scx_event_stat {
If we keep this struct, maybe name it scx_event_stats?
+       /*
+        * If ops.select_cpu() returns a CPU which can't be used by the task,
+        * the core scheduler code silently picks a fallback CPU.
+        */
+       u64             INVAL_SELECT_CPU;
Let's do $COMPONENT_$EVENT, so SELECT_CPU_INVALID (or maybe something more
specific than invalid, like disallowed or fallback?).
+#define SCX_EVENT_IDX(e) (offsetof(struct scx_event_stat, e)/sizeof(u64))
+#define SCX_EVENT_END_IDX() (sizeof(struct scx_event_stat)/sizeof(u64))
+#define SCX_EVENT_DEFINE(e) SCX_EVENT_##e = SCX_EVENT_IDX(e)
+
+/*
+ * Types of event counters.
+ */
+enum scx_event_kind {
+       SCX_EVENT_BEGIN = 0,
I think 0 BEGIN value can be implicit.
+       SCX_EVENT_DEFINE(INVAL_SELECT_CPU),
+       SCX_EVENT_END = SCX_EVENT_END_IDX(),
SCX_NR_EVENTS? Also, we don't really need a macro to count the enums.
This feels a bit odd to me. Why does it need both the struct fields and
the enum indices? Can we get by with just one of the two? (See the sketch
after the string table below.)
+static const char *scx_event_stat_str[] = {
+       [SCX_EVENT_INVAL_SELECT_CPU] = "invalid_select_cpu",
+};
and hopefully derive this through # macro arg expansion?
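Something like the following X-macro sketch (names are placeholders, using
the SELECT_CPU_FALLBACK suggestion from above) would give a single source
of truth for the enum, the count and the strings, and would also resolve
the struct-vs-index duplication by dropping the struct:

#define SCX_EVENT_LIST(X)                                       \
        X(SELECT_CPU_FALLBACK)

#define SCX_EVENT_ENUM(e)       SCX_EVENT_##e,
enum scx_event_kind {
        SCX_EVENT_LIST(SCX_EVENT_ENUM)  /* starts at 0 implicitly */
        SCX_NR_EVENTS,                  /* terminator doubles as the count */
};

/* derive the name strings through # expansion */
#define SCX_EVENT_STR(e)        [SCX_EVENT_##e] = #e,
static const char *scx_event_names[] = {
        SCX_EVENT_LIST(SCX_EVENT_STR)
};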
+/*
+ * The event counter is organized as a per-CPU variable to minimize
+ * accounting overhead without synchronization. A system-wide view of the
+ * event counters is constructed when requested by scx_bpf_get_event_stat().
+ */
+static DEFINE_PER_CPU(struct scx_event_stat, event_stats);
May be better to include "cpu" in the variable name.
+/**
+ * scx_add_event - Increase an event counter for 'name' by 'cnt'
+ * @name: an event name defined in struct scx_event_stat
+ * @cnt: the number of times the event occurred
+ */
+#define scx_add_event(name, cnt) ({ \
+       struct scx_event_stat *__e;                             \
+       __e = get_cpu_ptr(&event_stats);                        \
+       WRITE_ONCE(__e->name, __e->name + (cnt));               \
+       put_cpu_ptr(&event_stats);                              \
this_cpu_add()?
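e.g. something like this (untested) - this_cpu_add() is already preemption
safe, so the get/put_cpu_ptr() pair and the WRITE_ONCE() can go away:

#define scx_add_event(name, cnt) do {                           \
        this_cpu_add(event_stats.name, (cnt));                  \
} while (0)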
My bad. Will fix it.
@@ -3607,8 +3667,10 @@ static int select_task_rq_scx(struct task_struct *p, int prev_cpu, int wake_flag
        *ddsp_taskp = NULL;
        if (ops_cpu_valid(cpu, "from ops.select_cpu()"))
                return cpu;
-       else
+       else {
+               scx_add_event(INVAL_SELECT_CPU, 1);
                return prev_cpu;
+       }
formatting:
if () {
} else {
}
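i.e. applied to the hunk above:

        if (ops_cpu_valid(cpu, "from ops.select_cpu()")) {
                return cpu;
        } else {
                scx_add_event(INVAL_SELECT_CPU, 1);
                return prev_cpu;
        }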
Also, I'm not sure this is a useful event to count. A false ops_cpu_valid()
indicates that the returned CPU is not even possible and the scheduler is
ejected right away. What's more interesting is
kernel/sched/core.c::select_task_rq() tripping on !is_cpu_allowed() and
falling back to select_fallback_rq().
We can either hook into core.c or just compare the ops.select_cpu() picked
CPU against the CPU the task ends up on in enqueue_task_scx().
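Rough sketch of the latter - p->scx.selected_cpu is a made-up field which
would record what ops.select_cpu() returned, and the event name follows the
renaming suggestion above:

static void scx_count_select_cpu_fallback(struct rq *rq, struct task_struct *p)
{
        /* did the task land somewhere other than where ops.select_cpu() asked? */
        if (p->scx.selected_cpu >= 0 && p->scx.selected_cpu != cpu_of(rq))
                scx_add_event(SELECT_CPU_FALLBACK, 1);
}

called from enqueue_task_scx().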
@@ -5053,6 +5115,15 @@ static void scx_ops_disable_workfn(struct kthread_work *work)
                scx_rq_clock_invalidate(rq);
        }
+       /*
+        * Clear event counters so the next scx scheduler always gets
+        * fresh event counter values.
+        */
+       for_each_possible_cpu(cpu) {
+               struct scx_event_stat *e = per_cpu_ptr(&event_stats, cpu);
+               memset(e, 0, sizeof(*e));
+       }
+
Wouldn't we want to keep these counters intact on ejection so that the
scheduler's ejection path can capture them if necessary? Resetting on load
is probably better.
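e.g. a sketch of the same loop moved to the enable path, so disable leaves
the counters readable:

static void reset_event_stats(void)
{
        int cpu;

        /* zero on load so the previous scheduler's ejection path could
         * still have read the final values */
        for_each_possible_cpu(cpu)
                memset(per_cpu_ptr(&event_stats, cpu), 0,
                       sizeof(struct scx_event_stat));
}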