Re: [PATCH v2] sched_ext: Rebuild fair weight on ext to fair switches
From: Zicheng Qu
Date: Wed May 27 2026 - 22:58:19 EST
On Wed, May 27, 2026 at 07:26PM +0800, Peter Zijlstra wrote:
> This is truly horrible. We have 4 class methods involved with switching
> classes and you stick in a random call in a place that is called when no
> class is changed.
>
> Would not something like this work?
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 62a2dcb0d03e..a2eb43bd73b9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -14957,6 +14957,11 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
>  	detach_task_cfs_rq(p);
>  }
>  
> +static void switching_to_fair(struct rq *rq, struct task_struct *p)
> +{
> +	set_load_weight(p, false);
> +}
> +
>  static void switched_to_fair(struct rq *rq, struct task_struct *p)
>  {
>  	WARN_ON_ONCE(p->se.sched_delayed);
> @@ -15351,6 +15356,7 @@ DEFINE_SCHED_CLASS(fair) = {
>  	.prio_changed = prio_changed_fair,
>  	.switching_from = switching_from_fair,
>  	.switched_from = switched_from_fair,
> +	.switching_to = switching_to_fair,
>  	.switched_to = switched_to_fair,
>  	.get_rr_interval = get_rr_interval_fair,

Yes, from the class switch point of view, `switching_to_fair()` is a better
fit.
Before v2, I was weighing three possible places for the fix:
1. Updating `p->se.load` from `reweight_task_scx()`. This would keep the fair
weight in sync while the task is on sched_ext, so switching back to fair would
not need any extra fixup. However, it would also make sched_ext maintain fair
class state even when fair is not using it, which does not seem like the right
ownership model.
2. Rebuilding `p->se.load` from fair's `switching_to` hook. This is the most
natural place semantically, since the task is entering fair and fair prepares
its own state before enqueue. My only concern was that, on non-ext -> fair
paths, `__setscheduler_params()` may have already updated `p->se.load` through
`set_load_weight(p, true)`, so calling `set_load_weight(p, false)`
unconditionally here is logically redundant on those paths. Functionally,
though, it is harmless.
3. Rebuilding in `sched_change_end()` based on the old/new classes. That was
the v2 choice because both classes are available there, the task has not been
enqueued yet, and it covers both `scx_root_disable()` and the partial-mode
`sched_setscheduler()` path. In hindsight, though, this makes the generic
sched_change path handle a fixup that is specific to scx and fair. That is more
awkward than letting fair prepare its own state in `switching_to_fair()`.
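
To make the trade-off between the options above concrete, here is a rough
userspace model of the stale-weight problem. It is not kernel code: every
`model_*` name is hypothetical, `fair_weight` only stands in for `p->se.load`,
and the table is meant to mirror the kernel's nice-to-weight mapping, so treat
the exact values as illustrative. It assumes that a nice change made while the
task is on ext does not touch the fair weight, which is my reading of the bug.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the ext and fair scheduling classes. */
enum model_class { MODEL_EXT, MODEL_FAIR };

struct model_task {
	int nice;                  /* -20..19 */
	unsigned long fair_weight; /* stands in for p->se.load */
	enum model_class cls;
};

/* Illustrative nice-to-weight table, indexed by nice + 20. */
static const unsigned long model_nice_to_weight[40] = {
	88761, 71755, 56483, 46273, 36291,
	29154, 23254, 18705, 14949, 11916,
	 9548,  7620,  6100,  4904,  3906,
	 3121,  2501,  1991,  1586,  1277,
	 1024,   820,   655,   526,   423,
	  335,   272,   215,   172,   137,
	  110,    87,    70,    56,    45,
	   36,    29,    23,    18,    15,
};

/* Rebuild the fair weight from the task's current nice value. */
static void model_set_load_weight(struct model_task *t)
{
	t->fair_weight = model_nice_to_weight[t->nice + 20];
}

/*
 * Change nice. Assumption: only a task currently on fair gets its
 * fair weight refreshed; on ext the fair weight is left untouched.
 */
static void model_set_nice(struct model_task *t, int nice)
{
	t->nice = nice;
	if (t->cls == MODEL_FAIR)
		model_set_load_weight(t);
}

/*
 * Switch to fair. With rebuild == true this models a switching_to
 * hook that lets fair prepare its own state before enqueue.
 */
static void model_switch_to_fair(struct model_task *t, bool rebuild)
{
	if (rebuild)
		model_set_load_weight(t);
	t->cls = MODEL_FAIR;
}
```

In this model, a task that goes from nice 0 to nice 5 while on ext keeps the
nice 0 weight if it switches to fair without the rebuild, and gets the nice 5
weight with it. Calling `model_set_load_weight()` again on an up-to-date task
changes nothing, which matches the "redundant but harmless" point in option 2.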
I'll respin v3 as you suggested.
Thanks,
Zicheng