Re: [RFC PATCH sched_ext/for-7.2 0/10] sched: Make proxy execution compatible with sched_ext
From: Tejun Heo
Date: Fri May 08 2026 - 21:01:05 EST
Hello,
I'm a bit worried this is more invasive than what it buys. Even with
the full series, the cross-CPU gap Prateek raised stays open -
find_proxy_task() doesn't go through put_prev_set_next_task(), so owner
runs without ops.running(owner). Closing that seems to need yet another
protocol on top, either synthetic running/stopping events or scx core
taking over dispatch_dequeue for substitutions. The BPF scheduler ends
up dispatching tasks it didn't pick and observing callbacks for tasks
it didn't enqueue, which feels too magical and error-prone.
Maybe worth considering an alternative where, when scx is loaded, we
just turn proxy-exec off entirely and expose blocked_on to the BPF
scheduler. Schedulers that want PI can implement it themselves on top
of that relationship; the ones that don't want it pay nothing.
scx_enable could flip the proxy_exec static branch off, after which the
existing gates in __schedule() keep blocked tasks off the runqueue and
skip find_proxy_task() on their own. The remaining concern is in-flight
donors at the moment of the flip - the existing scx_bypass walk already
visits every rq's runnable list during enable, and could force-block
any task it sees with blocked_on set. Mutex unlock would re-wake them
through wake_q normally after that. blocked_on itself is set and
cleared in mutex.c regardless of proxy_exec, so the signal we'd want
to surface is already there.
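To make the ordering concrete, here is a rough userspace model of the
enable-time sequence described above: flip proxy-exec off, sweep every
rq for tasks that still have blocked_on set, and let unlock wake them
later. Everything prefixed model_ is hypothetical and only stands in
for the real static branch, rq walk and wake path; it is a sketch of
the idea, not kernel code.

```c
/* Userspace model only; none of these types or helpers are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_TASKS 8

struct model_mutex;

struct model_task {
	int pid;
	bool on_rq;                     /* still on a runnable list */
	struct model_mutex *blocked_on; /* mutex this task waits for, or NULL */
};

struct model_mutex {
	struct model_task *owner;
};

struct model_rq {
	struct model_task *runnable[MODEL_MAX_TASKS];
	size_t nr;
};

/* Stands in for the proxy_exec static branch. */
static bool model_proxy_exec_enabled = true;

/* Drop every task with blocked_on set from one rq; return how many. */
static size_t model_force_block_donors(struct model_rq *rq)
{
	size_t kept = 0, dropped = 0;

	for (size_t i = 0; i < rq->nr; i++) {
		struct model_task *p = rq->runnable[i];

		if (p->blocked_on) {
			p->on_rq = false;       /* in-flight donor: force-block */
			dropped++;
		} else {
			rq->runnable[kept++] = p;
		}
	}
	rq->nr = kept;
	return dropped;
}

/* Enable-time sequence: flip proxy-exec off first, then sweep every rq. */
static size_t model_scx_enable(struct model_rq *rqs, size_t nr_rqs)
{
	size_t dropped = 0;

	model_proxy_exec_enabled = false;
	for (size_t i = 0; i < nr_rqs; i++)
		dropped += model_force_block_donors(&rqs[i]);
	return dropped;
}

/* Unlock path: clear blocked_on and put the waiter back on a runnable list. */
static void model_mutex_unlock_wake(struct model_mutex *m,
				    struct model_task *waiter,
				    struct model_rq *rq)
{
	assert(waiter->blocked_on == m && !waiter->on_rq);
	assert(rq->nr < MODEL_MAX_TASKS);
	m->owner = NULL;
	waiter->blocked_on = NULL;
	waiter->on_rq = true;
	rq->runnable[rq->nr++] = waiter;
}
```

The model ignores locking and the race between the flip and a
concurrent __schedule(); it only shows that a single sweep after the
flip leaves no blocked task on a runnable list.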
For the BPF side, the natural shape seems to be tagging the existing
ops.quiescent and ops.runnable callbacks with a bit indicating "this
sleep/wake was a mutex transition," plus a small kfunc that returns
the owner of the mutex p is blocked on. A scheduler that wants PI then
records the owner in its own task storage on the quiescent side, boosts
it via the existing vtime / slice / dsq_move / kick primitives, and
drops the boost when the runnable side fires. No new dispatch protocol
is needed, and the BPF scheduler stays in charge of who runs.
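As an illustration of the scheduler-side bookkeeping, here is a minimal
userspace sketch of the quiescent/runnable pairing. The flag bit
PI_MUTEX_TRANSITION, the pi_blocked_on_owner() helper standing in for
the proposed kfunc, and struct pi_task standing in for per-task storage
are all assumptions made up for this example; none of them exist in
sched_ext today. Boosting is modelled purely as lending a lower vtime.

```c
/* Userspace model only; names are hypothetical, not sched_ext APIs. */
#include <stddef.h>
#include <stdint.h>

#define PI_MUTEX_TRANSITION (1u << 0)   /* hypothetical callback flag bit */

struct pi_task {
	uint64_t vtime;                 /* the task's own virtual time */
	uint64_t boost_vtime;           /* earliest vtime lent by waiters */
	unsigned int nr_boosts;         /* waiters currently boosting us */
	struct pi_task *mutex_owner;    /* model of "owner of what p waits on" */
	struct pi_task *boosted;        /* scheduler-private: owner we boosted */
};

/* Stands in for the proposed kfunc returning the mutex owner. */
static struct pi_task *pi_blocked_on_owner(struct pi_task *p)
{
	return p->mutex_owner;
}

/* Vtime the scheduler would order by (plain compare, ignores wraparound). */
static uint64_t pi_effective_vtime(const struct pi_task *p)
{
	if (p->nr_boosts && p->boost_vtime < p->vtime)
		return p->boost_vtime;
	return p->vtime;
}

/* Quiescent side: on a mutex sleep, record the owner and lend our vtime. */
static void pi_quiescent(struct pi_task *p, unsigned int flags)
{
	struct pi_task *owner;
	uint64_t lend;

	if (!(flags & PI_MUTEX_TRANSITION))
		return;
	owner = pi_blocked_on_owner(p);
	if (!owner)
		return;

	lend = pi_effective_vtime(p);
	if (owner->nr_boosts == 0 || lend < owner->boost_vtime)
		owner->boost_vtime = lend;
	owner->nr_boosts++;
	p->boosted = owner;
}

/* Runnable side: on a mutex wake, drop the boost we recorded earlier. */
static void pi_runnable(struct pi_task *p, unsigned int flags)
{
	struct pi_task *owner = p->boosted;

	if (!(flags & PI_MUTEX_TRANSITION) || !owner)
		return;
	p->boosted = NULL;
	if (--owner->nr_boosts == 0)
		owner->boost_vtime = 0;
}
```

With several waiters the owner keeps the lowest lent vtime until the
last one wakes, which is conservative but keeps the state to a counter;
a real scheduler would also want to handle chains, re-queueing the
owner after the boost, and the owner changing while the waiter sleeps.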
Does that direction seem reasonable, or am I missing something that
makes it not work?
Thanks.
--
tejun