Re: [PATCH v25 1/9] sched: Make class_schedulers avoid pushing current, and get rid of proxy_tag_curr()

From: John Stultz

Date: Tue Mar 17 2026 - 00:50:01 EST


On Sun, Mar 15, 2026 at 9:27 AM K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
>
> Back to my concern with the queuing of the balance_callback, and the
> deadline and RT folks can keep me honest here, consider the following:
>
> CPU0
> ====
>
> ======> Task A (prio: 80)
> ...
>
> mutex_lock(Mutex0)
> ... /* Executing critical section. */
>
> =====> Interrupt: Wakes up Task B (prio: 50); B->blocked_on = Mutex0;
> resched_curr()
> <===== Interrupt return
> preempt_schedule_irq()
> schedule()
> put_prev_set_next_Task(A, B)
> rq->donor = B
> if (task_is_blocked(B)
> next = find_proxy_task() /* Return Task A */
> rq->curr = A
> queue_balance_callback()
> do_balance_callbacks()
> /* Finds A as task_on_cpu(); Does nothing. */
>
> ... /* returns from schedule */
> ... /* continues with critical section */
>
> mutex_unlock(Mutex0)
> mutex_handoff(B /* Task B */)
> preempt_disable()
> try_to_wake_up()
> resched_curr()
> preempt_enable()
> preempt_schedule()
> proxy_force_return()
> /* Returns to same CPU */
>
> /*
> * put_prev_set_next_task() is skipped since
> * rq->donor context is same. no balance
> * callbacks are queued. Task A still on the
> * push list.
> */
> rq->donor = B
> rq->curr = B
> =======> sched_out: Task A
>
> !!! No balance callback; Task A still on push list. !!!
>
> <======= sched_in: Task B

Hrm. I'm feeling like I'm a little lost here, specifically after
proxy_force_return(), since it doesn't exist yet at this point in the
patch series. But assuming we're at the "Handle blocked-waiter
migration" point in the series, I'd think it would be something like:

rq->donor= B
rq->curr = A
<< task A >>
mutex_unlock(Mutex0)
mutex_handoff(B /* Task B */)
preempt_disable()
try_to_wake_up()
resched_curr()
preempt_enable()
preempt_schedule()
__schedule()
find_proxy_task()
proxy_force_return()
return NULL
pick_again:
next = pick_next_task()
__pick_next_task() /* Returns B */
rq->donor =B
rq->curr = B
context_switch()
<<switch to B >>
finish_task_switch()
finish_lock_switch()
__balance_callbacks()

Your point "put_prev_set_next_task() is skipped since rq->donor
context is same" wasn't initially obvious to me, as the fair scheduler
does have a (p == prev) check, but it doesn't enqueue balance
callbacks. And for RT/DL/SCX we should be using the pick_task()
method, which calls put_prev_set_next_task() in __pick_next_task().
But indeed, *inside* of put_prev_set_next_task() we return early if
(next == prev).

So I see your concern and agree.

> So what I'm getting to is, if we find that rq->donor has not changed
> with sched_proxy_exec() but rq->curr has changed during schedule(), we
> should forcefully do a:
>
> prev->sched_class->put_prev_task(rq, rq->donor, rq->donor /* or rq->idle / NULL ? */);
> next->sched_class->set_next_task(rq, rq->donor, true /* to queue balance callback. */);
>
> That way, when we do set_nex_task(), we see if we potentially have
> tasks in the push list and queue a balance callback since the
> task_on_cpu() condition may no longer apply to the tasks left behind
> on the list.
>
> Thoughts?

Yeah. I wonder if we can express this inside of
put_prev_set_next_task(). Reworking the shortcut to maybe:
if (next == prev && next != rq->curr)

I probably need to think on this tomorrow, as I suspect the above has
some holes, but it seems like it would catch the cases that would
matter (maybe the issue is it catches too much - we'd probably also
trip it if we A boosted B, and then we hit schedule and again chose A
to boost B, which we probably could have skipped).

I guess adding a new helper function to manually do the
put_prev/set_next could be added to the top level __schedule() logic
in the (prev != next) case, though we'll have to preserve the
prev_donor on the stack probably.

thanks
-john