Re: [PATCH v25 1/9] sched: Make class_schedulers avoid pushing current, and get rid of proxy_tag_curr()

From: K Prateek Nayak

Date: Sun Mar 15 2026 - 12:27:22 EST


Hello John,

On 3/13/2026 8:00 AM, John Stultz wrote:
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b7f77c165a6e0..d86d648a75a4b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6702,23 +6702,6 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
> }
> #endif /* SCHED_PROXY_EXEC */
>
> -static inline void proxy_tag_curr(struct rq *rq, struct task_struct *owner)
> -{
> - if (!sched_proxy_exec())
> - return;
> - /*
> - * pick_next_task() calls set_next_task() on the chosen task
> - * at some point, which ensures it is not push/pullable.
> - * However, the chosen/donor task *and* the mutex owner form an
> - * atomic pair wrt push/pull.
> - *
> - * Make sure owner we run is not pushable. Unfortunately we can
> - * only deal with that by means of a dequeue/enqueue cycle. :-/
> - */
> - dequeue_task(rq, owner, DEQUEUE_NOCLOCK | DEQUEUE_SAVE);
> - enqueue_task(rq, owner, ENQUEUE_NOCLOCK | ENQUEUE_RESTORE);
> -}
> -
> /*
> * __schedule() is the main scheduler function.
> *
> @@ -6871,9 +6854,6 @@ static void __sched notrace __schedule(int sched_mode)
> */
> RCU_INIT_POINTER(rq->curr, next);
>
> - if (!task_current_donor(rq, next))
> - proxy_tag_curr(rq, next);
> -

Going back to my concern with the queuing of the balance_callback (the
deadline and RT folks can keep me honest here), consider the following:

CPU0
====

======> Task A (prio: 80)
...

mutex_lock(Mutex0)
... /* Executing critical section. */

=====> Interrupt: Wakes up Task B (prio: 50); B->blocked_on = Mutex0;
resched_curr()
<===== Interrupt return
preempt_schedule_irq()
schedule()
put_prev_set_next_task(A, B)
rq->donor = B
if (task_is_blocked(B))
next = find_proxy_task() /* Returns Task A */
rq->curr = A
queue_balance_callback()
do_balance_callbacks()
/* Finds A as task_on_cpu(); Does nothing. */

... /* returns from schedule */
... /* continues with critical section */

mutex_unlock(Mutex0)
mutex_handoff(B /* Task B */)
preempt_disable()
try_to_wake_up()
resched_curr()
preempt_enable()
preempt_schedule()
proxy_force_return()
/* Returns to same CPU */

/*
* put_prev_set_next_task() is skipped since
* rq->donor context is same. no balance
* callbacks are queued. Task A still on the
* push list.
*/
rq->donor = B
rq->curr = B

=======> sched_out: Task A

!!! No balance callback; Task A still on push list. !!!

<======= sched_in: Task B


So what I'm getting at is: if, with sched_proxy_exec(), we find that
rq->donor has not changed but rq->curr has changed during schedule(),
we should forcefully do a:

prev->sched_class->put_prev_task(rq, rq->donor, rq->donor /* or rq->idle / NULL ? */);
next->sched_class->set_next_task(rq, rq->donor, true /* to queue balance callback. */);

That way, when we do set_next_task(), we check whether there are still
tasks on the push list and queue a balance callback, since the
task_on_cpu() condition may no longer apply to the tasks left behind
on the list.
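To make the suggestion concrete, a rough (untested, pseudocode-level)
sketch of where such a check could sit in __schedule() might look like
the following. The condition and the put_prev_task()/set_next_task()
signatures are assumptions based on the code quoted in this thread,
not a tested patch:

```c
/*
 * Hypothetical sketch only: donor unchanged but curr switched
 * underneath it, so cycle put_prev_task()/set_next_task() on the
 * donor to let the class re-check its push list and queue a
 * balance callback if tasks were left behind.
 */
if (sched_proxy_exec() && task_current_donor(rq, next) &&
    rq->curr != next) {
	rq->donor->sched_class->put_prev_task(rq, rq->donor, rq->donor);
	rq->donor->sched_class->set_next_task(rq, rq->donor, true);
}
```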

Thoughts?

> /*
> * The membarrier system call requires each architecture
> * to have a full memory barrier after updating
> @@ -6907,10 +6887,6 @@ static void __sched notrace __schedule(int sched_mode)
> /* Also unlocks the rq: */
> rq = context_switch(rq, prev, next, &rf);
> } else {
> - /* In case next was already curr but just got blocked_donor */
> - if (!task_current_donor(rq, next))
> - proxy_tag_curr(rq, next);
> -
> rq_unpin_lock(rq, &rf);
> __balance_callbacks(rq, NULL);
> raw_spin_rq_unlock_irq(rq);

--
Thanks and Regards,
Prateek