Re: [RFC PATCH v15 6/7] sched: Fix proxy/current (push,pull)ability
From: John Stultz
Date: Sat Mar 15 2025 - 01:10:25 EST
On Fri, Mar 14, 2025 at 1:40 AM K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
> On 3/13/2025 3:41 AM, John Stultz wrote:
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index b4f7b14f62a24..3596244f613f8 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -6722,6 +6722,23 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
> > }
> > #endif /* SCHED_PROXY_EXEC */
> >
> > +static inline void proxy_tag_curr(struct rq *rq, struct task_struct *owner)
> > +{
> > + if (!sched_proxy_exec())
> > + return;
> > + /*
> > + * pick_next_task() calls set_next_task() on the chosen task
> > + * at some point, which ensures it is not push/pullable.
> > + * However, the chosen/donor task *and* the mutex owner form an
> > + * atomic pair wrt push/pull.
> > + *
> > + * Make sure owner we run is not pushable. Unfortunately we can
> > + * only deal with that by means of a dequeue/enqueue cycle. :-/
> > + */
> > + dequeue_task(rq, owner, DEQUEUE_NOCLOCK | DEQUEUE_SAVE);
> > + enqueue_task(rq, owner, ENQUEUE_NOCLOCK | ENQUEUE_RESTORE);
> > +}
> > +
> > /*
> > * __schedule() is the main scheduler function.
> > *
> > @@ -6856,6 +6873,10 @@ static void __sched notrace __schedule(int sched_mode)
> > * changes to task_struct made by pick_next_task().
> > */
> > RCU_INIT_POINTER(rq->curr, next);
> > +
> > + if (!task_current_donor(rq, next))
> > + proxy_tag_curr(rq, next);
>
> I don't see any dependency on rq->curr for task_current_donor() check.
> Could this check be moved outside of the if-else block to avoid
> duplicating in both places since rq_set_donor() was called just after
> pick_next_task() or am I missing something?
So this check is just looking to see if next is the same as the
selected rq->donor (what pick_next_task() chose).
If so, nothing to do, same as always.
But if not (so we are proxying in this case), we need to call
proxy_tag_curr() because we have to make sure that neither the donor
nor the proxy is on a sched-class's pushable list.
This is because the logic around pick_next_task() calls
set_next_task() on the returned donor task, and in the sched-class
code (RT, for example) that logic removes the chosen donor task
from the pushable list.
But when we find a proxy task to run on behalf of the donor, the
problem is that the proxy might be on the sched-class' pushable list.
So if we are proxying, we do a dequeue and enqueue pair, which lets
the enqueue path re-evaluate whether the task is rq->curr, and so
prevents it from being added to any such pushable list. This avoids the
potential of the balance callbacks trying to migrate rq->curr out from
under us.
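To make that a bit more concrete, here is a rough user-space sketch of
the idea. It is not kernel code: every toy_* name is made up for
illustration, and it assumes a single boolean "pushable" flag in place
of the real per-class pushable lists and their additional checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model; all toy_* names are hypothetical, not kernel APIs. */
struct toy_task {
	const char *name;
	bool queued;	/* task is on the runqueue */
	bool pushable;	/* task is on the class's pushable list */
};

struct toy_rq {
	struct toy_task *donor;	/* scheduling context: what the pick chose */
	struct toy_task *curr;	/* execution context: what actually runs */
};

/* Enqueue: neither rq->curr nor rq->donor may land on the pushable list. */
static void toy_enqueue(struct toy_rq *rq, struct toy_task *p)
{
	p->queued = true;
	p->pushable = (p != rq->curr && p != rq->donor);
}

static void toy_dequeue(struct toy_rq *rq, struct toy_task *p)
{
	(void)rq;
	p->queued = false;
	p->pushable = false;
}

/* Models set_next_task(): the chosen donor leaves the pushable list. */
static void toy_set_next_task(struct toy_rq *rq, struct toy_task *donor)
{
	rq->donor = donor;
	donor->pushable = false;
}

/* Models proxy_tag_curr(): requeue owner so enqueue sees it as rq->curr. */
static void toy_proxy_tag_curr(struct toy_rq *rq, struct toy_task *owner)
{
	assert(rq->curr == owner);
	toy_dequeue(rq, owner);
	toy_enqueue(rq, owner);
}

/* Models the tail of __schedule() from the quoted hunk. */
static void toy_schedule(struct toy_rq *rq, struct toy_task *donor,
			 struct toy_task *next)
{
	toy_set_next_task(rq, donor);
	rq->curr = next;
	if (next != rq->donor)	/* proxying: next runs on behalf of donor */
		toy_proxy_tag_curr(rq, next);
}
```

In this model, if the donor and the mutex owner are both enqueued
before the pick, only the donor is taken off the pushable list by
toy_set_next_task(). The owner stays pushable until the
dequeue/enqueue pair in toy_proxy_tag_curr() runs with rq->curr
already pointing at it.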
Thanks so much for the review and the question! Let me know if that
makes any more sense, or if you have suggestions on how I could better
explain it in the commit message to help.
Appreciate it!
-john