Re: [PATCH v24 08/11] sched: Avoid donor->sched_class->yield_task() null traversal

From: K Prateek Nayak
Date: Thu Jan 01 2026 - 02:05:15 EST


Hello John,

On 12/30/2025 3:22 PM, K Prateek Nayak wrote:
> So I think I found my answer (at least one of them):
>
> find_proxy_task()
>   /* case DEACTIVATE_DONOR */
>   proxy_deactivate(rq, donor)
>     proxy_resched_idle(rq); /* Switched donor to rq->idle. */
>     try_to_block_task(rq, donor, &state, true)
>       if (signal_pending_state(task_state, donor))
>         WRITE_ONCE(p->__state, TASK_RUNNING)
>         return false; /* Blocking fails. */
>
> /* If deactivate fails, force return */
> p = donor;

So I was looking at my tree with some modifications on top, and the
above scenario cannot happen there: the "force return" just falls
through to NEEDS_RETURN, which migrates the task away and returns NULL,
forcing a re-pick, after which the donor is back to normal.

> 	return p
>
> next = p; /* Donor is rq->idle. */
>
>
> This should be illegal and I think we should either force a "pick_again"
> if proxy_deactivate() fails (can it get stuck in an infinite loop?) or
> we should fix the donor relation before running "p".
>
> We can also push the proxy_resched_idle() into try_to_block_task() and
> only do it once we are past all the early returns and if
> task_current_donor(rq, p).
>
> ... and as I write this I realize we can have this via
> proxy_needs_return() so I guess we need this patch after all and
> proxy_needs_return() should do resched_curr() for that case too so we
> can re-evaluate the donor context on the CPU where we are stealing away
> the donor from.

But this can definitely happen. FWIW I think we can just do:

static inline void proxy_reset_donor(struct rq *rq)
{
	put_prev_set_next_task(rq, rq->donor, rq->curr);
	rq_set_donor(rq, rq->curr);
	resched_curr(rq);
}

... instead of proxy_resched_idle(), and we then account the balance
time between stealing the donor and the resched to "rq->curr" instead
of to the idle task, to stay fair.

--
Thanks and Regards,
Prateek