Re: [PATCH v23 6/9] sched: Handle blocked-waiter migration (and return migration)

From: John Stultz

Date: Wed Nov 19 2025 - 20:54:13 EST


On Sun, Nov 9, 2025 at 8:48 PM K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
>
> Hello John,
>
> On 11/8/2025 4:48 AM, John Stultz wrote:
> >>> @@ -6689,26 +6834,41 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
> >>> return NULL;
> >>> }
> >>>
> >>> + if (task_current(rq, p))
> >>> + curr_in_chain = true;
> >>> +
> >>> owner = __mutex_owner(mutex);
> >>> if (!owner) {
> >>> /*
> >>> - * If there is no owner, clear blocked_on
> >>> - * and return p so it can run and try to
> >>> - * acquire the lock
> >>> + * If there is no owner, either clear blocked_on
> >>> + * and return p (if it is current and safe to
> >>> + * just run on this rq), or return-migrate the task.
> >>> */
> >>> - __clear_task_blocked_on(p, mutex);
> >>> - return p;
> >>> + if (task_current(rq, p)) {
> >>> + __clear_task_blocked_on(p, NULL);
> >>> + return p;
> >>> + }
> >>> + action = NEEDS_RETURN;
> >>> + break;
> >>> }
> >>>
> >>> if (!READ_ONCE(owner->on_rq) || owner->se.sched_delayed) {
> >>
> >> Should we handle task_on_rq_migrating() in the similar way?
> >> Wait for the owner to finish migrating and look at the
> >> task_cpu(owner) once it is reliable?
> >
> > Hrm. I'm not quite sure I understand your suggestion here. Could you
> > expand a bit here? Are you thinking we should deactivate the donor
> > when the owner is migrating? What would then return the donor to the
> > runqueue? Just rescheduling idle so that we drop the rq lock
> > momentarily should be sufficient to make sure the owner can finish
> > migration.
>
> In find_proxy_task() we have:
>
> 	if (!READ_ONCE(owner->on_rq) || owner->se.sched_delayed) {
> 		/* Returns rq->idle or NULL */
> 	}
>
> 	/*
> 	 * Owner can be task_on_rq_migrating() at this point
> 	 * since it is in turn blocked on a lock owner on a
> 	 * different CPU.
> 	 */
>
> 	owner_cpu = task_cpu(owner); /* Prev CPU */
> 	if (owner_cpu != this_cpu) {
> 		...
> 		action = MIGRATE;
> 		break;
> 	}
>
>
> So in the end we can migrate to the previous CPU of the owner
> and the previous CPU has to do a chain migration again. I'm
> probably overthinking about a very unlikely scenario here :)

Ok, so you're suggesting maybe putting the
	if (task_on_rq_migrating(owner))
case ahead of the
	if (owner_cpu != this_cpu)
check?
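
Just to make sure I'm reading the ordering right, here is a rough
userspace model of it. This is only a sketch, not the kernel code and
not the patch: struct mock_task, enum mock_action and mock_pick_action()
are made up for illustration, the !on_rq case is collapsed into a single
result, and sched_delayed is left out. Only the two TASK_ON_RQ_* values
mirror kernel/sched/sched.h.

```c
#include <assert.h>
#include <stddef.h>

/* These two values mirror kernel/sched/sched.h. */
#define TASK_ON_RQ_QUEUED	1
#define TASK_ON_RQ_MIGRATING	2

/* Hypothetical stand-in for the few task_struct fields used here. */
struct mock_task {
	int on_rq;	/* 0, TASK_ON_RQ_QUEUED or TASK_ON_RQ_MIGRATING */
	int cpu;	/* what task_cpu() would report; stale mid-migration */
};

/* Hypothetical outcomes, loosely modeled on the cases discussed above. */
enum mock_action {
	MOCK_RUN_OWNER,			/* owner is queued here, run it */
	MOCK_RESCHED_IDLE,		/* pick idle, drop the rq lock, retry */
	MOCK_MIGRATE,			/* send the donor to the owner's CPU */
	MOCK_OWNER_NOT_RUNNABLE,	/* simplified !on_rq case */
};

static int mock_task_on_rq_migrating(const struct mock_task *p)
{
	return p->on_rq == TASK_ON_RQ_MIGRATING;
}

static enum mock_action mock_pick_action(const struct mock_task *owner,
					 int this_cpu)
{
	assert(owner != NULL);

	if (!owner->on_rq)
		return MOCK_OWNER_NOT_RUNNABLE;

	/* Checked first: owner->cpu is not reliable until migration is done. */
	if (mock_task_on_rq_migrating(owner))
		return MOCK_RESCHED_IDLE;

	/* Only now is it meaningful to compare CPUs. */
	if (owner->cpu != this_cpu)
		return MOCK_MIGRATE;

	return MOCK_RUN_OWNER;
}
```

In this model, an owner that is mid-migration with its recorded CPU
still pointing at the old one (say 3, while we're on 0) takes the
resched-idle path instead of MIGRATE, so the donor isn't sent to the
owner's previous CPU.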

Let me give that a whirl and see how it does.

thanks
-john