Re: [PATCH v23 6/9] sched: Handle blocked-waiter migration (and return migration)

From: K Prateek Nayak
Date: Sun Nov 09 2025 - 23:48:15 EST


Hello John,

On 11/8/2025 4:48 AM, John Stultz wrote:
>>> @@ -6689,26 +6834,41 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
>>> return NULL;
>>> }
>>>
>>> + if (task_current(rq, p))
>>> + curr_in_chain = true;
>>> +
>>> owner = __mutex_owner(mutex);
>>> if (!owner) {
>>> /*
>>> - * If there is no owner, clear blocked_on
>>> - * and return p so it can run and try to
>>> - * acquire the lock
>>> + * If there is no owner, either clear blocked_on
>>> + * and return p (if it is current and safe to
>>> + * just run on this rq), or return-migrate the task.
>>> */
>>> - __clear_task_blocked_on(p, mutex);
>>> - return p;
>>> + if (task_current(rq, p)) {
>>> + __clear_task_blocked_on(p, NULL);
>>> + return p;
>>> + }
>>> + action = NEEDS_RETURN;
>>> + break;
>>> }
>>>
>>> if (!READ_ONCE(owner->on_rq) || owner->se.sched_delayed) {
>>
>> Should we handle task_on_rq_migrating() in the similar way?
>> Wait for the owner to finish migrating and look at the
>> task_cpu(owner) once it is reliable?
>
> Hrm. I'm not quite sure I understand your suggestion here. Could you
> expand a bit here? Are you thinking we should deactivate the donor
> when the owner is migrating? What would then return the donor to the
> runqueue? Just rescheduling idle so that we drop the rq lock
> momentarily should be sufficient to make sure the owner can finish
> migration.

In find_proxy_task() we have:

	if (!READ_ONCE(owner->on_rq) || owner->se.sched_delayed) {
		/* Returns rq->idle or NULL */
	}

	/*
	 * Owner can be task_on_rq_migrating() at this point
	 * since it is in turn blocked on a lock owner on a
	 * different CPU.
	 */

	owner_cpu = task_cpu(owner); /* Prev CPU */
	if (owner_cpu != this_cpu) {
		...
		action = MIGRATE;
		break;
	}


So in the end we can migrate the donor to the owner's previous
CPU, and that CPU then has to do the chain migration again. I'm
probably overthinking a very unlikely scenario here :)

Unfortunately, I don't have a good way to detect this short of
adding another member to task_struct that tracks task_cpu() for
the most part, but is set to "owner_cpu" as soon as we decide on
the "MIGRATE" action, while we are still holding the
"wait_lock"/"blocked_on_lock".

--
Thanks and Regards,
Prateek