Re: [PATCH v23 3/9] sched/locking: Add special p->blocked_on==PROXY_WAKING value for proxy return-migration

From: K Prateek Nayak

Date: Thu Oct 30 2025 - 03:33:44 EST


Hello John,

On 10/30/2025 5:48 AM, John Stultz wrote:
> As we add functionality to proxy execution, we may migrate a
> donor task to a runqueue where it can't run due to cpu affinity.
> Thus, we must be careful to ensure we return-migrate the task
> back to a cpu in its cpumask when it becomes unblocked.
>
> Peter helpfully provided the following example with pictures:
> "Suppose we have a ww_mutex cycle:
>
>            ,-+-* Mutex-1 <-.
>  Task-A ---' |             | ,-- Task-B
>              `-> Mutex-2 *-+-'
>
> Where Task-A holds Mutex-1 and tries to acquire Mutex-2, and
> where Task-B holds Mutex-2 and tries to acquire Mutex-1.
>
> Then the blocked_on->owner chain will go in circles.
>
>   Task-A  -> Mutex-2
>     ^          |
>     |          v
>   Mutex-1 <- Task-B
>
> We need two things:
>
> - find_proxy_task() to stop iterating the circle;
>
> - the woken task to 'unblock' and run, such that it can
> back-off and re-try the transaction.
>
> Now, the current code [without this patch] does:
>	__clear_task_blocked_on();
>	wake_q_add();
>
> And surely clearing ->blocked_on is sufficient to break the
> cycle.
>
> Suppose it is Task-B that is made to back-off, then we have:
>
> Task-A -> Mutex-2 -> Task-B (no further blocked_on)
>
> and it would attempt to run Task-B. Or worse, it could directly
> pick Task-B and run it, without ever getting into
> find_proxy_task().
>
> Now, here is a problem because Task-B might not be runnable on
> the CPU it is currently on; and because !task_is_blocked() we
> don't get into the proxy paths, so nobody is going to fix this
> up.
>
> Ideally we would have dequeued Task-B alongside of clearing
> ->blocked_on, but alas, [the lock ordering prevents us from
> getting the task_rq_lock() and] spoils things."
>
> Thus we need more than just a binary concept of the task being
> blocked on a mutex or not.
>
> So allow setting blocked_on to PROXY_WAKING as a special value
> which specifies the task is no longer blocked, but needs to
> be evaluated for return migration *before* it can be run.

Now I can truly appreciate the need for the tri-state with
that updated commit log. Thank you for the detailed explanation.
Feel free to include:

Reviewed-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>
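
To spell out the tri-state as I read it from the commit log, here is a
rough userspace sketch in C. It is only a model under stated
assumptions, not the kernel implementation: only blocked_on and
PROXY_WAKING come from the patch. The struct layouts, the sentinel
object, the hop limit, and the helpers classify(), walk_chain() and
finish_wakeup() are all hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct mutex_model;

/* Hypothetical stand-in for the few task fields this sketch needs. */
struct task_model {
	const char *name;
	struct mutex_model *blocked_on;	/* NULL, PROXY_WAKING, or a mutex */
	int cpu;			/* CPU the task is queued on */
	unsigned int allowed;		/* bitmask of CPUs it may run on */
};

struct mutex_model {
	struct task_model *owner;
};

/* Assumption: model the special value as the address of a sentinel. */
static struct mutex_model proxy_waking_sentinel;
#define PROXY_WAKING (&proxy_waking_sentinel)

enum blocked_state { STATE_RUNNABLE, STATE_WAKING, STATE_BLOCKED };

static enum blocked_state classify(const struct task_model *p)
{
	if (!p->blocked_on)
		return STATE_RUNNABLE;
	if (p->blocked_on == PROXY_WAKING)
		return STATE_WAKING;	/* unblocked, but not yet safe to run */
	return STATE_BLOCKED;
}

#define MAX_HOPS 8	/* arbitrary bound so the model always terminates */

/*
 * Follow blocked_on->owner from the donor. Returns the task that can
 * run, or NULL. When the walk stops at a PROXY_WAKING task, that task
 * is reported through *needs_check. NULL with *needs_check == NULL
 * means the hop limit was hit or a mutex had no owner.
 */
static struct task_model *walk_chain(struct task_model *donor,
				     struct task_model **needs_check)
{
	struct task_model *p = donor;
	int hops;

	assert(donor && needs_check);
	*needs_check = NULL;
	for (hops = 0; p && hops < MAX_HOPS; hops++) {
		switch (classify(p)) {
		case STATE_RUNNABLE:
			return p;
		case STATE_WAKING:
			*needs_check = p;
			return NULL;
		case STATE_BLOCKED:
			p = p->blocked_on->owner;
			break;
		}
	}
	return NULL;
}

/*
 * Evaluate a PROXY_WAKING task for return migration, then clear
 * blocked_on. Returns 1 if the task had to be moved to an allowed CPU.
 */
static int finish_wakeup(struct task_model *p)
{
	int moved = 0;
	int cpu;

	if (!(p->allowed & (1u << p->cpu))) {
		for (cpu = 0; cpu < 32; cpu++) {
			if (p->allowed & (1u << cpu)) {
				p->cpu = cpu;	/* lowest allowed CPU */
				moved = 1;
				break;
			}
		}
	}
	p->blocked_on = NULL;
	return moved;
}
```

In this model, marking Task-B as PROXY_WAKING both breaks the cycle and
keeps Task-B from being picked until finish_wakeup() has checked its
affinity. Clearing blocked_on directly would skip that check, which is
the problem Peter describes.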

--
Thanks and Regards,
Prateek

>
> This will then be used in a later patch to handle proxy
> return-migration.
>
> Signed-off-by: John Stultz <jstultz@xxxxxxxxxx>