Re: [PATCH v26 00/10] Simple Donor Migration for Proxy Execution
From: Peter Zijlstra
Date: Fri Apr 03 2026 - 08:55:05 EST
On Fri, Apr 03, 2026 at 03:55:22PM +0530, K Prateek Nayak wrote:
> >> if (sched_proxy_exec() && p->blocked_on) {
> >
> > So I had doubts about this lockless test of ->blocked_on, I still cannot
> > convince myself it is correct.
>
> Let me give it a try: a task's "blocked_on" starts off as a valid mutex
> and can optionally be transitioned to PROXY_WAKING (!= NULL) before
> being cleared.
>
> If blocked_on is cleared directly, the PROXY_WAKING transition can
> never happen even if someone calls set_task_blocked_on_waking(), since
> we bail out early if !p->blocked_on.
>
> All "p->blocked_on" transitions happen with "blocked_on_lock" held.
>
> So that begs the question, when is "blocked_on" actually cleared?
>
> 1) If the task is task_on_rq_queued(), we either clear it in schedule()
> (find_proxy_task() to be precise) or in ttwu_runnable() - both with
> rq_lock held.
>
> 2) *NEW* If the task is off rq and is waking up, it means there is a
> ttwu_state_match() and without proxy, the task would have woken up
> and executed on the CPU.
>
> Since the task is completely off the rq, schedule() cannot clear
> p->blocked_on. The only other possible remote transition is to
> PROXY_WAKING (!= NULL).
>
> So *inspecting* p->blocked_on without blocked_on_lock held should be
> fine for knowing whether the task has a blocked_on relation.
>
> Only the task itself can set "p->blocked_on" to a valid mutex, while
> running on the CPU, so it is out of the question that we suddenly see
> a transition to a new mutex while we are in schedule() or in the
> middle of waking the task.
So my consideration was:
	__mutex_lock_common()
	  ...
	  raw_spin_lock(&current->blocked_lock);
	  __set_task_blocked_on(current, lock)
	    current->blocked_on = lock;
	  set_current_state(state)
	    current->__state = state;
	    smp_mb();
This means we have:
	LOCK
	[W] ->blocked_on = lock
	[W] ->__state = state;
	MB
Then consider:
	try_to_wake_up()
	  ...
	  raw_spin_lock_irqsave(&p->pi_lock, flags);
	  if (ttwu_state_match(p, state, &success))
	    ...
	  smp_rmb();
	  if (READ_ONCE(p->on_rq) && ttwu_runnable(p, wake_flags))
	    if (sched_proxy_exec() && p->blocked_on)
This is effectively:
	ACQUIRE
	[R] ->__state
	RMB
	[R] ->blocked_on
Combined this gives:

	CPU0				CPU1

	LOCK				ACQUIRE
	[W] ->blocked_on = lock		[R] ->__state
	[W] ->__state = state;		RMB
	MB				[R] ->blocked_on
And that is *NOT* properly ordered. It is possible to observe the [W]
->__state store, and thus pass ttwu_state_match(), while NOT yet
observing the [W] ->blocked_on store, and so see !->blocked_on.
(on weakly ordered machines, obviously)
So that does a ttwu() but will 'retain' ->blocked_on -- which violates
the model. Which is about where I got.
That said; this race, while real, does no actual harm. Because as you
say, it means that CPU1 is in the middle of mutex_lock() and will
observe the wakeup, cancel the block, and clean up ->blocked_on
itself.
So yeah, I think we're good.