Re: [PATCH v29 7/9] sched: Add blocked_donor link to task for smarter mutex handoffs

From: Peter Zijlstra

Date: Fri May 22 2026 - 05:50:43 EST

On Tue, May 19, 2026 at 09:16:58PM +0530, K Prateek Nayak wrote:

> >> @@ -1001,6 +1001,12 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
> >> MUTEX_WARN_ON(__owner_task(owner) != current);
> >> MUTEX_WARN_ON(owner & MUTEX_FLAG_PICKUP);
> >>
> >> + if (sched_proxy_exec() && current->blocked_donor) {
> >> + /* force handoff if we have a blocked_donor */
> >> + owner = MUTEX_FLAG_HANDOFF;
> >> + break;
> >> + }
> >> +
> >> if (owner & MUTEX_FLAG_HANDOFF)
> >> break;
> >>
> >
> > AFAICT this is racy since we don't have preemption disabled.
> >
> > So we can observe ->blocked_donor (A) set, or (B) unset.
> > If (A) we can schedule() right after this (and before taking
> > ->wait_lock) and it can be unset when we resume running this task. Or
> > (B), the exact opposite.
> >
> > Now, (A) is harmless, because if ->blocked_donor becomes NULL, the
> > hand off code falls back to picking the first on the wait list and
> > things just get on.
> >
> > *However*, (B) might be a problem, because then we will not have the
> > HANDOFF bit set even though there is in fact a donor we need to hand off
> > to.
>
> I don't think it is that big of a problem: __mutex_unlock_slowpath()
> sees current->blocked_donor as NULL but just before it can do:
>
> atomic_long_try_cmpxchg_release(&lock->owner, &owner, __owner_flags(owner))
>
> it is preempted and comes back with a ->blocked_donor.
>
> Once we grab the wait_lock, we realize there is a ->blocked_donor link
> and we wake it up instead of first waiter which goes and grabs the
> lock once it hits mutex_trylock() for !owner case.
>
> If a concurrent mutex_lock() races with this sequence, the
> ->blocked_donor that was woken up sees the new owner and goes to proxy
> that.
>
> With guard(preempt)(), we see a stable ->blocked_donor as NULL, and we
> wake up the first waiter after clearing the "owner" from lock->owner.
>
> If a concurrent mutex_lock() races, it still grabs the mutex and now
> it is just the first waiter that realizes it has to go proxy the task
> that stole the lock.
>
> One wasted wakeup is incurred in both cases but it seems like no
> biggie. Maybe I don't understand the subtleties but stabilizing it
> against preemption is definitely a good move since it saves this
> debate :-)

The preemption can happen after the cmpxchg loop completes and before we
take the wake_lock.

> > @@ -1063,7 +1071,8 @@ static noinline void __sched __mutex_unl
> > __mutex_handoff(lock, next);
> >
> > raw_spin_unlock(&current->blocked_lock);
> > - raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags, &wake_q);
> > + raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
> > + wake_up_q(&wake_q);
>
> In that case, can't we simplify this to:
>
> if (next)
> wake_up_process(next);
>
> and save on the wake_q enqueue dequeue overheads?

Sorta, it still needs the get/put_task_struct() dance on.