Re: [PATCH v29 7/9] sched: Add blocked_donor link to task for smarter mutex handoffs
From: John Stultz
Date: Wed May 20 2026 - 17:15:51 EST
On Tue, May 19, 2026 at 7:32 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Tue, May 12, 2026 at 02:56:17AM +0000, John Stultz wrote:
> > @@ -1001,6 +1001,12 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
> > MUTEX_WARN_ON(__owner_task(owner) != current);
> > MUTEX_WARN_ON(owner & MUTEX_FLAG_PICKUP);
> >
> > + if (sched_proxy_exec() && current->blocked_donor) {
> > + /* force handoff if we have a blocked_donor */
> > + owner = MUTEX_FLAG_HANDOFF;
> > + break;
> > + }
> > +
> > if (owner & MUTEX_FLAG_HANDOFF)
> > break;
> >
>
> AFAICT this is racy since we don't have preemption disabled.
>
> So we can observe ->blocked_donor (A) set, or (B) unset.
> If (A) we can schedule() right after this (and before taking
> ->wait_lock) and it can be unset when we resume running this task. Or
> (B), the exact opposite.
>
> Now, (A) is harmless, because if ->blocked_donor becomes NULL, the
> hand off code falls back to picking the first on the wait list and
> things just get on.
>
> *However*, (B) might be a problem, because then we will not have the
> HANDOFF bit set even though there is in fact a donor we need to hand off
> to.
>
...
>
> That is, we need this on top, no?
>
> ---
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1242,8 +1242,14 @@ struct task_struct {
> #endif
>
> struct mutex *blocked_on; /* lock we're blocked on */
> - struct task_struct *blocked_donor; /* task that is boosting this task */
> raw_spinlock_t blocked_lock;
> +
> + /*
> + * The task that is boosting this task; a back link for the current
> + * donor stack. Set in schedule() -> find_proxy_task() and only stable
> + * under preempt_disable().
> + */
> + struct task_struct *blocked_donor;
>
> #ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> /*
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -990,6 +990,14 @@ static noinline void __sched __mutex_unl
> __release(lock);
>
> /*
> + * Ensures the proxy donor stack is stable across unlock and handoff.
> + * Specifically, it avoids the case where current->blocked_donor is
> + * NULL when it is inspected while doing the unlock, but a preemption
> + * before taking the wake_lock would make it set and a hand-off is
> + * missed.
> + */
> + guard(preempt)();
> + /*
> * Release the lock before (potentially) taking the spinlock such that
> * other contenders can get on with things ASAP.
> *
> @@ -1063,7 +1071,8 @@ static noinline void __sched __mutex_unl
> __mutex_handoff(lock, next);
>
> raw_spin_unlock(¤t->blocked_lock);
> - raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags, &wake_q);
> + raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
> + wake_up_q(&wake_q);
> }
>
> #ifndef CONFIG_DEBUG_LOCK_ALLOC
Thanks for the review and bugfix here, Peter! I've integrated this
into my tree, and will work out a similar fix for the rwsem logic in
my full series.
I know you have a tree with some of these patches that your tinkering
with, so I'll wait to see how that settles down before rebasing and
sending another iteration of my set.
thanks
-john