Re: [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in()

From: John Stultz

Date: Fri May 29 2026 - 02:50:01 EST


On Thu, May 28, 2026 at 4:20 PM John Stultz <jstultz@xxxxxxxxxx> wrote:
> On Tue, May 26, 2026 at 4:39 PM John Stultz <jstultz@xxxxxxxxxx> wrote:
> > On Tue, May 26, 2026 at 4:16 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > >
> > > Per the discussion here:
> > >
> > > https://lore.kernel.org/all/20260403112810.GG3738786@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> > >
> > > The reason for this condition is that the signal condition in
> > > try_to_block_task() would set_task_blocked_in_waking(). However, it no longer
> > > does that, in fact, that path does clear_task_blocked_on(), rendering the
> > > clause under discussion moot.
> > >
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > > ---
> > > kernel/sched/core.c | 3 ---
> > > 1 file changed, 3 deletions(-)
> > >
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -7132,9 +7132,6 @@ static void __sched notrace __schedule(i
> > > if (sched_proxy_exec()) {
> > > struct task_struct *prev_donor = rq->donor;
> > >
> > > - if (!prev_state && prev->blocked_on)
> > > - clear_task_blocked_on(prev, NULL);
> > > -
> > > rq_set_donor(rq, next);
> > > next->blocked_donor = NULL;
> > > if (unlikely(next->is_blocked && next->blocked_on)) {
> >
> > Oh good! I had a note to try to re-confirm if that chunk was really
> > needed, as it did feel a bit like it was patching up a problem after
> > the fact.
> >
> > That said, running this on top of your sched/proxy branch tripped over
> > warnings with the ww_mutex selftest, so it looks like there's
> > something else missing before this can land.
> >
> > Digging in, it looks like we still need the fix I had here:
> > https://lore.kernel.org/lkml/20260430215103.2978955-3-jstultz@xxxxxxxxxx/
> >
> > Since without that, we can get into a situation where we have
> > blocked_on set when a task __state is TASK_RUNNING. The segment
> > you're dropping would catch and clear that out, but really we should
> > avoid getting into that situation in the mutex lock code.
>
> Hey Peter,
> So I've done testing with your full sched/proxy tree and with the
> entire set it looks ok.
>
> However, even with the fix I poined out, I've unfortunately hit races
> with the ww_mutex selftest at the point of this patch in the series.
> Basically between commit
> 1b89b7b21bf5 ("sched/proxy: Remove superfluous clear_task_blocked_in()")
> and
> a8be1edac5a1 ("sched/proxy: Remove PROXY_WAKING")
>
> I'm currently tracing down exactly why the race is cropping up but I
> believe the chunk removed in this case is avoiding cases where we end
> up getting PROXY_WAKING set on a TASK_RUNNING task.
>
> I'll get back to you when I get my head around it properly, but wanted
> to raise the issue in case you or K Prateek can see right through it.

So digging into this, I think part of the issue is the logic you're
removing here sort of keeps the blocked_on and __state/is_blocked
values in sync. And I think one reason they can get out of sync is due
to the task_is_current() shortcut returns in find_proxy_task(), where
we clear is_blocked/blocked_on and return the current task.

Bascially we can get in a situation where (sorry this gets a bit convoluted):

1) On CPU1, __mutex_lock_common, we set task A
blocked_on/TASK_UNINTERRUPTABLE, and call into __schedule().

2) On CPU2, task B who holds the mutex calls __mutex_unlock_slowpath()
and sets task A as PROXY_WAKING and starts to call into
wake_up_task().

3) On CPU1, in __schedule() we pick_next_task(), which returns task A,
which is_blocked. We call find_proxy_task() and note task A is
PROXY_WAKING. Since its also current, we take the short-cut and clear
is_blocked and blocked_on and return task A to run.

4) On CPU2, try_to_wake_up() hits ttwu_runnable(), and
proxy_needs_return() returns false as A->blocked_on is zero.

5) On CPU 1, task A is running, it grabs the lock it was waiting for
and exits __mutex_lock_common. It then enters __mutex_lock_common to
grab a different mutex that is already locked. It sets itself
blocked_on/TASK_INTERRUPTABLE and calls into __schedule()

6) On CPU2, ttwu_runnable() continues, and calls ttwu_do_wakeup(),
which clears A->is_blocked and sets the A->__state TASK_RUNNING

7) On CPU3, task C that holds the mutex A is waiting on, calls
__mutex_unlock_slowpath, setting A as PROXY_WAKING and calls into
wake_up_task()

8) On CPU1, in __schedule() pick_next_task() again returns task A. But
is_blocked is now zero, so we just return task A, even though
blocked_on is PROXY_WAKING.

9) On CPU1, task A gets back to the __mutex_lock_common() loop, calls
set_task_blocked_on() and trips warnings as A->blocked_on is still
PROXY_WAKING.

The chunk you drop in this patch effectively clears blocked_on in
whenever we notice current is TASK_RUNNING, which keeps us in sync to
avoid this.

I'm also wondering if dropping the shortcut returns in
find_proxy_task() that clear blocked_on and is_blocked is maybe the
right thing? Just to reduce the cases we have to wrangle here.
Even so, at least in my initial testing in doing so with just this
patch that doesn't seem sufficient as it seems ttwu() can still race
setting A->__state TASK_RUNNING vs find_proxy_task() hitting a
proxy_deactive() case tripping the TASK_RUNNING warn-on.

Maybe we just need to move the removal of this chunk till after we
drop PROXY_WAKING?

I'll tinker more tomorrow
-john