Re: [PATCH 1/2] sched: proxy-exec: Close race causing workqueue work being delayed

From: K Prateek Nayak

Date: Thu Apr 30 2026 - 03:26:13 EST

Hello John,

On 4/30/2026 11:14 AM, John Stultz wrote:
> On Wed, Apr 29, 2026 at 2:00 AM K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
>> On 4/29/2026 7:57 AM, John Stultz wrote:
>>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>>> index 8ec3b6d7d718b..6ea74aecc5fbd 100644
>>>> --- a/include/linux/sched.h
>>>> +++ b/include/linux/sched.h
>>>> @@ -6535,8 +6536,13 @@ static bool try_to_block_task(struct rq *rq, struct task_struct *p,
>>>> * blocked on a mutex, and we want to keep it on the runqueue
>>>> * to be selectable for proxy-execution.
>>>> */
>>>> - if (!should_block)
>>>> + if (!should_block) {
>>>> + guard(raw_spinlock)(&p->blocked_lock);
>>>> + /* Stable against race */
>>>> + if (task_is_blocked(p))
>>>> + WRITE_ONCE(p->se.sched_proxy, 1);
>>>> return false;
>>>> + }
>>>
>>> So if we double check and find the task isn't blocked anymore, we
>>> probably shouldn't return early here, no?
>>>
>>> Let me take a stab at the bit flag approach and see how it goes.
>>
>> In case you want to peek at my homework ;-)
>>
>
> Well, now it will just seem like I'm cheating! :)

But I can't prove I didn't copy your idea first ;-)

>> @@ -7043,8 +7053,16 @@ static void __sched notrace __schedule(int sched_mode)
>> if (sched_proxy_exec()) {
>> struct task_struct *prev_donor = rq->donor;
>>
>> + /*
>> + * A wakeup raced with block_task();
>> + * Clear blocked_on before running the task
>> + * again.
>> + */
>> + if (unlikely(!prev_state && prev->blocked_on))
>> + clear_task_blocked_on(prev, NULL);
>> +
>
> So similar to your previous change, I was hitting this same case, and
> adding in your change here was bothering me a bit that it felt like
> clearing things here was papering over an unexpected state change that
> we didn't have before.
>
> Part of the issue is the blocked_on state machine is already a little complex:
> NULL -> ptr -> PROXY_WAKING ->NULL
> (with some limited shortcuting from ptr->NULL when we are dealing with current)
>
> So adding this BO_FLAG_PROXY/"latch" bit to the mix complicates things:
> NULL -> ptr:unlatched -> ptr:latched -> PROXY_WAKING -> NULL
>
> With extra lines for:
> ptr:unlatched -> NULL
> ptr:latched -> NULL
> ptr:unlatched -> PROXY_WAKING
>
> I kept on seeing these unexpected transitions where before we latched
> the blocked_on state, we'd get a wakeup on another cpu from the owner
> unlocking the lock, so we'd go ptr:unlatched -> PROXY_WAKING (along
> with setting TASK_RUNNING). Then because we weren't latched, we might
> be running midway through the lock logic. If we go into schedule()
> we'll just come back out eventually (as we're TASK_RUNNING). Then when
> then try again in the lock loop to set blocked_on to ptr:unlatched
> again, we hit warnings because we are setting it to a ptr when it is
> PROXY_WAKING and not NULL.

Yes, it was precisely that which caused the splat in my case too for
the initial suggestion on Peter's diff. I'll try to add a little
bit more context in those tricky comments from next time onwards.
Sorry about that!

>
> I'm sure this was obvious to you when you wrote the snippit above, but
> I've been running a little slow, so let's not talk about how long it
> took for me to actaully understand this. :P
>
> And yes, your snippit above avoids this, but it feels a little
> incidental, cleaning up after the fact. So I think it might be better
> to simplify the state machine a little to prevent it.
>
> My current thought is to cut out the ptr:unlatched -> PROXY_WAKING
> transition. And instead in __set_task_blocked_on_waking(), instead of
> checking blocked_on and doing an early return if it is null, we could
> instead check if the latch bit was set, and if so, set blocked_on to
> NULL. This bascially treats ptr:unlatched the same as NULL in most
> cases for the state machine.

That is a fantastic idea! Since the sched bits have not blocked it
yet, there is no need to go through the return migration path.

Clearing it when not latched should do the trick.

>
> Now, earlier you made some nice explanations earlier about how some of
> the blocked_on checks done w/ the rq_lock and not the blocked_lock
> were safe because of the limited situations where we clear blocked_on.
> And this at first seems to put that assumption risk, but I think it
> still holds if we always consider ptr:unlatched equivalent to NULL. So
> once ptr:latched is set, the rule still holds.

Yes, that is indeed true. Transitioning the task to latched state, and
unlatching it, will both be done with rq_lock + blocked_on_lock so it
should be okay to peek at the latched state when holding the task's rq
lock in ttwu_runnable(), and in schedule().

>
> That gives us:
> NULL -> ptr:unlatched -> ptr:latched -> PROXY_WAKING -> NULL
>
> Where NULL and ptr:unlatched are functionally equivalent except for
> the ability to transition to ptr:latched.
>
> With:
> ptr:latched -> NULL // only done on current
> ptr:unlatched -> NULL // only done on current or when trying to set waking
>
> So far my testing is doing ok with this. Let me know if you see any
> holes though.

This looks good to me! Cannot spot any holes with that.

However, to completely get rid of that hunk in __schedule() we
probably need to order setting the blocked_on wrt task's __state
based on Peter's analysis in
https://lore.kernel.org/lkml/20260403125424.GA2872@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

As Peter suggested, I think we need:

__set_task_blocked_on(current, lock);
/*
* Pairs with smp_rmb() after ttwu_state_match() in try_to_wake_up().
* Ensures ttwu always see the correct blocked_on state for wakeups.
*/
smp_wmb();
set_current_state(state);

in __mutex_lock_common() for it to be completely safe with the
proxy_needs_return() bits and that should be good enough to avoid that
race but I'll let Peter comment if he thinks there is still a race
window open.

>
> I'll try to cleanup my current changes (surely copying some of your
> excellent work :), and send this out tomorrow (I'm shot for today) for
> more concrete review.

I'll be looking forward to it :-)

--
Thanks and Regards,
Prateek