Re: [PATCH v2 1/2] sched: proxy-exec: Close race causing workqueue work being delayed

From: John Stultz

Date: Mon May 04 2026 - 17:33:47 EST


On Sun, May 3, 2026 at 11:43 AM 'K Prateek Nayak' via kernel-team
<kernel-team@xxxxxxxxxxx> wrote:
> On 5/2/2026 3:56 AM, John Stultz wrote:
> > On Fri, May 1, 2026 at 11:59 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >> On Fri, May 01, 2026 at 09:25:29PM +0530, K Prateek Nayak wrote:
> >>
> >>>> @@ -3685,6 +3691,7 @@ ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
> >>>> */
> >>>> static inline void ttwu_do_wakeup(struct task_struct *p)
> >>>> {
> >>>> + p->is_blocked = 0;
> >>>
> >>> I don't think it is this simple at the moment because the proxy bits in
> >>> __schedule() still have to handle PROXY_WAKING and once we clear it here
> >>> task will no longer go through proxy_needs_return() path.
> >>>
> >>> Clearing of ->is_blocked has to be done at the same point where
> >>> ->blocked_on is cleared although they are set separately.
> >>
> >> Argh. It's all a convoluted mess. AFAICT this all goes away when we make
> >> ttwu() do the return migration properly. And then it does work.
> >>
> >> So we're now in the situation that things are a bit of a mess, and we
> >> need to make a bigger mess, only to then instantly remove it all again
> >> when we clean up :/
> >
> > Apologies! I don't want to make you grumpy coming back from being ill
> > (hope you're feeling better!).
> >
> >> Can't we simply mark PROXY_EXEC broken for a cycle? It's not like the
> >> upstream version has been very functional anyway.
> >
> > This issue has been present for a while (since it is really around the
> > proxy deactivation path taking action in the preempt case). I just
> > reproduced it with the early chunk of PROXY_EXEC logic that was in
> > v6.18. So I don't think it's super urgent as the proxy-exec code
> > upstream isn't complete (and behind CONFIG_EXPERIMENTAL).
> >
> > So let me take a swing at integrating your approach into the next
> > chunk of patches, and hopefully they can be ready for the next merge
> > window.
>
> So when looking at all of this, I realized we probably don't need
> PROXY_WAKING anymore if we have the "is_blocked" state in task_struct.
> The owner can simply clear the blocked_on and move along and the
> waiter's "is_blocked" state will handle the sched bits.
>
> (p->is_blocked && !p->blocked_on) can then be interpreted as
> PROXY_WAKING and that task should explore return migration in
> find_proxy_task().

Interesting! With the follow-on patch you sent here, the reproducer
Vineeth implemented no longer seems to trip the issue, and I've not hit
any trouble in initial testing against 7.1-rc1.

It may take me a little while to really get my head around the change to
layer the rest of the series on top w/o PROXY_WAKING. But it is aligned
with Peter's suggestion and gets rid of extra state, and if it can apply
first, that's nicer than having to get the ttwu handling in place
before switching to the is_blocked logic (which is what I was testing,
in case we were going to just fix the issue in the next chunk), so it
looks attractive!
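
As a rough illustration of the encoding described in the quoted text
above, here is a user-space toy model. It is only a sketch and not the
kernel implementation: struct toy_task, struct toy_mutex, the enum, and
both helpers are hypothetical names made up for this example; only the
is_blocked / blocked_on field names come from the discussion. It also
assumes that a task with is_blocked clear is treated as runnable no
matter what blocked_on holds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; NOT the kernel's types. */
struct toy_mutex { int unused; };

struct toy_task {
	bool is_blocked;              /* scheduler-side "task is blocked" bit */
	struct toy_mutex *blocked_on; /* lock being waited on, or NULL */
};

enum toy_state {
	TOY_RUNNABLE,     /* is_blocked clear (assumption: blocked_on ignored) */
	TOY_BLOCKED,      /* is_blocked set and blocked_on set */
	TOY_PROXY_WAKING, /* is_blocked set, blocked_on already cleared */
};

/* Derive the state from the two fields instead of storing it separately. */
static enum toy_state toy_classify(const struct toy_task *p)
{
	if (!p->is_blocked)
		return TOY_RUNNABLE;
	return p->blocked_on ? TOY_BLOCKED : TOY_PROXY_WAKING;
}

/* Owner side: clear the waiter's blocked_on and leave is_blocked alone. */
static void toy_owner_release(struct toy_task *waiter)
{
	waiter->blocked_on = NULL;
}
```

In this model the owner never writes an explicit "waking" marker; the
waiter lands in the PROXY_WAKING-equivalent state purely because
is_blocked is still set once blocked_on has gone to NULL.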

I'll take a swing at reworking my code on top of your patch here and
let you know how it goes.

thanks
-john