Re: [PATCH v24 09/11] sched: Have try_to_wake_up() handle return-migration for PROXY_WAKING case
From: John Stultz
Date: Wed Mar 11 2026 - 02:45:50 EST
On Thu, Jan 1, 2026 at 1:53 AM K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
> On 11/25/2025 4:01 AM, John Stultz wrote:
> > @@ -4197,8 +4260,15 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> > */
> > scoped_guard (raw_spinlock_irqsave, &p->pi_lock) {
> > smp_mb__after_spinlock();
> > - if (!ttwu_state_match(p, state, &success))
> > - break;
> > + if (!ttwu_state_match(p, state, &success)) {
> > + /*
> > + * If we're already TASK_RUNNING and PROXY_WAKING,
> > + * continue on to the ttwu_runnable() check to force
> > + * proxy_needs_return() evaluation.
> > + */
> > + if (!proxy_task_runnable_but_waking(p))
> > + break;
> > + }
>
> I don't like the fact that we have to go against the ttwu_state_match()
> machinery here or the fact that proxy_deactivate() has to check for
> "TASK_RUNNING" as a short-cut to detect wakeups. It makes all these bits
> a bit more painful to maintain IMO.
>
> Here is what I've learnt so far:
>
> o Task can go into an interruptible sleep where a signal can cause the
> task to give up its attempt to acquire the lock ("err" path in
> __mutex_lock_common()) - This means we'll have to re-evaluate the
> "p->blocked_on" by running it.
>
> o If we weren't running with PROXY_EXEC, a task that was blocked
> (including delayed) on a mutex will definitely receive a wakeup. With
> proxy, the only difference is that that wakeup path will now see
> "task_on_rq_queued()" and will have to go through ttwu_runnable() under
> rq_lock.
>
> o Everything else is handled in __schedule() under the rq_lock again
> and all paths that drop the rq_lock will go through a re-pick.
>
> o Once we deactivate a donor (via proxy_deactivate()), the "blocked_on"
> relation has no real use since we'll always be PROXY_WAKING when we
> come back (or we have received a signal and that blocked_on relation
> now needs a re-evaluation anyways)
>
> I'm sure a bunch of these hold up for RWSEM too.
>
> With that in mind, we can actually generalize ttwu_runnable() and
> proxy_needs_return() to clear "p->blocked_on", even for !PROXY_WAKING
> since a wakeup implies we have to re-evaluate the "p->blocked_on".
>
> We already have clear_task_blocked_on() within the rq_lock critical
> section (except for when task is running) and we ensure blocked donors
> cannot go TASK_RUNNING with task_is_blocked() != NULL.
>
>
> This is what I had in mind based on top of commit d424d28ea93
> ("sched: Migrate whole chain in proxy_migrate_task()") on
> "proxy-exec-v24-6.18-rc6" branch:
>
> (lightly tested with sched-messaging; Haven't seen any splats
> or hung-task yet but from my experience, it doesn't hold any
> water when my suggestion meets the real-world :-)
...
>
> I guess ignorance is bliss and all this might be a terrible idea but
> I like how proxy_wake_up_donor() (aka proxy_needs_return()) ends up
> being much simpler now and the whole:
>
> blocked_donor (!TASK_RUNNING && task_is_blocked())
> |
> v
> wakeup (clear_task_blocked_on() + ttwu())
> |
> v
> runnable (TASK_RUNNING && !task_is_blocked())
>
> is easier to reason about IMO and should also help sched-ext since we
> now have defined points to notify where the proxy starts and ends for
> them to hook onto.
>
> I think I lost the "wake_cpu" check in proxy_needs_return() somewhere
> down the line but doing a block + wakeup irrespective seems to be the
> right thing since in absence of proxy, !task_current() would have
> blocked and received a wakeup anyways.
>
> Sorry for dumping a bunch of radical suggestions during the holidays.
> Hope we can flush out a bunch more of proxy execution (if not all of
> it) this year :-)
Apologies, I've been dragging my feet a bit on this email, as it's a
fair amount of change to get my head around. :) As always, I
*really* appreciate your thoughtful feedback.
Running with your patch, I do get quite a few splats from the
assert_balance_callbacks_empty() check (which then later causes a
deadlock from lockdep on the console lock), but that seems to be due
to proxy_needs_return() calling proxy_reset_donor(), which does a
put_prev_set_next() that might enqueue a balance callback that we
then don't run when we drop the rq lock in ttwu_runnable(). Zapping
the callback seems to resolve this.
From there, I've been trying to separate and understand the different
threads of change and their dependencies so that I can apply them at
the right point in the series, but it's subtle and the changes do seem
very inter-dependent.
The main semantic rework of task state and blocked_on is interesting.
I definitely like the idea of simplifying the ttwu paths. But the
change also feels a little messy to me, so I'm not 100% on adopting
it yet; maybe I just need to get more familiar with it, or combine
the logic in some form.
The suggestion on donor wakeup to just set rq->donor to rq->curr
sounds really nice. I've been thinking of doing something similar, but
was hesitant, as the donor change has already been a source of races
and I've been wary of rocking the boat; still, I'd like to pick this
up. However, the put_prev_set_next_task() on something other than idle
has numerous subtle side effects that are giving me grief (I seem to
be hitting a case where, right as rq->curr is dequeued and instead set
as sched_delayed, we wake the donor from another CPU, tripping
warnings when trying to set rq->curr as next while it's
sched_delayed).
And I definitely like the reworked proxy_force_return() suggestion,
which is much closer to the wake_up_process() approach I used back in
v22:
https://lore.kernel.org/lkml/20250926032931.27663-5-jstultz@xxxxxxxxxx/
But unfortunately my attempts to use a similar method ran into lots of
trouble while I was reworking the logic for v23, which resulted in the
messier logic currently in that function.
So I've had difficulty breaking up and isolating the minimal changeset
to support these simplifications, as they seem to also depend on the
semantic rework around task state and blocked_on that you've made.
Finally, I appreciate your thorough comments, though I'm wary of
adding more spots where we're accessing values without locks. I know
these tricks can be critical for performance, but the scheduler code
is already so terribly subtle, with its many exceptions to the locking
rules, that I fret this makes it even harder to think about (though
I'm maybe overly sensitive here, as I've felt particularly dim in my
efforts to rework your patch :).
So I'm a little stuck, as I really want to get the next iteration
(v25) of the patch series out to the list, but I don't want to ignore
your suggestions here. Since the v25 series has added a number of
earlier fixups, the donor-migration chunk is getting quite long (16
patches currently), so I may just send the early portions (up to
"Handle blocked-waiter migration (and return migration)" - before we
get to this try_to_wake_up() change), and hopefully I can sort out how
to better integrate this feedback for v26 (or whenever the first
chunk is queued).
Does that sound ok?
Again, very much appreciate your great insights and feedback here!
-john