Re: [PATCH v26 00/10] Simple Donor Migration for Proxy Execution

From: Peter Zijlstra

Date: Fri Apr 03 2026 - 05:22:30 EST


On Thu, Apr 02, 2026 at 11:31:56AM -0700, John Stultz wrote:

> So I like getting rid of proxy_force_return(), but its not clear to me
> that proxy_deactivate() is what we want to do in these
> find_proxy_task() edge cases.
>
> It feels like if we are already racing with ttwu, deactivating the
> task seems like it might open more windows where we might lose the
> wakeup.
>
> In fact, the whole reason we have proxy_force_return() is that earlier
> in the proxy-exec development, when we hit those edge cases we usually
> would return proxy_reschedule_idle() just to drop the rq lock and let
> ttwu do its thing, but there kept on being cases where we would end up
> with lost wakeups.
>
> But I'll give this a shot (and will integrate your ttwu_runnable
> cleanups regardless) and see how it does.

So the main idea is that ttwu() will be in charge of migrating back, as
the one and only means of doing so.

This includes signals and unlock and everything.

This means that there are two main cases:

- ttwu() happens first and finds the task on_rq; we hit
ttwu_runnable().

- schedule() happens first and hits this task without means of going
forward.

Let's do the second first; this is handled by doing dequeue. It must take
the task off the runqueue, so it can select another task and make
progress. But this had me hit those proxy_deactivate() failure cases;
those must not exist.

The first is that deactivate can encounter TASK_RUNNING, this must not
be, because TASK_RUNNING would mean ttwu() has happened and that would
then have sorted everything out.

The second is that signal case, which again should not happen, because
the signal ttwu() should sort it all out. We just want to take the task
off the runqueue here.


Now the ttwu() case. So if it is first it will hit ttwu_runnable(), but
we don't want this case. So instead we dequeue the task and say: 'nope,
wasn't on_rq', which proceeds into the 'normal' wakeup path which does a
migration.

And note that if proxy_deactivate() happened first, we simply skip that
first step and directly go into the normal wakeup path.

There is no worry about ttwu() going missing; ttwu() is changed to make
sure any ->TASK_RUNNING transition ensures ->blocked_on gets cleared and
the task ends up on a suitable CPU.
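
To make the two orderings concrete, here is a userspace toy model, not
kernel code. Everything named toy_* is made up for illustration, and
keying the ttwu_runnable() bail-out on ->blocked_on being set is an
assumption of the sketch, as is modelling 'a suitable CPU' with a single
home_cpu field. It only tries to show that both orderings end in the
same place: task running, ->blocked_on cleared, queued on a CPU it may
run on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these names exist in the kernel. */
enum toy_state { TOY_RUNNING, TOY_BLOCKED };

struct toy_task {
	enum toy_state state;
	bool on_rq;             /* still queued on some runqueue */
	int cpu;                /* CPU whose runqueue holds (or last held) it */
	int home_cpu;           /* assumed: a CPU the task may actually run on */
	const void *blocked_on; /* non-NULL while blocked on a mutex */
};

/* schedule() side: only take the task off the runqueue. */
static void toy_proxy_deactivate(struct toy_task *p)
{
	/* TASK_RUNNING here would mean ttwu() already sorted everything out. */
	assert(p->state != TOY_RUNNING);
	p->on_rq = false;
}

/* Fast path; returns true only if the wakeup was fully handled here. */
static bool toy_ttwu_runnable(struct toy_task *p)
{
	if (!p->on_rq)
		return false;
	if (p->blocked_on) {
		/* Assumed condition: dequeue and say "nope, wasn't on_rq". */
		p->on_rq = false;
		return false;
	}
	p->state = TOY_RUNNING;
	return true;
}

static void toy_ttwu(struct toy_task *p)
{
	if (toy_ttwu_runnable(p))
		return;
	/* 'Normal' wakeup path: clear blocked_on, migrate, enqueue. */
	p->blocked_on = NULL;
	p->cpu = p->home_cpu;
	p->state = TOY_RUNNING;
	p->on_rq = true;
}

/* Run one ordering on a donor that was proxy-migrated to CPU 1. */
static struct toy_task toy_run(bool schedule_first)
{
	static const int toy_mutex = 0;
	struct toy_task p = {
		.state = TOY_BLOCKED, .on_rq = true,
		.cpu = 1, .home_cpu = 0, .blocked_on = &toy_mutex,
	};

	if (schedule_first)
		toy_proxy_deactivate(&p);
	toy_ttwu(&p);
	return p;
}
```

Calling toy_run(true) and toy_run(false) covers the schedule()-first and
ttwu()-first cases respectively. The model has no locking or real
concurrency, so it says nothing about the actual races.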

Anyway, that is the high-level idea; like I said, I didn't get around to
doing all the details (and I clearly missed a few :-).