Re: [PATCH v26 00/10] Simple Donor Migration for Proxy Execution
From: K Prateek Nayak
Date: Fri Mar 27 2026 - 13:21:27 EST
Hello Peter,
On 3/27/2026 9:30 PM, Peter Zijlstra wrote:
> On Fri, Mar 27, 2026 at 07:03:19PM +0530, K Prateek Nayak wrote:
>> Hello Peter,
>>
>> On 3/27/2026 5:18 PM, Peter Zijlstra wrote:
>>> I tried to have a quick look, but I find it *very* hard to make sense of
>>> the differences.
>>
>> Couple of concerns I had with the current approach is:
>>
>> 1. Why can't we simply do block_task() + wake_up_process() for return
>> migration?
>
> So the way things are set up now, we have the blocked task 'on_rq', so
> ttwu() will take ttwu_runnable() path, and we wake the task on the
> 'wrong' CPU.
>
> At this point '->state == TASK_RUNNABLE' and schedule() will pick it and
> ... we hit '->blocked_on == PROXY_WAKING', which leads to
> proxy_force_return(), which does deactivate_task()+activate_task() as
> per a normal migration, and then all is well.
>
> Right?
>
> You're asking why proxy_force_return() doesn't use block_task()+ttwu()?
> That seems really wrong at that point -- after all: '->state ==
> TASK_RUNNABLE'.
>
> Or; are you asking why we don't block_task() at the point where we set
> '->blocked_on = PROXY_WAKING'? And then let ttwu() sort things out?
>
> I suspect the latter is really hard to do vs lock ordering, but I've not
> thought it through.
So taking a step back, this is what we have today (at least the
common scenario):
    CPU0 (donor - A)                            CPU1 (owner - B)
    ================                            ================

    mutex_lock()
      __set_current_state(TASK_INTERRUPTIBLE)
      __set_task_blocked_on(M)
      schedule()
        /* Retained for proxy */
        proxy_migrate_task()
        ==================================>     /* Migrates to CPU1 */
    ...
    send_sig(B)
      signal_wake_up_state()
        wake_up_state()
          try_to_wake_up()
            ttwu_runnable()
              ttwu_do_wakeup() =============>   /* A->__state = TASK_RUNNING */

                                                /*
                                                 * After this point ttwu_state_match()
                                                 * will fail for A so a mutex_unlock()
                                                 * will have to go through __schedule()
                                                 * for return migration.
                                                 */

                                                __schedule()
                                                  find_proxy_task()
                                                    /* Scenario 1 - B sleeps */
                                                    __clear_task_blocked_on()
                                                    proxy_deactivate(A)
                                                      /* A->__state == TASK_RUNNING */
                                                      /* fallthrough */

                                                    /* Scenario 2 - return migration after unlock() */
                                                    __clear_task_blocked_on()
                                                    /*
                                                     * At this point proxy stops.
                                                     * Much later after signal.
                                                     */
                                                    proxy_force_return()
    schedule() <==================================
    signal_pending_state()

    clear_task_blocked_on()
    __set_current_state(TASK_RUNNING)

    ... /* return with -EINTR */
Basically, a blocked donor has to wait for a mutex_unlock() before it
can go process the signal and bail out of the mutex_lock_interruptible(),
which seems counterproductive, but it is still okay from a correctness
perspective.
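To make the ttwu_state_match() part of that flow concrete, here is a rough
userspace model. It is not kernel code: every toy_* name and the state
values are made up for illustration, and the only property borrowed from
the kernel is that TASK_RUNNING is 0, so it can never match a wakeup mask.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; TOY_RUNNING being 0 mirrors TASK_RUNNING. */
#define TOY_RUNNING          0x0u
#define TOY_INTERRUPTIBLE    0x1u
#define TOY_UNINTERRUPTIBLE  0x2u
#define TOY_NORMAL           (TOY_INTERRUPTIBLE | TOY_UNINTERRUPTIBLE)

/* Hypothetical stand-in for the few task fields this flow cares about. */
struct toy_task {
	unsigned int state;	/* models ->__state */
	bool on_rq;		/* donor retained on a runqueue for proxy */
	bool blocked_on;	/* still blocked on the mutex */
};

/* A wakeup only proceeds if the task's state intersects the wakeup mask. */
static inline bool toy_state_match(const struct toy_task *p, unsigned int mask)
{
	return (p->state & mask) != 0;
}

/*
 * Signal wakeup of a task that is still on a runqueue: models the
 * ttwu_runnable() shortcut, which only flips the state to running and
 * leaves blocked_on untouched.
 */
static inline bool toy_signal_wake(struct toy_task *p)
{
	if (!toy_state_match(p, TOY_INTERRUPTIBLE))
		return false;
	if (!p->on_rq)
		return false;	/* full wakeup path is not modelled here */
	p->state = TOY_RUNNING;
	return true;
}

/* Wakeup issued from the unlock side, which uses the "normal" mask. */
static inline bool toy_unlock_wake(const struct toy_task *p)
{
	return toy_state_match(p, TOY_NORMAL);
}
```

In this model, once toy_signal_wake() has succeeded on a donor that is
still on_rq, toy_unlock_wake() returns false for it even though blocked_on
is still set, which is the window the comment in the flow above describes.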
>
> One thing you *can* do is frob ttwu_runnable() to 'refuse' to wake the
> task, and then it goes into the normal path and will do the migration.
> I've done things like that before.
>
> Does that fix all the return-migration cases?
Yes it does! If we handle the return via ttwu_runnable(), which is what
proxy_needs_return() in the next chunk of changes aims to do, we can
build the invariant that TASK_RUNNING + task_is_blocked() is an illegal
state outside of __schedule(), which works well with ttwu_state_match().
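A toy sketch of that idea, purely as an illustration: the wakeup refuses
the in-place shortcut for a blocked donor and takes a full path that
re-places the task instead. This is not how proxy_needs_return() is
implemented; struct toy_donor, toy_wake() and the wake_cpu field as used
here are hypothetical names for a userspace model.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_RUNNING        0x0u
#define TOY_INTERRUPTIBLE  0x1u

enum toy_wake_path { TOY_WAKE_IN_PLACE, TOY_WAKE_FULL };

/* Hypothetical model of a donor task; not struct task_struct. */
struct toy_donor {
	unsigned int state;
	bool on_rq;
	bool blocked;	/* models task_is_blocked() */
	int cpu;	/* CPU whose runqueue currently holds the task */
	int wake_cpu;	/* CPU the task should return to */
};

/* The invariant from the text: running + blocked must never be visible. */
static inline bool toy_invariant_ok(const struct toy_donor *p)
{
	return !(p->state == TOY_RUNNING && p->blocked);
}

static inline enum toy_wake_path toy_wake(struct toy_donor *p)
{
	/* Shortcut: only allowed when the task is queued and not blocked. */
	if (p->on_rq && !p->blocked) {
		p->state = TOY_RUNNING;
		return TOY_WAKE_IN_PLACE;
	}

	/*
	 * "Refuse" the shortcut: take the task off the wrong runqueue,
	 * clear the blocked marker, and only then mark it running on the
	 * CPU it should return to.
	 */
	p->on_rq = false;
	p->blocked = false;
	p->cpu = p->wake_cpu;
	p->on_rq = true;
	p->state = TOY_RUNNING;
	return TOY_WAKE_FULL;
}
```

With a donor that was proxy-migrated to CPU1 but should return to CPU0,
toy_wake() takes the full path and the task ends up running and unblocked
on CPU0, so toy_invariant_ok() holds both before and after the call.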
>
>> 2. Why does proxy_needs_return() (this comes later in John's tree but I
>> moved it up ahead) need the proxy_task_runnable_but_waking() override
>> of the ttwu_state_match() machinery?
>> (https://github.com/johnstultz-work/linux-dev/commit/28ad4d3fa847b90713ca18a623d1ee7f73b648d9)
>
> Since it comes later, I've not seen it and not given it thought ;-)
>
> (I mean, I've probably seen it at some point, but being the gold-fish
> that I am, I have no recollection, so I might as well not have seen it).
>
> A brief look now makes me confused. The comment fails to describe how
> that situation could ever come to pass.
That is a signal delivery happening before the unlock, which forces
TASK_RUNNING. Since we are still waiting on the unlock, the wakeup from
the unlock will see TASK_RUNNING + PROXY_WAKING.
We then later force it onto the ttwu path so that the return happens via
ttwu_runnable().
>
>> 3. How can proxy_deactivate() see a TASK_RUNNING for blocked donor?
>
> I was looking at that.. I'm not sure. I mean, having the clause doesn't
> hurt, but yeah, dunno.
Outlined in that flow above - Scenario 1.
>
>
>> Speaking of that commit, I would like you or Juri to confirm if it is
>> okay to set a throttled deadline task as rq->donor for a while until it
>> hits resched.
>
> I think that should be okay.
Good to know! Are you planning to push out the changes to queue? I can
send an RFC with the patches from my tree on top and we can perhaps
discuss it piecewise next week. Then we can decide if we want those
changes or not ;-)
--
Thanks and Regards,
Prateek