Re: [PATCH v26 00/10] Simple Donor Migration for Proxy Execution
From: K Prateek Nayak
Date: Sat Mar 28 2026 - 00:53:29 EST
Hello John,
On 3/28/2026 12:40 AM, John Stultz wrote:
> On Wed, Mar 25, 2026 at 3:52 AM K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
>> On 3/25/2026 12:43 AM, John Stultz wrote:
>>> There have also been some further improvements in the full Proxy
>>> Execution series:
>>> * Tweaks to proxy_needs_return() suggested by K Prateek
>>
>> To answer your question on v25, I finally seem to have
>> ttwu_state_match() happy with the pieces in:
>> https://github.com/kudureranganath/linux/commits/kudure/sched/proxy/ttwu_state_match/
>>
>> The base rationale is still the same from
>> https://lore.kernel.org/lkml/eccf9bb5-8455-48e5-aa35-4878c25f6822@xxxxxxx/
>
> So thank you so much for sharing this tree! It's definitely helpful
> and better shows how to split up the larger proposal you had.
>
> I've been occupied chasing the null __pick_eevdf() return issue (which
> I've now tripped without my proxy changes, so it's an upstream thing,
> but I'd still like to bisect it down),
Is the __pick_eevdf() returning NULL on tip:sched/core or is this on
mainline?
_Insert GTA San Andreas "Here we go again." meme here_
> along with other items, so I've
> not yet been able to fully ingest your changes. I did run some testing
> on them and didn't see any immediate issues (other than the null
> __pick_eevdf() issue, which limits the testing time to ~4 hours), and
> I even ran it along with the sleeping owner enqueuing change on top
> which had been giving me grief in earlier attempts to integrate these
> suggestions. So that's good!
>
> My initial/brief reactions looking through your series:
>
> * sched/core: Clear "blocked_on" relation if schedule races with wakeup
>
> At first glance, this makes me feel nervous because clearing the
> blocked_on value has long been a source of bugs in the development of
> the proxy series, as the task might have been proxy-migrated to a cpu
> where it can't run. That's why my mental rules tend towards doing the
> clearing in a few places and setting PROXY_WAKING in most cases (so
> we're sure to evaluate the task before letting it run). My earlier
> logic of keeping blocked_on_state separate from blocked_on was trying
> to make these rules as obvious as possible, and since consolidating
> them I still get mentally muddied at times - ie, we probably don't
> need to be clearing blocked_on in the mutex lock paths anymore, but
> the symmetry is a little helpful to me.
>
> But the fact that you're clearing the state on prev here, at a point
> where prev is current, saves it, since current can obviously run on
> this cpu. So it probably just needs a comment to that effect.
Ack!
>
> * sched/core: Handle "blocked_on" clearing for wakeups in ttwu_runnable()
>
> Mostly looks sane to me (though I still have some hesitancy about
> dropping the set_task_blocked_on_waking() bit)
>
> * sched/core: Remove "p->wake_cpu" constraint in proxy_needs_return()
>
> Yeah, that's a sound call, the shortcut isn't necessary and just adds
> complexity.
>
> * sched/core: Allow callers of try_to_block_task() to handle
> "blocked_on" relation
>
> Seems like it could be pulled up earlier in the series? (with your first change)
>
> * sched/core: Prepare proxy_deactivate() to comply with ttwu state machinery
>
> This one I've not totally gotten my head around, still. The
> "WRITE_ONCE(p->__state, TASK_RUNNING);" in find_proxy_task() feels
> wrong, as it looks like we're overriding what ttwu should be handling.
So the reason for that is, we can have:
CPU0 (owner - A)                     CPU1 (donor - B)
================                     ================
                                     /* B is just trying to acquire the mutex. */
mutex_unlock(M)                      schedule() /* prev = B, next = B; B is blocked on A */
  atomic_long_try_cmpxchg_release()    find_proxy_task()
  ...                                    ...
                                         owner = __mutex_owner(M);
                                         if (!owner && task_current(rq, B))
                                           __clear_task_blocked_on(B, NULL)
                                           return B
  __set_task_blocked_on_waking(B, M);  ... /* B starts running without TASK_RUNNING. */
  /* nop since !B->blocked_on */
                                       /*
                                        * Scenario 1 - B gets mutex and then sets
                                        * TASK_RUNNING on its own.
                                        */
  /* Scenario 2 */
  wake_q_add(B)
  wake_up_process()
    ttwu_state_match() /* true */
    B->__state = TASK_RUNNING;
So in either case the task will wake up with TASK_RUNNING set, so we
can just do the pending bits of the wakeup in __schedule(). I think
even without an explicit TASK_RUNNING it should be fine, but I need
to jog my memory on why I added that (maybe out of caution).
If the task fails to acquire the mutex, it'll reset to the blocked
state, go back into schedule(), and everything should just work
out fine.
> But again, this is only done on current, so it's probably ok.
> Similarly the clear_task_blocked_on() in proxy_deactivate() doesn't
> make it clear how we ensure we're not proxy-migrated,
So the rationale behind that was that we should *never* hit that
condition, but if we do, perhaps we can simply do a move_queued_task()
back to "wake_cpu" to ensure correctness?
> and the
> clear_task_blocked_on() in __block_task() feels wrong to me, as I
> think we will need that for sleeping owner enqueuing.
Yes, for sleeping owner that is not the ideal place - I completely
agree with that. Let me go stare at that and find a better place to
put it.
>
> But again, this didn't crash (at least right away), so it may just be
> I've not fit it into my mental model yet and I'll get it eventually.
Yeah, but then you lose the "blocked_on" chain when you deactivate
the donors, only for it to be reconstructed by running the task for
a little bit and re-establishing that relation. So although it might
not have crashed (yet!), it is pretty inefficient.
I'll go stare more at that.
>
> * sched/core: Remove proxy_task_runnable_but_waking()
>
> Looks lovely, but obviously depends on the previous changes.
>
> * sched/core: Simplify proxy_force_return()
>
> Again, I really like how much that simplifies the logic! But I'm
> hesitant as my previous attempts to do similar didn't work, and it
> seems it depends on the ttwu state machinery change I've not fully
> understood.
Highly intertwined indeed! I'll try to add more comments and improve
the commit messages.
>
> * sched/core: Reset the donor to current task when donor is woken
>
> Looks nice! I fret there may be some subtlety I'm missing, but once I
> get some confidence in it, I'll be happy to have it.
Ack! I too will keep testing. Btw, do you have something that stresses
the deadline bits? I can't seem to reliably get something running with
a lot of preemptions while holding mutexes.
>
> Anyway, apologies I've not had more time to spend on your feedback
> yet. I was hoping to start integrating and folding in your proposed
> changes for another revision (if you are ok with that - I can keep
> them separate as well, but it feels like more churn for reviewers),
> but with Peter sounding like he's in-progress on queueing the current
> set (with modifications), I want to wait to see if we should just work
> this out on top of what he has (which I'm fine with).
Ack! None of this is strictly necessary until we get to ttwu handling
the return migration so it should be okay. If you are occupied, I can
test and send these changes on top separately too to ease some load.
>
> As always, many many thanks for your time and feedback here! I really
> appreciate your contributions to this effort!
And thanks a ton for looking at the tree. Much appreciated _/\_
--
Thanks and Regards,
Prateek