Re: [PATCH v26 00/10] Simple Donor Migration for Proxy Execution

From: John Stultz

Date: Fri Mar 27 2026 - 15:13:26 EST


On Wed, Mar 25, 2026 at 3:52 AM K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
> On 3/25/2026 12:43 AM, John Stultz wrote:
> > There’s also been some further improvements In the full Proxy
> > Execution series:
> > * Tweaks to proxy_needs_return() suggested by K Prateek
>
> To answer your question on v25, I finally seem to have
> ttwu_state_match() happy with the pieces in:
> https://github.com/kudureranganath/linux/commits/kudure/sched/proxy/ttwu_state_match/
>
> The base rationale is still the same from
> https://lore.kernel.org/lkml/eccf9bb5-8455-48e5-aa35-4878c25f6822@xxxxxxx/

So thank you so much for sharing this tree! It's definitely helpful
and better shows how to split up the larger proposal you had.

I've been occupied chasing the null __pick_eevdf() return issue (which
I've now tripped without my proxy changes, so it's an upstream thing,
but I'd still like to bisect it down), along with other items, so I've
not yet been able to fully ingest your changes. I did run some testing
on them and didn't see any immediate issues (other than the null
__pick_eevdf() issue, which limits the testing time to ~4 hours), and
I even ran them along with the sleeping owner enqueuing change on top,
which had been giving me grief in earlier attempts to integrate these
suggestions. So that's good!

My initial, brief reactions from looking through your series:

* sched/core: Clear "blocked_on" relation if schedule races with wakeup

At first glance, this makes me nervous, because clearing the
blocked_on value has long been a source of bugs in the development of
the proxy series, as the task might have been proxy-migrated to a cpu
where it can't run. That's why my mental rules tend towards clearing
it in only a few places and setting PROXY_WAKING in most cases (so
we're sure to evaluate the task before letting it run). My earlier
logic of keeping blocked_on_state separate from blocked_on was trying
to make these rules as obvious as possible, and since consolidating
them I still get mentally muddied at times. For example, we probably
don't need to be clearing blocked_on in the mutex lock paths anymore,
but the symmetry is a little helpful to me.

But you're clearing the state on prev here, and at that point prev is
current, which saves it, since current can obviously run on this cpu.
So it probably just needs a comment to that effect.
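To make the rule above concrete, here is a minimal userspace model in C. It is only a sketch under stated assumptions, not code from the series: struct task, task_allowed_on(), wake_blocked_task(), evaluate_before_run() and the affinity bitmask are all hypothetical, and PROXY_WAKING is merely assumed to act as a sentinel blocked_on value meaning "woken, but must be evaluated before running".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct mutex. */
struct mutex {
	int owner_id;
};

/* Assumed sentinel: "woken, but must be evaluated before running". */
#define PROXY_WAKING ((struct mutex *)-1L)

/* Hypothetical, heavily reduced model of a task. */
struct task {
	struct mutex *blocked_on; /* NULL, a real mutex, or PROXY_WAKING */
	unsigned long allowed;    /* bitmask of cpus the task may run on */
};

static bool task_allowed_on(const struct task *p, int cpu)
{
	return (p->allowed >> cpu) & 1UL;
}

/*
 * Wakeup side (hypothetical): clearing blocked_on outright is only safe
 * when p is current, since current can obviously run on this cpu. Any
 * other task may have been proxy-migrated to a cpu it can't run on, so
 * mark it PROXY_WAKING to force an evaluation before it may run.
 */
static void wake_blocked_task(struct task *p, const struct task *curr)
{
	assert(p != NULL && curr != NULL);

	if (!p->blocked_on)
		return;
	p->blocked_on = (p == curr) ? NULL : PROXY_WAKING;
}

/*
 * Pick side (hypothetical): a PROXY_WAKING task may run here only if
 * this cpu is in its affinity mask; otherwise it keeps the marker and
 * would need to be returned to an allowed cpu first.
 * Returns true if p may run on this_cpu now.
 */
static bool evaluate_before_run(struct task *p, int this_cpu)
{
	if (p->blocked_on == NULL)
		return true;
	if (p->blocked_on != PROXY_WAKING)
		return false; /* still blocked on a real mutex */
	if (!task_allowed_on(p, this_cpu))
		return false; /* needs return migration */
	p->blocked_on = NULL;
	return true;
}
```

In this model the asymmetry is the whole point: current has its state cleared directly, while any other woken task must pass evaluate_before_run() on the cpu it finds itself on before it is allowed to run.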

* sched/core: Handle "blocked_on" clearing for wakeups in ttwu_runnable()

Mostly looks sane to me (though I still have some hesitancy about
dropping the set_task_blocked_on_waking() bit).

* sched/core: Remove "p->wake_cpu" constraint in proxy_needs_return()

Yeah, that's a sound call, the shortcut isn't necessary and just adds
complexity.

* sched/core: Allow callers of try_to_block_task() to handle
"blocked_on" relation

Seems like it could be pulled earlier in the series (alongside your first change)?

* sched/core: Prepare proxy_deactivate() to comply with ttwu state machinery

I still haven't totally gotten my head around this one. The
"WRITE_ONCE(p->__state, TASK_RUNNING);" in find_proxy_task() feels
wrong, as it looks like we're overriding what ttwu should be handling.
But again, this is only done on current, so it's probably ok.
Similarly, the clear_task_blocked_on() in proxy_deactivate() doesn't
make it clear how we ensure the task hasn't been proxy-migrated, and
the clear_task_blocked_on() in __block_task() feels wrong to me, as I
think we will need that for sleeping owner enqueuing.

But again, this didn't crash (at least not right away), so it may just
be that I've not fit it into my mental model yet, and I'll get it eventually.

* sched/core: Remove proxy_task_runnable_but_waking()

Looks lovely, but obviously depends on the previous changes.

* sched/core: Simplify proxy_force_return()

Again, I really like how much that simplifies the logic! But I'm
hesitant, as my previous attempts to do something similar didn't work,
and it seems to depend on the ttwu state machinery change I've not
fully understood.

* sched/core: Reset the donor to current task when donor is woken

Looks nice! I fret there may be some subtlety I'm missing, but once I
get some confidence in it, I'll be happy to have it.

Anyway, apologies that I've not had more time to spend on your feedback
yet. I was hoping to start integrating and folding in your proposed
changes for another revision (if you are ok with that; I can keep
them separate as well, but that feels like more churn for reviewers),
but with Peter sounding like he's in progress on queueing the current
set (with modifications), I want to wait and see if we should just work
this out on top of what he has (which I'm fine with).

As always, many many thanks for your time and feedback here! I really
appreciate your contributions to this effort!
-john