[PATCH v25 0/9] Simple Donor Migration for Proxy Execution
From: John Stultz
Date: Thu Mar 12 2026 - 22:31:43 EST
Hey All,
Yet another iteration on the next chunk of the Proxy Exec
series: Simple Donor Migration
This is just the next step for Proxy Execution, to allow us to
migrate blocked donors across runqueues to boost remote lock
owners.
As always, I’m trying to submit this larger work in smallish
digestible pieces. In this portion of the series, I’m only
submitting for review and consideration some recent fixups and
the logic that allows us to do donor (blocked waiter) migration.
That logic requires some additional changes to locking and extra
state tracking to ensure we don’t accidentally run a migrated
donor on a cpu it isn’t affined to, as well as some extra
handling to reset balance callback state when we decide to pick
a different task after doing donor migration.
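To make the affinity concern above a bit more concrete, here is a
rough userspace toy model. It is not code from this series: every
type and function name in it (toy_task, toy_proxy_migrate(),
toy_unblock(), etc.) is hypothetical, and it ignores all locking.
It only illustrates the idea that a blocked donor may sit on a
runqueue it isn’t affined to while it lends its scheduling context
to the lock owner, and so has to be moved back to an allowed cpu
before it can actually run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model only: these are NOT the kernel's task_struct, rq, or
 * scheduler helpers. All names here are made up for illustration.
 */
struct toy_task {
	const char *name;
	unsigned int allowed_cpus;          /* bitmask of cpus the task may run on */
	int cpu;                            /* runqueue the task currently sits on */
	struct toy_task *blocked_on_owner;  /* lock owner we wait on, or NULL */
};

/* True if the task's affinity mask includes the given cpu. */
static bool toy_cpu_allowed(const struct toy_task *t, int cpu)
{
	return (t->allowed_cpus >> cpu) & 1u;
}

/* Lowest-numbered cpu in the affinity mask, or -1 if the mask is empty. */
static int toy_first_allowed_cpu(const struct toy_task *t)
{
	for (int cpu = 0; cpu < 32; cpu++) {
		if (toy_cpu_allowed(t, cpu))
			return cpu;
	}
	return -1;
}

/*
 * Move a blocked donor onto the lock owner's runqueue so its
 * scheduling context can boost the owner there. Affinity is
 * deliberately not checked: the donor is not executing, it is
 * only lending its context.
 */
static void toy_proxy_migrate(struct toy_task *donor)
{
	assert(donor->blocked_on_owner != NULL);
	donor->cpu = donor->blocked_on_owner->cpu;
}

/*
 * The donor is no longer blocked. Before it may run, make sure it
 * sits on a cpu it is affined to, moving it back if needed
 * (a simplified stand-in for return migration).
 */
static void toy_unblock(struct toy_task *donor)
{
	donor->blocked_on_owner = NULL;
	if (!toy_cpu_allowed(donor, donor->cpu))
		donor->cpu = toy_first_allowed_cpu(donor);
}
```

In this model, a donor affined only to cpu 0 that waits on an owner
running on cpu 1 ends up on cpu 1 after toy_proxy_migrate(), and is
moved back to cpu 0 by toy_unblock() before it would be allowed to
run.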
Much of the new logic in this version is thanks to K Prateek,
who provided a lot of insightful suggestions to the v24 series!
New in this iteration:
* With additional changes, the previous full Donor Migration
series had gotten pretty long, so to go easy on reviewers I’ve
dropped the later Donor Migration patches I had in v24, which
basically provided optimizations so try_to_wake_up() would do
return-migration, smarter mutex handoffs, and proxy migrating
the entire chain in one pass. K Prateek also had some
suggestions for further improvements in these later patches
that I have not yet addressed, so for now I’m going to table
them and will revisit once progress is made with this set.
* Fix for proxy_tag_curr() erroneously leaving tasks off of the
pushable list, reported by K Prateek. The fix Peter suggested
allows us to drop the proxy_tag_curr() logic completely.
* Peter noted compilers don’t always optimize as we would like,
and suggested reworked logic to reduce repetitive
sched_proxy_exec() branches.
* Rework of proxy_force_return() suggested by K Prateek to use
WF_TTWU flags, and to use the attach_one_task() helper to
simplify the code.
* Other small cleanups through the series suggested by
K Prateek.
I’d love to get further feedback on any place where these
patches are confusing, or could use additional clarifications.
There have also been some further improvements in the full Proxy
Execution series:
* David Stevens reported and diagnosed an issue with loadavg
being incorrect due to incorrect nr_uninterruptible accounting
in the sleeping-owner handling.
* An issue with rwsem support was found and fixed, along with
other simplifications to the changes.
* Fix suggested by Peter for an edge case with DL adding tasks
twice to the pushable list when Proxy Exec pushes the donor
task.
* K Prateek had further suggestions to improve the optimized
donor migration changes, dropping the unnecessary
migration_node addition to the task_struct, and using
attach_tasks to simplify the full chain migration.
* Tiffany Yang pointed out some unnecessary CONFIG_SMP bits
were still lingering and could be cleaned up.
* An initial draft of a Documentation update to describe Proxy
Execution.
I’d appreciate any testing or comments that folks have with
the full set!
You can find the full Proxy Exec series here:
https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v25-7.0-rc3/
https://github.com/johnstultz-work/linux-dev.git proxy-exec-v25-7.0-rc3
Issues still to address with the full series:
* Resolve a regression in the later optimized donor-migration
changes when combined with the “Fix 'stuck' dl_server” change
in 6.19.
* With the full series against 7.0-rc3, when doing heavy stress
testing, I’m occasionally hitting crashes due to a NULL return
from __pick_eevdf(). I need to dig into this and find out why
it doesn’t happen against 6.18.
* Try to integrate and rework K Prateek’s suggestions for the
later optimized donor-migration changes.
* Continue working to get sched_ext to be ok with Proxy
Execution enabled.
* Reevaluate performance regression K Prateek Nayak found with
the full series.
* The chain migration functionality needs further iterations and
better validation to ensure it truly maintains the RT/DL load
balancing invariants (despite this being broken in vanilla
upstream with RT_PUSH_IPI currently)
Future work:
* Expand to more locking primitives: Figuring out pi-futexes
would be good, and using proxy for Binder PI is something else
we’re exploring.
* Eventually: Work to replace rt_mutexes and get things happy
with PREEMPT_RT
I’d really appreciate any feedback or review thoughts on the
full series as well. I’m trying to keep the chunks small,
reviewable and iteratively testable, but if you have any
suggestions on how to improve the larger series, I’m all ears.
Credit/Disclaimer:
----------------------
As always, this Proxy Execution series has a long history with
lots of developers that deserve credit:
It was first described in a paper[1] by Watkins, Straub, and
Niehaus, followed by patches from Peter Zijlstra, extended with
lots of work by Juri Lelli, Valentin Schneider, and Connor
O'Brien (and thank you to Steven Rostedt for providing
additional details here!).
Thanks also to Joel Fernandes, Dietmar Eggemann, Metin Kaya,
K Prateek Nayak and Suleiman Souhlal for their substantial
review, suggestion, and patch contributions.
So again, many thanks to those above, as all the credit for this
series really is due to them - while the mistakes are surely mine.
Thanks so much!
-john
[1] https://static.lwn.net/images/conf/rtlws11/papers/proc/p38.pdf
Cc: Joel Fernandes <joelagnelf@xxxxxxxxxx>
Cc: Qais Yousef <qyousef@xxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Juri Lelli <juri.lelli@xxxxxxxxxx>
Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
Cc: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
Cc: Valentin Schneider <vschneid@xxxxxxxxxx>
Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
Cc: Ben Segall <bsegall@xxxxxxxxxx>
Cc: Zimuzo Ezeozue <zezeozue@xxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Waiman Long <longman@xxxxxxxxxx>
Cc: Boqun Feng <boqun.feng@xxxxxxxxx>
Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
Cc: Metin Kaya <Metin.Kaya@xxxxxxx>
Cc: Xuewen Yan <xuewen.yan94@xxxxxxxxx>
Cc: K Prateek Nayak <kprateek.nayak@xxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
Cc: Suleiman Souhlal <suleiman@xxxxxxxxxx>
Cc: kuyo chang <kuyo.chang@xxxxxxxxxxxx>
Cc: hupu <hupu.gm@xxxxxxxxx>
Cc: kernel-team@xxxxxxxxxxx
John Stultz (9):
sched: Make class_schedulers avoid pushing current, and get rid of
proxy_tag_curr()
sched: Minimise repeated sched_proxy_exec() checking
locking: Add task::blocked_lock to serialize blocked_on state
sched: Fix modifying donor->blocked on without proper locking
sched/locking: Add special p->blocked_on==PROXY_WAKING value for proxy
return-migration
sched: Add assert_balance_callbacks_empty helper
sched: Add logic to zap balance callbacks if we pick again
sched: Move attach_one_task and attach_task helpers to sched.h
sched: Handle blocked-waiter migration (and return migration)
include/linux/sched.h | 91 +++++++----
init/init_task.c | 1 +
kernel/fork.c | 1 +
kernel/locking/mutex-debug.c | 4 +-
kernel/locking/mutex.c | 40 +++--
kernel/locking/mutex.h | 6 +
kernel/locking/ww_mutex.h | 16 +-
kernel/sched/core.c | 300 +++++++++++++++++++++++++++++------
kernel/sched/deadline.c | 16 +-
kernel/sched/fair.c | 26 ---
kernel/sched/rt.c | 15 +-
kernel/sched/sched.h | 35 +++-
12 files changed, 414 insertions(+), 137 deletions(-)
--
2.53.0.880.g73c4285caa-goog