[RFC PATCH v15 0/7] Single RunQueue Proxy Execution (v15)
From: John Stultz
Date: Wed Mar 12 2025 - 18:12:15 EST
Hey All,
After sending out the previous version of this series and
getting some great feedback from Peter, I was pulled into a few
other directions for a bit. But I’ve been able to get back to
the proxy work the last few weeks and wanted to send this
iteration out in preparation for discussions at OSPM next week.
So here is v15 of the Proxy Execution series, a generalized form
of priority inheritance.
As I’m trying to submit this work in smallish digestible pieces,
in this series I’m only submitting for review the logic that
allows us to do the proxying if the lock owner is on the same
runqueue as the blocked waiter. That covers: introducing the
CONFIG_SCHED_PROXY_EXEC option and boot argument, reworking the
task_struct::blocked_on pointer and adding wrapper functions, the
initial sketch of the find_proxy_task() logic, some fixes for
using split contexts, and finally same-runqueue proxying.
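To make the same-runqueue restriction concrete, here is a minimal
user-space sketch of the idea, not code from this series. It walks a
blocked_on chain from the selected task to the lock owner that could
run on its behalf. The toy_task and toy_mutex types and the
toy_find_proxy_task() helper are hypothetical names made up for
illustration, and returning NULL when the chain leaves the runqueue is
a simplifying assumption rather than what the patches actually do.

```c
#include <assert.h>
#include <stddef.h>

/* Toy user-space model; these are NOT the kernel's types. */
struct toy_mutex;

struct toy_task {
	const char *name;
	int cpu;                      /* runqueue the task is queued on */
	struct toy_mutex *blocked_on; /* NULL when the task is runnable */
};

struct toy_mutex {
	struct toy_task *owner;       /* NULL when the mutex is unlocked */
};

/*
 * Follow the blocked_on chain from the selected (donor) task.
 * Returns the task that could execute in the donor's place, or NULL
 * when the chain cannot be resolved on this runqueue. The chain is
 * assumed to be acyclic (no deadlocked tasks).
 */
static struct toy_task *toy_find_proxy_task(struct toy_task *donor,
					    int this_cpu)
{
	struct toy_task *p = donor;

	assert(donor != NULL);

	while (p->blocked_on) {
		struct toy_task *owner = p->blocked_on->owner;

		if (!owner)
			return NULL; /* lock was released: pick again */
		if (owner->cpu != this_cpu)
			return NULL; /* owner on another rq: out of scope */
		p = owner;
	}
	return p;
}
```

With tasks A -> B -> C chained through two mutexes on one CPU,
toy_find_proxy_task(&A, cpu) returns C. If C sits on a different CPU,
the sketch gives up, which is the case the later multi-runqueue
patches are meant to handle.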
With v15, I’ve tried to address some of Peter’s feedback,
splitting apart some patches so they are easier to review, and
breaking out some functionality that is not yet needed for
single-runqueue proxying, so that it can be introduced later,
closer to where it is necessary.
I’ve also continued working on the rest of the series, which
you can find here:
https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v15-6.14-rc6/
https://github.com/johnstultz-work/linux-dev.git proxy-exec-v15-6.14-rc6
New changes in the full series include:
* Having CONFIG_SCHED_PROXY_EXEC depend on EXPERT for now, as
its use has pretty narrow value until we get to multi-runqueue
proxying.
* Improved naming consistency and use of the guard macro where
appropriate
* Moving the blocked_on_state logic to later in the series
* Improved comments
* Build fixes for !CONFIG_SMP
* Moving the zap_balance_callback() logic to later in the series
* Fixes for when sched_proxy_exec() is disabled
Issues still to address with the full series:
* Peter suggested that, instead of using
(blocked_on_state == BO_WAKING) as a guard against running
proxy-migrated tasks on CPUs they are not affined to when tasks
become unblocked, we could dequeue tasks first and then wake them.
This does look to be cleaner in many ways, but the locking
rework is significant and I’ve not worked out all the kinks
with it yet.
* In the full series with proxy migration (and again, for
clarity, not with this same-rq proxying series I’m sending out
here), I am still using some workarounds to avoid hitting some
rare cases of what seem to be lost wakeups, where a task was
marked as BO_WAKING, and the mutex it is blocked on has no
owner, but the wakeup on the waiter never managed to
transition it to BO_RUNNABLE. The workarounds handle doing the
return migration from within find_proxy_task() but I still
feel that those fixups shouldn’t be necessary, so I suspect
the mutex unlock or ttwu logic has a race somewhere I’m
missing.
* One new issue I found with the workarounds mentioned in the
previous bullet is that they can cause warnings during
CPU hotplug if we try to do manual return-migration to
task->wake_cpu and that CPU is offline.
* K Prateek Nayak did some testing a bit over a year ago
with an earlier version of the series and saw ~3-5% regressions
in some cases. I’m hoping to look into this soon to see if we
can reduce those further.
* The chain migration functionality needs further iterations and
better validation to ensure it truly maintains the RT/DL load
balancing invariants (despite this being broken in vanilla
upstream with RT_PUSH_IPI currently)
I’d really appreciate any feedback or review thoughts on this
series. I’m trying to keep the chunks small, reviewable and
iteratively testable, but if you have any suggestions on how to
improve the series, I’m all ears.
Credit/Disclaimer:
--------------------
As mentioned previously, this Proxy Execution series has a long
history:
It was first described in a paper[1] by Watkins, Straub, and
Niehaus, then appeared in patches from Peter Zijlstra, which were
extended with lots of work by Juri Lelli, Valentin Schneider, and
Connor O'Brien. (And thank you to Steven Rostedt for providing
additional details here!)
So again, many thanks to those above, as all the credit for this
series really is due to them - while the mistakes are likely mine.
Thanks so much!
-john
[1] https://static.lwn.net/images/conf/rtlws11/papers/proc/p38.pdf
Cc: Joel Fernandes <joelagnelf@xxxxxxxxxx>
Cc: Qais Yousef <qyousef@xxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Juri Lelli <juri.lelli@xxxxxxxxxx>
Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
Cc: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
Cc: Valentin Schneider <vschneid@xxxxxxxxxx>
Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
Cc: Ben Segall <bsegall@xxxxxxxxxx>
Cc: Zimuzo Ezeozue <zezeozue@xxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Waiman Long <longman@xxxxxxxxxx>
Cc: Boqun Feng <boqun.feng@xxxxxxxxx>
Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
Cc: Metin Kaya <Metin.Kaya@xxxxxxx>
Cc: Xuewen Yan <xuewen.yan94@xxxxxxxxx>
Cc: K Prateek Nayak <kprateek.nayak@xxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
Cc: Suleiman Souhlal <suleiman@xxxxxxxxxx>
Cc: kernel-team@xxxxxxxxxxx
John Stultz (3):
  sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable
  sched: Fix runtime accounting w/ split exec & sched contexts
  sched: Add an initial sketch of the find_proxy_task() function

Peter Zijlstra (2):
  locking/mutex: Rework task_struct::blocked_on
  sched: Start blocked_on chain processing in find_proxy_task()

Valentin Schneider (2):
  locking/mutex: Add p->blocked_on wrappers for correctness checks
  sched: Fix proxy/current (push,pull)ability

 .../admin-guide/kernel-parameters.txt         |   5 +
 include/linux/sched.h                         |  62 +++-
 init/Kconfig                                  |  10 +
 kernel/fork.c                                 |   3 +-
 kernel/locking/mutex-debug.c                  |   9 +-
 kernel/locking/mutex.c                        |  11 +
 kernel/locking/mutex.h                        |   3 +-
 kernel/locking/ww_mutex.h                     |  16 +-
 kernel/sched/core.c                           | 266 +++++++++++++++++-
 kernel/sched/fair.c                           |  31 +-
 kernel/sched/rt.c                             |  15 +-
 kernel/sched/sched.h                          |  22 +-
 12 files changed, 423 insertions(+), 30 deletions(-)
--
2.49.0.rc0.332.g42c0ae87b1-goog