[PATCH v16 0/7] Single RunQueue Proxy Execution (v16)
From: John Stultz
Date: Sat Apr 12 2025 - 02:03:14 EST
Hey All,
After sending out v15, I unfortunately realized that in moving
some logic further back in the full patch series, I had
accidentally introduced some difficult-to-trigger bugs in the
subset of the series I was submitting. Sadly it took me a while
to figure out exactly which bits weren’t safe to migrate out,
but I’ve finally gotten it back into stable shape.
Many many thanks to Peter, Steven and Prateek for their helpful
feedback on the last revision. I have tried to integrate most
of the changes suggested, but I may have missed things in all
the great feedback, so please let me know if you find anything.
Also, since v15, I presented at OSPM on the current status of
Proxy Execution, which you can watch here:
https://youtu.be/xcV1NtWENbs?feature=shared
So with that out of the way, here is v16 of the Proxy Execution
series, a generalized form of priority inheritance.
As I’m trying to submit this work in smallish digestible pieces,
in this series I’m only submitting for review the logic that
allows us to do the proxying if the lock owner is on the same
runqueue as the blocked waiter. This covers introducing the
CONFIG_SCHED_PROXY_EXEC option and boot argument, reworking the
task_struct::blocked_on pointer and wrapper functions, the
initial sketch of the find_proxy_task() logic, some fixes for
using split contexts, and finally same-runqueue proxying.
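For readers new to the idea, here is a rough userspace sketch of what
"same-runqueue proxying" means: start from the blocked waiter (the
donor), follow blocked_on to the lock owner, and only accept an owner
that sits on the same runqueue. Everything named toy_* is hypothetical
and made up for illustration; this is not the find_proxy_task() code
from the series, and returning NULL for the cross-runqueue case is just
a stand-in for "fall back to some other handling".

```c
#include <assert.h>
#include <stddef.h>

struct toy_task;

/* Toy stand-in for a mutex: it only tracks its current owner. */
struct toy_mutex {
	struct toy_task *owner;
};

/* Toy stand-in for a task; the real task_struct is far richer. */
struct toy_task {
	const char *name;
	int cpu;			/* runqueue the task is queued on */
	struct toy_mutex *blocked_on;	/* NULL when the task is runnable */
};

#define TOY_MAX_CHAIN 64		/* arbitrary bound so a cycle terminates */

/*
 * Walk the blocked_on chain starting at @donor. Returns the task that
 * should run on @this_cpu, or NULL if the chain leaves this runqueue
 * (or is longer than TOY_MAX_CHAIN).
 */
struct toy_task *toy_find_proxy(struct toy_task *donor, int this_cpu)
{
	struct toy_task *p = donor;
	int depth;

	assert(donor != NULL);

	for (depth = 0; depth < TOY_MAX_CHAIN; depth++) {
		struct toy_task *owner;

		if (!p->blocked_on)
			return p;	/* end of chain: p can run */

		owner = p->blocked_on->owner;
		if (!owner)
			return p;	/* lock is free: p can retry it */

		if (owner->cpu != this_cpu)
			return NULL;	/* owner is on another runqueue */

		p = owner;
	}
	return NULL;
}
```

With a waiter blocked on a mutex whose owner is on the same toy
runqueue, toy_find_proxy() returns the owner; move the owner to another
cpu value and it returns NULL instead.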
So with v16, I’ve tried to stabilize the patch series at each
step of the way, as well as address the feedback provided.
Particularly complex has been the reworking of the
find_proxy_task() logic to use guard() to avoid some of the
uglier goto return logic.
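As a rough illustration of why guard() helps here: the kernel's guard()
is built on the compiler's cleanup attribute, so a lock taken at the top
of a scope is dropped automatically on every return path. The userspace
sketch below mimics that with a GCC/Clang extension. The toy_* names and
the TOY_GUARD() macro are hypothetical and are not the kernel macros or
the code from this series.

```c
#include <assert.h>

/* Toy lock: just a flag, enough to observe acquire/release pairing. */
struct toy_lock {
	int held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->held = 0;
}

/* Cleanup handler: called with a pointer to the guarded variable. */
static void toy_guard_exit(struct toy_lock **lp)
{
	if (*lp)
		toy_lock_release(*lp);
}

/* Take the lock now, release it when the enclosing scope is left. */
#define TOY_GUARD(lockp)						\
	struct toy_lock *toy_guard_var					\
		__attribute__((cleanup(toy_guard_exit))) =		\
		(toy_lock_acquire(lockp), (lockp))

/*
 * Several early returns, no "goto out_unlock": the lock is dropped on
 * each path when toy_guard_var goes out of scope.
 */
int toy_pick(struct toy_lock *l, int blocked, int owner_here)
{
	TOY_GUARD(l);

	if (!blocked)
		return 0;	/* nothing to proxy for */
	if (!owner_here)
		return -1;	/* would need migration */
	return 1;		/* run the owner as proxy */
}
```

The limitation mentioned later in this mail shows up even in the toy: if
a path needs to drop the lock in the middle of the function, it has to
leave the guarded scope first.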
I’ve also continued working on the rest of the series, which you
can find here:
https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v16-6.15-rc1/
https://github.com/johnstultz-work/linux-dev.git proxy-exec-v16-6.15-rc1
New changes in the full series include:
* Allow "sched_proxy_exec" without "=true" to enable
proxy-execution at boot time, in addition to the
"sched_proxy_exec=true" or "sched_proxy_exec=false" options as
suggested by Steven
* Drop the "default n" in Kconfig as suggested by Steven
* Add !SCHED_CLASS_EXT dependency until I can investigate if
sched_ext can understand split contexts, as suggested by Peter
* Moved some changes that I had pushed out to later in the series
back to earlier patches, in order to avoid hitting bugs (mostly
around optimistic spinning/lock stealing, but also sched_balance
migrating blocked tasks).
* Renamed update_curr_se to update_se_times, as suggested by
Steven Rostedt.
* Move the enqueue_task_rt() changes to a more relevant patch,
as suggested by K Prateek Nayak
* Fixup whitespace error pointed out by K Prateek Nayak
* Use put_prev_set_next_task as suggested by K Prateek Nayak
* Try to rework find_proxy_task() locking to use guard and
proxy_deactivate_task() in the way Peter suggested.
* Simplified changes to enqueue_task_rt to match deadline's
logic, as pointed out by Peter
* Get rid of preserve_need_resched flag and rework per Peter's
suggestion
* Rework find_proxy_task() to use guard to cleanup the exit
gotos as Peter suggested.
* Properly understood the “lost-wakeup” issue I was tripping over
and working around earlier, and reworked the forced
return-migration from find_proxy_task to use Peter’s
dequeue+wakeup approach, which helps resolve the CPU hotplug
issues I had also seen, caused by the manual return migration
sending tasks to offline CPUs.
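To make the boot-argument item in the list above concrete, here is a
small userspace sketch of the described behavior: a bare
"sched_proxy_exec" enables the feature, while "=true" and "=false" set
it explicitly. The function name toy_parse_proxy_exec() is hypothetical,
and the real handler in the series may well accept more spellings than
the two shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Toy userspace parser, NOT the kernel's setup handler.
 * Returns 0 and sets *enabled on success, -1 if @arg is not recognized.
 */
int toy_parse_proxy_exec(const char *arg, bool *enabled)
{
	static const char key[] = "sched_proxy_exec";
	const size_t klen = sizeof(key) - 1;
	const char *val;

	assert(arg && enabled);

	if (strncmp(arg, key, klen) != 0)
		return -1;

	val = arg + klen;
	if (*val == '\0') {		/* bare "sched_proxy_exec" */
		*enabled = true;
		return 0;
	}
	if (*val != '=')		/* e.g. "sched_proxy_execfoo" */
		return -1;

	val++;
	if (strcmp(val, "true") == 0)
		*enabled = true;
	else if (strcmp(val, "false") == 0)
		*enabled = false;
	else
		return -1;		/* unknown value, leave *enabled alone */

	return 0;
}
```

Leaving *enabled untouched on an unrecognized value is a choice made for
the sketch, so that a typo on the command line cannot silently flip the
setting.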
Issues still to address with the full series:
* Peter suggested that instead of using (blocked_on_state ==
BO_WAKING) when tasks become unblocked, to protect against
running proxy-migrated tasks on CPUs they are not affined to,
we could dequeue tasks first and then wake them.
This does look to be cleaner in many ways, but the locking
rework is significant and I’ve not worked out all the kinks
with it yet. I am also a little worried that we may trip other
wakeup paths that might not do the dequeue first. However, I
have adopted this approach for the find_proxy_task() forced
return migration, and it’s working well.
* The new rework using guard() cleans up a lot of things, but
there are some edge cases where we change blocked_on locks, or
need to drop locks to do migration, so there still are some
odd goto exit cases needed to get out of the guard scope.
Ideas for further cleanups would be welcome here.
* Need to sort out what is needed for sched_ext to be ok with
proxy-execution enabled.
* K Prateek Nayak did some testing a bit over a year ago
with an earlier version of the series and saw ~3-5%
regressions in some cases. I’m hoping to look into this soon
to see if we can reduce those further.
* The chain migration functionality needs further iterations and
better validation to ensure it truly maintains the RT/DL load
balancing invariants (despite this being broken in vanilla
upstream with RT_PUSH_IPI currently)
I’d really appreciate any feedback or review thoughts on this
series. I’m trying to keep the chunks small, reviewable and
iteratively testable, but if you have any suggestions on how to
improve the series, I’m all ears.
Credit/Disclaimer:
------------------
As mentioned previously, this Proxy Execution series has a long
history:
It was first described in a paper[1] by Watkins, Straub and
Niehaus, then developed in patches from Peter Zijlstra, and
extended with lots of work by Juri Lelli, Valentin Schneider,
and Connor O'Brien (and thank you to Steven Rostedt for
providing additional details here!).
So again, many thanks to those above, as all the credit for this
series really is due to them - while the mistakes are likely
mine.
Thanks so much!
-john
[1] https://static.lwn.net/images/conf/rtlws11/papers/proc/p38.pdf
Cc: Joel Fernandes <joelagnelf@xxxxxxxxxx>
Cc: Qais Yousef <qyousef@xxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Juri Lelli <juri.lelli@xxxxxxxxxx>
Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
Cc: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
Cc: Valentin Schneider <vschneid@xxxxxxxxxx>
Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
Cc: Ben Segall <bsegall@xxxxxxxxxx>
Cc: Zimuzo Ezeozue <zezeozue@xxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Waiman Long <longman@xxxxxxxxxx>
Cc: Boqun Feng <boqun.feng@xxxxxxxxx>
Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
Cc: Metin Kaya <Metin.Kaya@xxxxxxx>
Cc: Xuewen Yan <xuewen.yan94@xxxxxxxxx>
Cc: K Prateek Nayak <kprateek.nayak@xxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
Cc: Suleiman Souhlal <suleiman@xxxxxxxxxx>
Cc: kernel-team@xxxxxxxxxxx
John Stultz (3):
  sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable
  sched: Fix runtime accounting w/ split exec & sched contexts
  sched: Add an initial sketch of the find_proxy_task() function

Peter Zijlstra (2):
  locking/mutex: Rework task_struct::blocked_on
  sched: Start blocked_on chain processing in find_proxy_task()

Valentin Schneider (2):
  locking/mutex: Add p->blocked_on wrappers for correctness checks
  sched: Fix proxy/current (push,pull)ability

.../admin-guide/kernel-parameters.txt | 5 +
include/linux/sched.h | 68 ++++-
init/Kconfig | 12 +
kernel/fork.c | 3 +-
kernel/locking/mutex-debug.c | 9 +-
kernel/locking/mutex.c | 18 ++
kernel/locking/mutex.h | 3 +-
kernel/locking/ww_mutex.h | 16 +-
kernel/sched/core.c | 258 +++++++++++++++++-
kernel/sched/deadline.c | 3 +
kernel/sched/fair.c | 35 ++-
kernel/sched/rt.c | 5 +
kernel/sched/sched.h | 22 +-
13 files changed, 428 insertions(+), 29 deletions(-)
--
2.49.0.604.gff1f9ca942-goog