[RFC PATCH 00/11] Reviving the Proxy Execution Series

From: Connor O'Brien
Date: Mon Oct 03 2022 - 17:45:12 EST

Proxy execution is an approach to implementing priority inheritance
based on distinguishing between a task's scheduler context (information
required in order to make scheduling decisions about when the task gets
to run, such as its scheduler class and priority) and its execution
context (information required to actually run the task, such as CPU
affinity). With proxy execution enabled, a task p1 that blocks on a
mutex remains on the runqueue, but its "blocked" status and the mutex on
which it blocks are recorded. If p1 is selected to run while still
blocked, the lock owner p2 can run "on its behalf", inheriting p1's
scheduler context. Execution context is not inherited, meaning that
e.g. the CPUs where p2 can run are still determined by its own affinity
and not p1's.

In practice a number of more complicated situations can arise: the mutex
owner might itself be blocked on another mutex, or it could be sleeping,
running on a different CPU, in the process of migrating between CPUs,
etc. Details on handling these various cases can be found in patch 7/11
("sched: Add proxy execution"), particularly in the implementation of
proxy() and accompanying comments.

Past discussions of proxy execution have often focused on the benefits
for deadline scheduling. Current interest for Android is based more on
desire for a broad solution to priority inversion on kernel mutexes,
including among CFS tasks. One notable scenario arises when cpu cgroups
are used to throttle less important background tasks. Priority inversion
can occur when an "important" unthrottled task blocks on a mutex held by
an "unimportant" task whose CPU time is constrained using cpu
shares. The result is higher worst case latencies for the unthrottled
task.[0] Testing by John Stultz with a simple reproducer [1] showed
promising results for this case, with proxy execution appearing to
eliminate the large latency spikes associated with priority

Proxy execution has been discussed over the past few years at several
conferences[3][4][5], but (as far as I'm aware) patches implementing the
concept have not been discussed on the list since Juri Lelli's RFC in
2018.[6] This series is an updated version of that patchset, seeking to
incorporate subsequent work by Juri[7], Valentin Schneider[8] and Peter
Zijlstra along with some new fixes.

Testing so far has focused on stability, mostly via mutex locktorture
with some tweaks to more quickly trigger proxy execution bugs. These
locktorture changes are included at the end of the series for
reference. The current series survives runs of >72 hours on QEMU without
crashes, deadlocks, etc. Testing on Pixel 6 with the android-mainline
kernel [9] yields similar results. In both cases, testing used >2 CPUs
and CONFIG_FAIR_GROUP_SCHED=y, a configuration Valentin Schneider
reported[10] showed stability problems with earlier versions of the

That said, these are definitely still a work in progress, with some
known remaining issues (e.g. warnings while booting on Pixel 6,
suspicious looking min/max vruntime numbers) and likely others I haven't
found yet. I've done my best to eliminate checks and code paths made
redundant by new fixes but some probably remain. There's no attempt yet
to handle core scheduling. Performance testing so far has been limited
to the aforementioned priority inversion reproducer. The hope in sharing
now is to revive the discussion on proxy execution and get some early
input for continuing to revise & refine the patches.

[0] https://raw.githubusercontent.com/johnstultz-work/priority-inversion-demo/main/results/charts/6.0-rc7-throttling-starvation.png
[1] https://github.com/johnstultz-work/priority-inversion-demo
[2] https://raw.githubusercontent.com/johnstultz-work/priority-inversion-demo/main/results/charts/6.0-rc7-vanilla-vs-proxy.png
[3] https://lpc.events/event/2/contributions/62/
[4] https://lwn.net/Articles/793502/
[5] https://lwn.net/Articles/820575/
[6] https://lore.kernel.org/lkml/20181009092434.26221-1-juri.lelli@xxxxxxxxxx/
[7] https://github.com/jlelli/linux/tree/experimental/deadline/proxy-rfc-v2
[8] https://gitlab.arm.com/linux-arm/linux-vs/-/tree/mainline/sched/proxy-rfc-v3/
[9] https://source.android.com/docs/core/architecture/kernel/android-common
[10] https://lpc.events/event/7/contributions/758/attachments/585/1036/lpc20-proxy.pdf#page=4

Connor O'Brien (2):
torture: support randomized shuffling for proxy exec testing
locktorture: support nested mutexes

Juri Lelli (3):
locking/mutex: make mutex::wait_lock irq safe
kernel/locking: Expose mutex_owner()
sched: Fixup task CPUs for potential proxies.

Peter Zijlstra (4):
locking/ww_mutex: Remove wakeups from under mutex::wait_lock
locking/mutex: Rework task_struct::blocked_on
sched: Split scheduler execution context
sched: Add proxy execution

Valentin Schneider (2):
kernel/locking: Add p->blocked_on wrapper
sched/rt: Fix proxy/current (push,pull)ability

include/linux/mutex.h | 2 +
include/linux/sched.h | 15 +-
include/linux/ww_mutex.h | 3 +
init/Kconfig | 7 +
init/init_task.c | 1 +
kernel/Kconfig.locks | 2 +-
kernel/fork.c | 6 +-
kernel/locking/locktorture.c | 20 +-
kernel/locking/mutex-debug.c | 9 +-
kernel/locking/mutex.c | 109 +++++-
kernel/locking/ww_mutex.h | 31 +-
kernel/sched/core.c | 679 +++++++++++++++++++++++++++++++++--
kernel/sched/deadline.c | 37 +-
kernel/sched/fair.c | 33 +-
kernel/sched/rt.c | 63 ++--
kernel/sched/sched.h | 42 ++-
kernel/torture.c | 10 +-
17 files changed, 955 insertions(+), 114 deletions(-)