[RFD/RFC PATCH 0/8] Towards implementing proxy execution

From: Juri Lelli
Date: Tue Oct 09 2018 - 05:25:07 EST


Hi all,

Proxy Execution (which also goes under several other names) isn't a new
concept; it has been brought up before in this community (both in email
discussions and at conferences [1, 2]), but no actual implementation
that applies to a fairly recent kernel exists as of today (none that
I'm aware of, at least - happy to be proven wrong).

Very broadly speaking (more info below), proxy execution enables a task
to run using the context of some other task that is "willing" to
participate in the mechanism, and this helps both tasks to improve
performance (compared with the latter task not participating in proxy
execution).

This RFD/proof of concept aims at starting a discussion about how we can
get proxy execution in mainline. But, first things first, why do we even
care about it?

I'm pretty confident in saying that the line of development most
interested in this at the moment is the one that might benefit from
allowing non-privileged processes to use deadline scheduling [3]. The
main missing bit before we can safely relax the root privileges
constraint is a proper priority inheritance mechanism, which for
deadline scheduling translates to bandwidth inheritance [4, 5]: some
way of running a task that holds a (rt_)mutex within the bandwidth
allotment of some other task that is blocked on the same (rt_)mutex.
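
To make [3] concrete, this is the kind of request that today fails
with EPERM for a non-privileged task (plain C, struct layout as per
sched_setattr(2), done via raw syscall since glibc has no wrapper;
the actual values are arbitrary):

#define _GNU_SOURCE
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/sched.h>	/* SCHED_DEADLINE */

struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
};

int main(void)
{
	struct sched_attr attr = {
		.size           = sizeof(attr),
		.sched_policy   = SCHED_DEADLINE,
		.sched_runtime  = 10 * 1000 * 1000,	/* 10ms runtime...  */
		.sched_deadline = 30 * 1000 * 1000,	/* ...within 30ms,  */
		.sched_period   = 30 * 1000 * 1000,	/* every 30ms       */
	};

	/* Fails with EPERM unless privileged (CAP_SYS_NICE). */
	return syscall(SYS_sched_setattr, 0, &attr, 0);
}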

The concept itself is pretty general, however, and it is not hard to
foresee possible applications in other scenarios (say, for example,
nice values/shares across co-operating CFS tasks, or clamping values
[6]). But I'm already digressing, so let's get back to the code that
comes with this cover letter.

One can define the scheduling context of a task as all the information
in task_struct that the scheduler needs to implement a policy, and the
execution context as all the state required to actually "run" the
task. An example of scheduling context might be the information
contained in task_struct's se, rt and dl fields; affinity pertains
instead to execution context (and I guess deciding what pertains to
what is actually up for discussion as well ;-). Patch 04/08 implements
such a distinction.
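
To give an idea of what this means in practice, here is a minimal
sketch, with field names that are only illustrative (see 04/08 for the
real thing):

struct rq {
	/* Execution context: the task whose state is on the CPU. */
	struct task_struct *curr;
	/* Scheduling context: the task the policy picked, i.e. the
	 * one whose se/rt/dl fields drive accounting. */
	struct task_struct *proxy;
	/* ... */
};

With no proxying in effect the two simply coincide.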

As implemented in this set, a link between the scheduling contexts of
different tasks might be established when a task blocks on a mutex
held by some other task (the blocked_on relation). In this case the
former task starts to be considered a potential proxy for the latter
(the mutex owner). One key change made here in how mutexes work is
that waiters don't really sleep: they are not dequeued, so they can be
picked up by the scheduler when it runs. If a waiter (potential proxy)
task is selected by the scheduler, the blocked_on relation is used to
find the mutex owner and put that to run on the CPU, using the proxy
task's scheduling context.

Follow the blocked-on relation:

              ,-> task          <- proxy, picked by scheduler
              |     | blocked-on
              |     v
 blocked-task |   mutex
              |     | owner
              |     v
              `-- task          <- gets to run using proxy info
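
In (heavily simplified) code the selection step then looks something
like the sketch below; mutex_owner() is a made-up helper standing in
for the owner extraction, and all locking is omitted (the real logic
lives in __schedule()/proxy() in 05/08):

static struct task_struct *pick_task_or_proxy(struct rq *rq)
{
	/* Scheduling context: whatever the policy likes best. */
	struct task_struct *pick = pick_next_task(rq);

	/*
	 * Waiters are not dequeued anymore, so the pick may well be
	 * blocked on a mutex; if so, run the owner instead, charging
	 * its execution to the pick's scheduling context.
	 */
	if (!pick->blocked_on)
		return pick;

	rq->proxy = pick;			/* donor of the context */
	return mutex_owner(pick->blocked_on);	/* put on the CPU */
}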

Now, the situation is (of course) more tricky than depicted so far,
because we have to deal with all sorts of possible states the mutex
owner might be in while a potential proxy is selected by the
scheduler; e.g., the owner might be sleeping, running on a different
CPU, itself blocked on another mutex... so I'd kindly refer people to
have a look at the proxy() implementation and comments in 05/08.
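
Just to give a flavour of that case analysis, in pseudo-code (helpers
made up, locking and the gory migration details omitted):

owner = mutex_owner(pick->blocked_on);

if (owner->blocked_on)
	/* Owner waits on another mutex: keep following the chain. */
	goto walk_chain;

if (!owner->on_rq)
	/* Owner sleeps: the proxy must wait too; the blocked_on
	 * link is what gets everything going again at wakeup. */
	goto sleep_too;

if (task_cpu(owner) != cpu_of(rq))
	/* Owner lives on another CPU: migrate the contexts so that
	 * donor and owner meet on one runqueue, then retry. */
	goto migrate;

/* Owner is runnable right here: run it within pick's context. */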

Peter kindly shared his WIP patches with us (me, Luca, Tommaso,
Claudio, Daniel - the Pisa gang) a while ago, but I only recently
managed to have a decent look at them (thanks a lot to the other guys
for giving them a first look way before me!). This set is thus
composed of Peter's original patches (which I rebased on
tip/sched/core as of today, commented, and - hopefully duly reported
in the changelogs - possibly broke) plus a bunch of additional changes
that seemed required to make all this boot "successfully" on a virtual
machine. So be advised! This is good only for fun ATM (though I really
hope it is good enough for discussion), pretty far from production I'm
afraid. Share early, share often, right? :-)

The main concerns I have with the current approach are that, being
based on mutex.c, it is both

- not linked with futexes
- not involving "legacy" priority inheritance (rt_mutex.c)

I believe one of the main reasons Peter started this on mutexes is to
get better coverage of potential problems (and I can assure everybody
it did). I'm not yet sure what we should do moving forward, and this
is exactly what I'd be pleased to hear your opinions on.

https://github.com/jlelli/linux.git experimental/deadline/proxy-rfc-v1

Thanks a lot in advance!

- Juri

1 - https://wiki.linuxfoundation.org/_media/realtime/events/rt-summit2017/proxy-execution_peter-zijlstra.pdf
2 - https://lwn.net/Articles/397422/ which "points" to https://goo.gl/3VrLza
3 - https://marc.info/?l=linux-rt-users&m=153450086400459&w=2
4 - https://ieeexplore.ieee.org/document/5562902
5 - http://retis.sssup.it/~lipari/papers/rtlws2013.pdf
6 - https://lore.kernel.org/lkml/20180828135324.21976-1-patrick.bellasi@xxxxxxx/

Juri Lelli (3):
locking/mutex: make mutex::wait_lock irq safe
sched: Ensure blocked_on is always guarded by blocked_lock
sched: Fixup task CPUs for potential proxies.

Peter Zijlstra (5):
locking/mutex: Convert mutex::wait_lock to raw_spinlock_t
locking/mutex: Removes wakeups from under mutex::wait_lock
locking/mutex: Rework task_struct::blocked_on
sched: Split scheduler execution context
sched: Add proxy execution

 include/linux/mutex.h        |   4 +-
 include/linux/sched.h        |   8 +-
 init/Kconfig                 |   4 +
 init/init_task.c             |   1 +
 kernel/Kconfig.locks         |   2 +-
 kernel/fork.c                |   8 +-
 kernel/locking/mutex-debug.c |  12 +-
 kernel/locking/mutex.c       | 127 +++++++--
 kernel/sched/core.c          | 510 +++++++++++++++++++++++++++++++++--
 kernel/sched/deadline.c      |   2 +-
 kernel/sched/fair.c          |   7 +
 kernel/sched/rt.c            |   2 +-
 kernel/sched/sched.h         |  30 ++-
 13 files changed, 642 insertions(+), 75 deletions(-)

--
2.17.1