Re: [RFC 0/8] CPU reclaiming for SCHED_DEADLINE

From: Luca Abeni
Date: Tue Jan 19 2016 - 06:50:57 EST

On 01/19/2016 11:11 AM, Juri Lelli wrote:
On 14/01/16 16:24, Luca Abeni wrote:
Hi all,

Hi Luca,

thanks a lot for posting these patches, it is something that we need to
have at this point, IMHO.

I'll try to do some more testing, hopefully soon, but if you have already
addressed Peter's comments and want to post a v2, please don't wait for
me. I'll try to test and review the next version. Let me see if I'm able
to set up that testing in the meantime :).
Thanks Juri; I'll work on Peter's comments over the next few days, and I'll
post a v2 of the RFC, probably in the first days of February.



- Juri

this patchset implements CPU reclaiming (using the GRUB algorithm[1])
for SCHED_DEADLINE: basically, this feature allows SCHED_DEADLINE tasks
to consume more than their reserved runtime, up to a maximum fraction
of the CPU time (so that other tasks are left some spare CPU time to
execute), if this does not break the guarantees of other SCHED_DEADLINE
tasks.

I am sending this as an RFC because I think the code still needs some work
and/or cleanups (or maybe the patches should be split or merged in a
different way), but I'd like to check if there is interest in merging this
feature and if the current implementation strategy is reasonable.

I added in cc the usual people interested in SCHED_DEADLINE patches; if
you think that I should have added someone else, let me know (or please
forward these patches to interested people).

The implemented CPU reclaiming algorithm is based on tracking the
utilization U_act of active tasks (first 5 patches), and modifying the
runtime accounting rule (see patch 0006). The original GRUB algorithm is
modified as described in [2] to support multiple CPUs (the original
algorithm only considered one single CPU; this one tracks U_act per
runqueue) and to leave an "unreclaimable" fraction of CPU time to
non-SCHED_DEADLINE tasks (the original algorithm can consume 100% of the
CPU time, starving all the other tasks).
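
As a rough illustration of the accounting change described above (this is
not the code from patch 0006 or 0008): the original GRUB rule decreases the
runtime as dq = -U_act dt instead of dq = -dt. One way to leave an
unreclaimable fraction is to scale by a maximum reclaimable utilization
U_max, i.e. dq = -(U_act / U_max) dt; that scaling is an assumption made
here for the sketch. The function names, the u_max parameter and the
fixed-point format (1.0 == 1 << BW_SHIFT) are hypothetical too.

```c
#include <stdint.h>

/* Assumed fixed-point format for utilizations: 1.0 == BW_UNIT. */
#define BW_SHIFT 20
#define BW_UNIT  (1ULL << BW_SHIFT)

/*
 * Original GRUB rule: a task that ran for delta_ns is charged
 * delta_ns * U_act, so its runtime is depleted more slowly when
 * the runqueue is not fully utilized.
 * (Sketch only: assumes delta_ns is small enough not to overflow.)
 */
static uint64_t grub_charge(uint64_t delta_ns, uint64_t u_act)
{
	return (delta_ns * u_act) >> BW_SHIFT;
}

/*
 * Hypothetical variant leaving (1 - U_max) of the CPU unreclaimed:
 * charge delta_ns * U_act / U_max, and never less favourably than
 * the non-reclaiming rule (dq = -dt).
 */
static uint64_t grub_charge_capped(uint64_t delta_ns, uint64_t u_act,
				   uint64_t u_max)
{
	uint64_t charged = delta_ns * u_act / u_max;

	return charged > delta_ns ? delta_ns : charged;
}
```

With U_act = 0.5 the first function charges half of the elapsed time, so
a task alone on the CPU can run for twice its reserved runtime per period;
the second one charges more, so some CPU time is always left over.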

I tried to split the patches so that the whole patchset can be better
understood; if they should be organized in a different way, let me know.
The first 5 patches (tracking of per-runqueue active utilization) can
be useful for frequency scaling too (the tracked "active utilization"
gives a clear hint about how much the core speed can be reduced without
compromising the SCHED_DEADLINE guarantees):
- patches 0001 and 0002 implement a simple tracking of the active
utilization that is too optimistic from the theoretical point of
view
- patch 0003 is mainly useful for debugging this patchset and can
be removed without problems
- patch 0004 implements the "active utilization" tracking algorithm
described in [1,2]. It uses a timer (named "inactive timer" here) to
decrease U_act at the correct time (I called it the "0-lag time").
I am working on an alternative implementation that does not use
additional timers, but it is not ready yet; I'll post it when ready
and tested
- patch 0005 tracks the utilization of the tasks that can execute on
each runqueue. It is a pessimistic approximation of U_act (so, if
used instead of U_act, it allows less CPU time to be reclaimed, but
does not break SCHED_DEADLINE guarantees)
- patches 0006-0008 implement the reclaiming algorithm.
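
The "0-lag time" tracking mentioned for patch 0004 can be sketched roughly
as follows. This is only an illustration, not the patch itself: it assumes
the usual GRUB definition, where a blocking task stops contributing to
U_act at (deadline - remaining runtime / task utilization), and all the
names (rq_sketch, task_blocks, inactive_timer_fires, ...) are hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-runqueue state: sum of active tasks' utilizations. */
struct rq_sketch {
	uint64_t u_act;
};

/*
 * 0-lag time of a blocking task: deadline - remaining / (runtime/period),
 * computed as deadline - remaining * period / runtime to stay in integers.
 */
static int64_t zero_lag_time(int64_t deadline_ns, int64_t remaining_ns,
			     int64_t dl_runtime_ns, int64_t dl_period_ns)
{
	return deadline_ns - remaining_ns * dl_period_ns / dl_runtime_ns;
}

/*
 * Called when a task blocks. If the 0-lag time is already in the past,
 * its utilization is removed at once and 0 is returned; otherwise U_act
 * is left unchanged and 1 is returned, meaning that the "inactive timer"
 * should be armed to fire at zero_lag_ns.
 */
static int task_blocks(struct rq_sketch *rq, uint64_t task_bw,
		       int64_t now_ns, int64_t zero_lag_ns)
{
	if (zero_lag_ns <= now_ns) {
		rq->u_act -= task_bw;
		return 0;
	}
	return 1;
}

/* Inactive timer handler: the 0-lag time has been reached. */
static void inactive_timer_fires(struct rq_sketch *rq, uint64_t task_bw)
{
	rq->u_act -= task_bw;
}

/*
 * Called when a task wakes up. If it wakes before its inactive timer
 * fired, it is still accounted in U_act and nothing has to be added.
 */
static void task_wakes(struct rq_sketch *rq, uint64_t task_bw,
		       int still_accounted)
{
	if (!still_accounted)
		rq->u_act += task_bw;
}
```

Removing the utilization immediately at blocking time, instead of waiting
for the 0-lag time, would give the kind of over-optimistic U_act that the
simple tracking suffers from.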


Juri Lelli (1):
sched/deadline: add some tracepoints

Luca Abeni (7):
Track the active utilisation
Correctly track the active utilisation for migrating tasks
Improve the tracking of active utilisation
Track the "total rq utilisation" too
GRUB accounting
Make GRUB a task's flag
Do not reclaim the whole CPU bandwidth

include/linux/sched.h | 1 +
include/trace/events/sched.h | 69 ++++++++++++++
include/uapi/linux/sched.h | 1 +
kernel/sched/core.c | 3 +-
kernel/sched/deadline.c | 214 +++++++++++++++++++++++++++++++++++++++++--
kernel/sched/sched.h | 12 +++
6 files changed, 292 insertions(+), 8 deletions(-)