Re: [RFC 0/8] CPU reclaiming for SCHED_DEADLINE

From: Juri Lelli
Date: Tue Jan 19 2016 - 05:10:46 EST

On 14/01/16 16:24, Luca Abeni wrote:
> Hi all,

Hi Luca,

thanks a lot for posting these patches, it is something that we need to
have at this point, IMHO.

I'll try to do some more testing hopefully soon, but, if you already
addressed Peter's comments and want to post a v2, please don't wait for
me. I'll try to test and review the next version. Let me see if I'm able
to set up that testing in the meantime :).


- Juri

> this patchset implements CPU reclaiming (using the GRUB algorithm[1])
> for SCHED_DEADLINE: basically, this feature allows SCHED_DEADLINE tasks
> to consume more than their reserved runtime, up to a maximum fraction
> of the CPU time (so that other tasks are left some spare CPU time to
> execute), if this does not break the guarantees of other SCHED_DEADLINE
> tasks.
> I send this RFC because I think the code still needs some work and/or
> cleanups (or maybe the patches should be split or merged in a different
> way), but I'd like to check whether there is interest in merging this
> feature and whether the current implementation strategy is reasonable.
> I added in cc the usual people interested in SCHED_DEADLINE patches; if
> you think that I should have added someone else, let me know (or please
> forward these patches to interested people).
> The implemented CPU reclaiming algorithm is based on tracking the
> utilization U_act of active tasks (first 5 patches), and on modifying the
> runtime accounting rule (see patch 0006). The original GRUB algorithm is
> modified as described in [2] to support multiple CPUs (the original
> algorithm only considered a single CPU; this one tracks U_act per
> runqueue) and to leave an "unreclaimable" fraction of CPU time to
> non-SCHED_DEADLINE tasks (the original algorithm can consume 100% of the
> CPU time, starving all the other tasks).
> I tried to split the patches so that the whole patchset can be better
> understood; if they should be organized in a different way, let me know.
> The first 5 patches (tracking of per-runqueue active utilization) can
> be useful for frequency scaling too (the tracked "active utilization"
> gives a clear hint about how much the core speed can be reduced without
> compromising the SCHED_DEADLINE guarantees):
> - patches 0001 and 0002 implement a simple tracking of the active
> utilization that is too optimistic from the theoretical point of
> view
> - patch 0003 is mainly useful for debugging this patchset and can
> be removed without problems
> - patch 0004 implements the "active utilization" tracking algorithm
> described in [1,2]. It uses a timer (named "inactive timer" here) to
> decrease U_act at the correct time (I called it the "0-lag time").
> I am working on an alternative implementation that does not use
> additional timers, but it is not ready yet; I'll post it when ready
> and tested
> - patch 0005 tracks the utilization of the tasks that can execute on
> each runqueue. It is a pessimistic approximation of U_act (so, if
> used instead of U_act it reclaims less CPU time, but does not break
> the SCHED_DEADLINE guarantees)
> - patches 0006-0008 implement the reclaiming algorithm.
> [1]
> [2]
> Juri Lelli (1):
> sched/deadline: add some tracepoints
> Luca Abeni (7):
> Track the active utilisation
> Correctly track the active utilisation for migrating tasks
> Improve the tracking of active utilisation
> Track the "total rq utilisation" too
> GRUB accounting
> Make GRUB a task's flag
> Do not reclaim the whole CPU bandwidth
> include/linux/sched.h | 1 +
> include/trace/events/sched.h | 69 ++++++++++++++
> include/uapi/linux/sched.h | 1 +
> kernel/sched/core.c | 3 +-
> kernel/sched/deadline.c | 214 +++++++++++++++++++++++++++++++++++++++++--
> kernel/sched/sched.h | 12 +++
> 6 files changed, 292 insertions(+), 8 deletions(-)
> --
> 1.9.1