[RFC v4 0/6] CPU reclaiming for SCHED_DEADLINE
From: Luca Abeni
Date: Fri Dec 30 2016 - 06:33:49 EST
Here is a new version of the patchset implementing CPU reclaiming
(using the GRUB algorithm [1]) for SCHED_DEADLINE.
Basically, this feature allows SCHED_DEADLINE tasks to consume more
than their reserved runtime, up to a maximum fraction of the CPU time
(so that other tasks are left some spare CPU time to execute), if this
does not break the guarantees of other SCHED_DEADLINE tasks.
The patchset applies on top of tip/master.
The implemented CPU reclaiming algorithm is based on tracking the
utilization U_act of active tasks (first 2 patches), and modifying the
runtime accounting rule (see patch 0004). The original GRUB algorithm is
modified as described in [2] to support multiple CPUs (the original
algorithm considered only a single CPU; this one tracks U_act per
runqueue) and to leave an "unreclaimable" fraction of CPU time to
non-SCHED_DEADLINE tasks (see patch 0005: the original algorithm can consume
100% of the CPU time, starving all the other tasks).
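
To make the accounting change concrete, here is a small userspace sketch
(not the code from the patches). The first function is the GRUB rule from
[1], dq = -U_act dt. The second shows one plausible way to keep a fraction
of the CPU unreclaimable, by scaling with a maximum reclaimable utilization
U_max; that formula, the BW_SHIFT fixed-point format and all the names below
are assumptions made for illustration, and patch 0005 may do this
differently.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point format for utilizations: 1.0 == 1 << BW_SHIFT. */
#define BW_SHIFT 20
#define BW_UNIT  ((uint64_t)1 << BW_SHIFT)

/*
 * GRUB rule: a task that executed for delta_ns is charged only
 * U_act * delta_ns, so bandwidth not used by active tasks is reclaimed.
 */
static uint64_t grub_charge(uint64_t delta_ns, uint64_t u_act)
{
	return (delta_ns * u_act) >> BW_SHIFT;
}

/*
 * Hypothetical capped variant: charge (U_act / U_max) * delta_ns, so that
 * the reclaiming tasks together can never use more than U_max of the CPU.
 */
static uint64_t grub_charge_capped(uint64_t delta_ns, uint64_t u_act,
				   uint64_t u_max)
{
	assert(u_max > 0 && u_max <= BW_UNIT);
	assert(u_act <= u_max);
	return (delta_ns * u_act) / u_max;
}
```

With U_act = 0.5, a task running for 1 ms is charged 0.5 ms by the first
rule; with the capped rule and U_act = U_max, it is charged the full
wall-clock time, so nothing beyond U_max can be reclaimed.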
Patch 0003 uses the "inactive timer" introduced in patch 0002 to fix
dl_overflow() and __setparam_dl().
Patch 0006 allows CPU reclaiming to be enabled only on selected tasks.
Changes since v3:
The most important change is the introduction of a new "dl_non_contending"
flag in the "sched_dl_entity" structure, which makes it possible to avoid a
race condition identified by Peter
(http://lkml.iu.edu/hypermail/linux/kernel/1604.0/02822.html) and Juri.
For the moment, I added a new field (similar to the other "dl_*" flags)
to the deadline scheduling entity; if needed, I can move all the dl_* flags
to a single field in a follow-up patch.
Other than this, I tried to address all the comments I received, and to
add the comments requested in the previous reviews.
In particular, the add_running_bw() and sub_running_bw() functions are now
marked as inline, and have been simplified as suggested by Daniel and
The overflow and underflow checks in these functions have been modified
as suggested by Peter; because of a limitation of SCHED_WARN_ON(), the
code in sub_running_bw() is slightly more complex. If SCHED_WARN_ON() is
improved (as suggested in a previous email of mine), I can simplify
sub_running_bw() in a follow-up patch.
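
As an illustration of the kind of overflow and underflow checks discussed
above, here is a userspace sketch. It is not the code from the patches: the
struct, the "_sketch" names and the fprintf() warnings (standing in for
SCHED_WARN_ON()) are assumptions, and clamping to zero on underflow is just
one possible recovery policy.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-runqueue active utilization. */
struct dl_rq_sketch {
	uint64_t running_bw;
};

static inline void add_running_bw_sketch(uint64_t dl_bw,
					 struct dl_rq_sketch *rq)
{
	uint64_t old = rq->running_bw;

	rq->running_bw += dl_bw;
	/* Unsigned wrap-around: the sum ended up smaller than before. */
	if (rq->running_bw < old)
		fprintf(stderr, "running_bw overflow\n");
}

static inline void sub_running_bw_sketch(uint64_t dl_bw,
					 struct dl_rq_sketch *rq)
{
	uint64_t old = rq->running_bw;

	rq->running_bw -= dl_bw;
	/* Unsigned wrap-around: the difference ended up larger than before. */
	if (rq->running_bw > old) {
		fprintf(stderr, "running_bw underflow\n");
		rq->running_bw = 0;	/* assumed recovery: clamp to zero */
	}
}
```

The subtraction path needs both a warning and a corrective action, which is
why it ends up longer than the addition path in this sketch.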
I also updated the patches to apply on top of tip/master.
Finally, I (hopefully) fixed an issue with my usage of get_task_struct() /
put_task_struct() in the previous patches: previously, I called
"get_task_struct(p)" before arming the "inactive task timer", and
"put_task_struct(p)" in the timer handler, but I forgot to call
"put_task_struct(p)" when successfully cancelling the timer; this should
be fixed in the new version of patch 0002.
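
The reference counting rule described above can be modelled in userspace as
follows. This is only a sketch of the pattern, not the code in patch 0002:
the struct, the counter and all function names are hypothetical, and the
real code works on task_struct and a kernel timer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a task with a reference count and one timer. */
struct task_sketch {
	int usage;		/* reference count */
	bool timer_armed;	/* true while the timer is pending */
};

static void get_task(struct task_sketch *p)
{
	p->usage++;
}

static void put_task(struct task_sketch *p)
{
	assert(p->usage > 0);
	p->usage--;
}

/* Arming the timer takes a reference, so the task outlives the timer. */
static void arm_inactive_timer(struct task_sketch *p)
{
	get_task(p);
	p->timer_armed = true;
}

/* If the timer fires, the handler drops the reference taken at arm time. */
static void inactive_timer_handler(struct task_sketch *p)
{
	p->timer_armed = false;
	put_task(p);
}

/*
 * If the timer is cancelled before firing, the handler never runs, so the
 * canceller must drop the reference instead (the step that was missing).
 */
static bool cancel_inactive_timer(struct task_sketch *p)
{
	if (!p->timer_armed)
		return false;	/* nothing pending: handler owns the put */
	p->timer_armed = false;
	put_task(p);
	return true;
}
```

In both paths (timer fired, timer cancelled) exactly one put balances the
get done when arming, so the count returns to its previous value.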
[1] Lipari, G., & Baruah, S. (2000). Greedy reclamation of unused bandwidth in constant-bandwidth servers. In Real-Time Systems, 2000. Euromicro RTS 2000. 12th Euromicro Conference on (pp. 193-200). IEEE.
[2] Abeni, L., Lelli, J., Scordino, C., & Palopoli, L. (2014, October). Greedy CPU reclaiming for SCHED DEADLINE. In Proceedings of the Real-Time Linux Workshop (RTLWS), Dusseldorf, Germany.
Luca Abeni (6):
sched/deadline: track the active utilization
sched/deadline: improve the tracking of active utilization
sched/deadline: fix the update of the total -deadline utilization
sched/deadline: implement GRUB accounting
sched/deadline: do not reclaim the whole CPU bandwidth
sched/deadline: make GRUB a task's flag
include/linux/sched.h | 18 +++-
include/uapi/linux/sched.h | 1 +
kernel/sched/core.c | 45 ++++----
kernel/sched/deadline.c | 260 +++++++++++++++++++++++++++++++++++++++++----
kernel/sched/sched.h | 13 +++
5 files changed, 291 insertions(+), 46 deletions(-)