[tip:sched/core] sched/rt: Kick RT bandwidth timer immediately on start up

From: tip-bot for Steven Rostedt
Date: Mon Feb 29 2016 - 06:18:07 EST


Commit-ID: c3a990dc9fab590fb88608410f1cc2bc866bdf30
Gitweb: http://git.kernel.org/tip/c3a990dc9fab590fb88608410f1cc2bc866bdf30
Author: Steven Rostedt <rostedt@xxxxxxxxxxx>
AuthorDate: Tue, 16 Feb 2016 18:37:46 -0500
Committer: Ingo Molnar <mingo@xxxxxxxxxx>
CommitDate: Mon, 29 Feb 2016 09:53:07 +0100

sched/rt: Kick RT bandwidth timer immediately on start up

I've been debugging why deadline tasks can cause the RT scheduler to
throttle, even when the deadline tasks are only taking up 50% of the
CPU and RT tasks are not even using 1% of the CPU. Here's what I found.

In order to keep a CPU from being hogged by RT tasks, the deadline
scheduler adds its run time (delta_exec) to the rt_time of the RT
bandwidth. That way, if the two use more than 95% of the CPU within one
second (default settings), the RT tasks are throttled to allow non RT
tasks to run.

Although the deadline tasks add their run time to the RT bandwidth, it
lets the RT tasks do the accounting. This is where the problem lies. If
a deadline task runs for a bit, and no RT tasks are running, then it
will continually add to the RT rt_time that is used to calculate how
much CPU the RT tasks use. But no RT period is in play, and this
accumulation of the runtime never gets reset.

When an RT task finally gets to run, and the watchdog goes off, it can
see that the RT task has used more than it should of, because the
deadline task added all this runtime to its rt_time. Then the RT task
that just woke up gets throttled for no good reason.

I also noticed that when an RT task is queued, it starts the timer to
account for overload and such. But that timer goes off one period
later, which may be too late and the extra rt_time will trigger a
throttle.

This is a quick work around to the problem. When a new RT task is
queued, the bandwidth timer is set to go off immediately. Then the
timer can clear out the extra time added to the rt_time while there was
no RT task running. This stops my tests from triggering the throttle,
and it will still throttle if an RT task runs too much, even while a
deadline task is running.

A better solution may be to subtract the bandwidth that the deadline
task uses from the rt_runtime, and add it back when its finished. Then
there wont be a need for runtime tracking of the time used by deadline
tasks.

I may play with that solution tomorrow.

Signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Cc: <juri.lelli@xxxxxxxxx>
Cc: <williams@xxxxxxxxxx>
Cc: Clark Williams
Cc: Daniel Bristot de Oliveira <bristot@xxxxxxxxxx>
Cc: John Kacur <jkacur@xxxxxxxxxx>
Cc: Juri Lelli
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Mike Galbraith <efault@xxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Link: http://lkml.kernel.org/r/20160216183746.349ec98b@xxxxxxxxxxxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
kernel/sched/rt.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 406a9c2..a774b4d 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -58,7 +58,15 @@ static void start_rt_bandwidth(struct rt_bandwidth *rt_b)
raw_spin_lock(&rt_b->rt_runtime_lock);
if (!rt_b->rt_period_active) {
rt_b->rt_period_active = 1;
- hrtimer_forward_now(&rt_b->rt_period_timer, rt_b->rt_period);
+ /*
+ * SCHED_DEADLINE updates the bandwidth, as a run away
+ * RT task with a DL task could hog a CPU. But DL does
+ * not reset the period. If a deadline task was running
+ * without an RT task running, it can cause RT tasks to
+ * throttle when they start up. Kick the timer right away
+ * to update the period.
+ */
+ hrtimer_forward_now(&rt_b->rt_period_timer, ns_to_ktime(0));
hrtimer_start_expires(&rt_b->rt_period_timer, HRTIMER_MODE_ABS_PINNED);
}
raw_spin_unlock(&rt_b->rt_runtime_lock);