[tip:sched/urgent] sched: Fix non-zero idle loadavg

From: tip-bot for Vik Heyndrickx
Date: Thu Jan 21 2016 - 13:55:43 EST


Commit-ID: 1f9649ef6aa1bac53fb478d9e641b22d67f8423c
Gitweb: http://git.kernel.org/tip/1f9649ef6aa1bac53fb478d9e641b22d67f8423c
Author: Vik Heyndrickx <vik.heyndrickx@xxxxxxxxxxx>
AuthorDate: Thu, 21 Jan 2016 10:23:25 +0100
Committer: Ingo Molnar <mingo@xxxxxxxxxx>
CommitDate: Thu, 21 Jan 2016 18:55:23 +0100

sched: Fix non-zero idle loadavg

Systems show a minimal load average of 0.00, 0.01, 0.05 even when they
have no load at all.

Uptime and /proc/loadavg on all systems with kernels released during the
last five years up until kernel version 4.4, show a 5- and 15-minute
minimum loadavg of 0.01 and 0.05 respectively. This should be 0.00 on
idle systems, but the way the kernel calculates this value prevents it
from getting lower than the mentioned values.

Likewise but not as obviously noticeable, a fully loaded system with
no processes waiting, shows a maximum 1/5/15 loadavg of 1.00, 0.99,
0.95 (multiplied by number of cores).

By removing the single code line that performed a rounding on the
internally kept load value, effectively returning this function
calc_load to its state it had before, the visualization problem is
completely fixed.

Once the (old) load becomes 93 or higher, it mathematically can never
get lower than 93, even when the active (load) remains 0 forever.
This results in the strange 0.00, 0.01, 0.05 uptime values on idle
systems. Note: 93/2048 = 0.0454..., which rounds up to 0.05.

It is not correct to add a 0.5 rounding (=1024/2048) here, since the
result from this function is fed back into the next iteration again,
so the result of that +0.5 rounding value then gets multiplied by
(2048-2037), and then rounded again, so there is a virtual "ghost"
load created, next to the old and active load terms.

The modified code was tested on nohz=off and nohz kernels. It was tested
on vanilla kernel 4.4 and on centos 7.1 kernel 3.10.0-327. It was tested
on single, dual, and octal cores system. It was tested on virtual hosts
and bare hardware. No unwanted effects have been observed, and the
problems that the patch intended to fix were indeed gone.

Signed-off-by: Vik Heyndrickx <vik.heyndrickx@xxxxxxxxxxx>
[ Changelog edits ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Cc: Doug Smythies <dsmythies@xxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Mike Galbraith <efault@xxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
Fixes: 0f004f5a696a ("sched: Cure more NO_HZ load average woes")
Link: http://lkml.kernel.org/r/56A0A38D.4040900@xxxxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
kernel/sched/loadavg.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index ef71590..eb83b93 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -101,7 +101,6 @@ calc_load(unsigned long load, unsigned long exp, unsigned long active)
{
load *= exp;
load += active * (FIXED_1 - exp);
- load += 1UL << (FSHIFT - 1);
return load >> FSHIFT;
}