Re: [PATCH] sched: loadavg 0.00, 0.01, 0.05 on idle

From: Peter Zijlstra
Date: Thu Jan 21 2016 - 10:29:11 EST


On Thu, Jan 21, 2016 at 10:23:25AM +0100, Vik Heyndrickx wrote:
> Systems show a minimal load average of 0.00, 0.01, 0.05 even when they have
> no load at all.

Thanks, I've edited the patch Changelog to include a few extra details
you mentioned in our previous correspondence.

See below. Please let me know if you're OK with this.

---
Subject: sched: Fix non-zero idle loadavg
From: Vik Heyndrickx <vik.heyndrickx@xxxxxxxxxxx>
Date: Thu, 21 Jan 2016 10:23:25 +0100

Systems show a minimal load average of 0.00, 0.01, 0.05 even when they
have no load at all.

Uptime and /proc/loadavg on all systems with kernels released during the
last five years, up to and including kernel version 4.4, show minimum
5- and 15-minute loadavg values of 0.01 and 0.05 respectively. These
should be 0.00 on idle systems, but the way the kernel calculates the
value prevents it from dropping below those minimums.

Likewise, though less noticeably, a fully loaded system with no
processes waiting shows a maximum 1/5/15 loadavg of 1.00, 0.99, 0.95
(multiplied by the number of cores).
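
As a rough illustration of that ceiling, the user-space sketch below
iterates the pre-patch calc_load() (rounding line included) under a
constant load of one always-runnable task on one core. FSHIFT and the
EXP_* values are assumed to match the kernel's fixed-point constants;
busy_ceiling() and show_ceiling() are hypothetical helpers, not kernel
code.

```c
#include <stdio.h>

/* Assumed to match the kernel's fixed-point load constants. */
#define FSHIFT  11
#define FIXED_1 (1UL << FSHIFT)   /* 1.00 in fixed point (2048) */
#define EXP_1   1884UL            /* 1-minute decay factor */
#define EXP_5   2014UL            /* 5-minute decay factor */
#define EXP_15  2037UL            /* 15-minute decay factor */

/* Pre-patch calc_load(), including the rounding line. */
static unsigned long calc_load_rounded(unsigned long load, unsigned long exp,
				       unsigned long active)
{
	load *= exp;
	load += active * (FIXED_1 - exp);
	load += 1UL << (FSHIFT - 1);
	return load >> FSHIFT;
}

/* Hypothetical helper: start idle, then keep one task runnable forever. */
static unsigned long busy_ceiling(unsigned long exp)
{
	unsigned long load = 0, next;

	/* Iterate until the value stops changing. */
	while ((next = calc_load_rounded(load, exp, FIXED_1)) != load)
		load = next;
	return load;
}

/* Hypothetical helper: print a ceiling as a fraction of FIXED_1. */
static void show_ceiling(const char *name, unsigned long exp)
{
	unsigned long v = busy_ceiling(exp);

	printf("%s: %lu/2048 = %.4f\n", name, v, (double)v / FIXED_1);
}
```

Under these assumptions the 15-minute value stalls at 1955/2048
(about 0.9546), and the 5- and 1-minute values stall at 2018 and 2042.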

Removing the single line of code that rounds the internally kept load
value, which effectively returns calc_load() to its earlier state,
completely fixes this display problem.

Once the (old) load becomes 93 or higher, it mathematically can never
get lower than 93, even when the active (load) remains 0 forever.
This results in the strange 0.00, 0.01, 0.05 uptime values on idle
systems. Note: 93/2048 = 0.0454..., which rounds up to 0.05.
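
The floor of 93 can be reproduced in user space. The sketch below is
only an illustration: it assumes the kernel's fixed-point constants
(FSHIFT 11, EXP_1/5/15 = 1884/2014/2037), and calc_load_sim(),
idle_floor() and shown_hundredths() are hypothetical names. The last
helper is assumed to mirror how /proc/loadavg derives the two printed
decimals (it adds FIXED_1/200 before truncating).

```c
#include <stdio.h>

/* Assumed to match the kernel's fixed-point load constants. */
#define FSHIFT  11
#define FIXED_1 (1UL << FSHIFT)   /* 1.00 in fixed point (2048) */
#define EXP_1   1884UL
#define EXP_5   2014UL
#define EXP_15  2037UL

/* calc_load() with the rounding line made optional via 'round'. */
static unsigned long calc_load_sim(unsigned long load, unsigned long exp,
				   unsigned long active, int round)
{
	load *= exp;
	load += active * (FIXED_1 - exp);
	if (round)
		load += 1UL << (FSHIFT - 1);
	return load >> FSHIFT;
}

/* Start at 1.00, go fully idle (active = 0), iterate until stable. */
static unsigned long idle_floor(unsigned long exp, int round)
{
	unsigned long load = FIXED_1, next;

	while ((next = calc_load_sim(load, exp, 0, round)) != load)
		load = next;
	return load;
}

/* Assumed display rounding: hundredths shown for a load below 1.00. */
static unsigned long shown_hundredths(unsigned long load)
{
	unsigned long a = load + FIXED_1 / 200;

	return ((a & (FIXED_1 - 1)) * 100) >> FSHIFT;
}

/* Print the idle floor and displayed hundredths for one decay factor. */
static void show_floor(const char *name, unsigned long exp, int round)
{
	unsigned long v = idle_floor(exp, round);

	printf("%s: raw %lu, shown 0.%02lu\n", name, v, shown_hundredths(v));
}
```

With the rounding line, idle_floor() settles at 6, 30 and 93 for the
1-, 5- and 15-minute factors, which display as 0.00, 0.01 and 0.05.
With round set to 0 the idle value decays all the way to 0.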

It is not correct to add a 0.5 rounding (=1024/2048) here, since the
result of this function is fed back into the next iteration. The
result of that +0.5 rounding then gets multiplied by (2048-2037) and
rounded again, so a virtual "ghost" load is created next to the old
and active load terms.

The modified code was tested on nohz=off and nohz kernels. It was tested
on vanilla kernel 4.4 and on CentOS 7.1 kernel 3.10.0-327. It was tested
on single-, dual-, and octa-core systems. It was tested on virtual hosts
and bare hardware. No unwanted effects have been observed, and the
problems that the patch was intended to fix were indeed gone.

Fixes: 0f004f5a696a ("sched: Cure more NO_HZ load average woes")
Cc: Doug Smythies <dsmythies@xxxxxxxxx>
Signed-off-by: Vik Heyndrickx <vik.heyndrickx@xxxxxxxxxxx>
[Changelog edits]
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Link: http://lkml.kernel.org/r/56A0A38D.4040900@xxxxxxxxxxx
---
kernel/sched/loadavg.c | 1 -
1 file changed, 1 deletion(-)

--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -101,7 +101,6 @@ calc_load(unsigned long load, unsigned l
 {
 	load *= exp;
 	load += active * (FIXED_1 - exp);
-	load += 1UL << (FSHIFT - 1);
 	return load >> FSHIFT;
 }