Re: [RFC PATCH 0/4] sched: Improve cpu load accounting with nohz

From: Dietmar Eggemann
Date: Fri Jan 15 2016 - 11:56:47 EST


Hi Byungchul,

On 15/01/16 07:07, Byungchul Park wrote:
> On Thu, Jan 14, 2016 at 10:23:46PM +0000, Dietmar Eggemann wrote:
>> On 01/14/2016 09:27 PM, Peter Zijlstra wrote:
>>> On Thu, Jan 14, 2016 at 09:19:00PM +0000, Dietmar Eggemann wrote:
>>>> @@ -4346,7 +4346,10 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,

[...]

>
> I re-checked the equation I expanded and fortunately found it had no
> problem. I think there are several ways to do it correctly. That is,
>
> option 1. decay the absolute value with decay_load_missed() and adjust
> the sign.
>
> option 2. make decay_load_missed() can handle negative value.
>
> option 3. refer to the patch below. I think this option is the best.
>
> -----8<-----
>
> From ba3d3355fcce51c901376d268206f58a7d0e4214 Mon Sep 17 00:00:00 2001
> From: Byungchul Park <byungchul.park@xxxxxxx>
> Date: Fri, 15 Jan 2016 15:58:09 +0900
> Subject: [PATCH] sched/fair: prevent using decay_load_missed() with a negative
> value
>
> decay_load_missed() cannot handle nagative value. So we need to prevent
> using the function with a negative value.
>
> Signed-off-by: Byungchul Park <byungchul.park@xxxxxxx>
> ---
> kernel/sched/fair.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8dde8b6..3f08d75 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4443,8 +4443,14 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
>
> /* scale is effectively 1 << i now, and >> i divides by scale */
>
> - old_load = this_rq->cpu_load[i] - tickless_load;
> + old_load = this_rq->cpu_load[i];
> old_load = decay_load_missed(old_load, pending_updates - 1, i);
> + old_load -= decay_load_missed(tickless_load, pending_updates - 1, i);
> + /*
> + * old_load can never be a negative value because a decayed
> + * tickless_load cannot be greater than the original
> + * tickless_load.
> + */
> old_load += tickless_load;
> new_load = this_load;
> /*
>

So in this case you want to decay old_load and add (tickless_load -
decay(tickless_load)) on top. IMHO, this makes sense.


(w/ your patch and cpu w/ NO_HZ_FULL)

update_cpu_load_nohz: cpu=3 jiffies=4294935491
this_rq->last_load_update_tick=4294935489 pending_updates=2 active=1
load=0x114
__update_cpu_load: cpu=3 this_load=0x114 pending_updates=2
tickless_load=0xc2
__update_cpu_load: cpu=3 this_rq->cpu_load[0 .. 4] = [0xc2 0x62 0x32
0x1d 0x14]
__update_cpu_load: cpu=3 1. old_load=0x62
__update_cpu_load: cpu=3 2.1 old_load=0x31
__update_cpu_load: cpu=3 2.2 old_load=0xffffffffffffffd0 <-- after
decaying tickless_load
__update_cpu_load: cpu=3 3. old_load=0x92 <-- after
adding tickless_load
__update_cpu_load: cpu=3 1. new_load=0x92
__update_cpu_load: cpu=3 2. new_load=0x92
__update_cpu_load: cpu=3 this_rq->cpu_load[1]=0xd3
...
update_cpu_load_active: cpu=3 this_rq->last_load_update_tick=4294935491
pending_updates=1 active=1 load=0x13e
__update_cpu_load: cpu=3 this_load=0x13e pending_updates=1
tickless_load=0x114
__update_cpu_load: cpu=3 this_rq->cpu_load[0 .. 4] = [0x114 0xd3 0x86
0x4f 0x2f]
...

Another point ... 'active=1' (function header: @active: !0 for NOHZ_FULL
is a little bit misleading) is also true for when __update_cpu_load() is
called from update_cpu_load_active(). In this case tickless_load
wouldn't have to be set at all since pending_updates is 1,
decay_load_missed() can handle that by bailing in case missed_updates = 0.

Couldn't we set tickless_load only in case:

unsigned long tickless_load = (active && pending_updates > 1) ?
this_rq->cpu_load[0] : 0;

Even though update_cpu_load_nohz() can call with pending_updates=1 and
active=1 but then we don't have to decay.