Re: [RFC PATCH 0/4] sched: Improve cpu load accounting with nohz

From: Dietmar Eggemann
Date: Wed Jan 20 2016 - 08:05:05 EST


On 20/01/16 00:48, Byungchul Park wrote:
> On Tue, Jan 19, 2016 at 02:04:57PM +0100, Peter Zijlstra wrote:
>> On Fri, Jan 15, 2016 at 04:56:36PM +0000, Dietmar Eggemann wrote:
>>> Couldn't we set tickless_load only in case:
>>>
>>> unsigned long tickless_load = (active && pending_updates > 1) ?
>>> this_rq->cpu_load[0] : 0;
>>>
>>> Even though update_cpu_load_nohz() can call with pending_updates=1 and
>>> active=1 but then we don't have to decay.
>>
>> decay_load_missed() has an early bail for !missed, which will be tickled
>> with pending_updates == 1.
>
> I think the way for decay_load_missed() to get an early bail for
> *!load*, which the Dietmar's proposal did, is also good. And the
> peterz's proposal avoiding an unnecessary "add" operation is also
> good. Whatever..
>
>>
>> What I was thinking of doing however is:
>>
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4445,13 +4445,15 @@ static void __update_cpu_load(struct rq
>>
>> old_load = this_rq->cpu_load[i];
>> old_load = decay_load_missed(old_load, pending_updates - 1, i);
>> - old_load -= decay_load_missed(tickless_load, pending_updates - 1, i);
>> - /*
>> - * old_load can never be a negative value because a decayed
>> - * tickless_load cannot be greater than the original
>> - * tickless_load.
>> - */
>> - old_load += tickless_load;
>> + if (tickless_load) {
>
> And additionally, in this approach, why don't you do like,
>
> if (tickless_load || pending_updates - 1)
>
>> + old_load -= decay_load_missed(tickless_load, pending_updates - 1, i);
>> + /*
>> + * old_load can never be a negative value because a
>> + * decayed tickless_load cannot be greater than the
>> + * original tickless_load.
>> + */
>> + old_load += tickless_load;
>> + }
>> new_load = this_load;
>> /*
>> * Round up the averaging division if load is increasing. This
>>
>>
>> Since regardless of the pending_updates, none of that makes sense if
>> !tickless_load.
>
> None of that makes sense if !(pending_updates - 1), too. In that case,
> it becomes,
>
> old_load -= tickless_load;
> old_load += tickless_load;
>

tickless_load is potentially set to != 0 (rq->cpu_load[0]) in case
__update_cpu_load() is called by:

tick_nohz_full_update_tick()->tick_nohz_restart_sched_tick()->
update_cpu_load_nohz() [CONFIG_NO_HZ_FULL=y]

scheduler_tick()->update_cpu_load_active()

The former can call with pending_updates=1 and the latter always calls
w/ pending_updates=1 which makes it impossible to set tickless_load
correctly in __update_cpu_load(). I guess an enum indicating the caller
(update_cpu_load_active() update_cpu_load_nohz() or
update_idle_cpu_load) would help if we want to do this super correctly.

I don't have a strong preference whether we do 'if (tickless_load)' or
let decay_load_missed() bail when load = 0 (latter is not really
necessary because decay_load_missed() can handle load=0 already.
But if we do 'if (tickless_load)' why don't we also do 'if(old_load)'?
(old_load can be 0 as well)

Anyway, the patch fixes the unsigned underflow issue I saw for the
cpu_load[] values in NO_HZ_FULL mode.

Maybe you could fix the __update_cpu_load header as well:

'@active: !0 for NOHZ_FULL' (not really true, also when called by
update_cpu_load_active())
s/decay_load_misses()/decay_load_missed()
s/@active paramter/@active parameter

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
Tested-by: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>