Re: [PATCH v5 1/3] sched: Stop nohz stats when decayed

From: Valentin Schneider
Date: Thu Feb 22 2018 - 05:05:04 EST


On 02/22/2018 08:37 AM, Vincent Guittot wrote:
> On 21 February 2018 at 14:13, Valentin Schneider
> <valentin.schneider@xxxxxxx> wrote:
>> On 02/16/2018 01:44 PM, Vincent Guittot wrote:
>>> On 16 February 2018 at 13:13, Valentin Schneider
>>> <valentin.schneider@xxxxxxx> wrote:
>>>> On 02/14/2018 03:26 PM, Vincent Guittot wrote:
>>>>> Stop the periodic update of blocked load when all idle CPUs have fully
>>>>> decayed. We introduce a new nohz.has_blocked that reflects whether some idle
>>>>> CPUs have blocked load that has to be periodically updated. nohz.has_blocked
>>>>> is set every time an idle CPU can have blocked load and it is then cleared
>>>>> when no more blocked load has been detected during an update. We don't need
>>>>> atomic operations but only need to make sure of the right ordering when updating
>>>>> nohz.idle_cpus_mask and nohz.has_blocked.
>>>>>
>>>>> Suggested-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
>>>>> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
>>>>> ---
>>>>> kernel/sched/fair.c | 122 ++++++++++++++++++++++++++++++++++++++++++---------
>>>>> kernel/sched/sched.h | 1 +
>>>>> 2 files changed, 102 insertions(+), 21 deletions(-)
>>>>>
>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>> index 7af1fa9..5a6835e 100644
>>>>> --- a/kernel/sched/fair.c
>>>>> +++ b/kernel/sched/fair.c
>>>>>
>>>>> [...]
>>
>> I have one more question on that bit:
>>
>>
>> 	has_blocked_load |= update_nohz_stats(rq, true);
>>
>> 	/*
>> 	 * If time for next balance is due,
>> 	 * do the balance.
>> 	 */
>> 	if (time_after_eq(jiffies, rq->next_balance)) {
>> 		struct rq_flags rf;
>>
>> 		rq_lock_irqsave(rq, &rf);
>> 		update_rq_clock(rq);
>> 		cpu_load_update_idle(rq);
>> 		rq_unlock_irqrestore(rq, &rf);
>>
>> 		if (flags & NOHZ_BALANCE_KICK)
>> 			rebalance_domains(rq, CPU_IDLE);
>> 	}
>>
>> 	if (time_after(next_balance, rq->next_balance)) {
>> 		next_balance = rq->next_balance;
>> 		update_next_balance = 1;
>> 	}
>>
>>
>> Now that I think about it, shouldn't we always have a 'continue' after
>> the blocked load update if (flags & NOHZ_KICK_MASK) == NOHZ_STATS_KICK ?
>> AFAICT we don't want to push the next_balance forward, only the next_blocked.
>
> But we don't push next_balance forward. It just gets the shortest
> next_balance and updates nohz.next_balance, exactly like what is done in
> a full idle load balance
>

Sorry, that was a poor choice of words - I probably should've just gone with
"update". What I meant by that is that if we have
(flags & NOHZ_KICK_MASK) == NOHZ_STATS_KICK
then we're not going to do the load balance.

Then, in this case, I thought that we should not be going through any
condition that uses nohz.next_balance (since we're not doing any balancing).
Arguably *updating* nohz.next_balance still makes sense in this scenario.

In short, my comment was mostly about "cleanly" separating stats update vs
load balance.
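
A minimal sketch of that separation, written as a self-contained userspace
toy rather than the kernel code. The toy_* names, the numeric flag values and
the interval of 10 ticks are all assumptions made for illustration; only the
flag names and the shape of the two time checks come from the quoted hunk.
It shows the 'continue' variant: a stats-only kick updates the blocked load
state and then skips every step that reads or writes next_balance.

```c
#include <stdbool.h>
#include <stddef.h>

/* Flag names follow the thread; the numeric values are illustrative only. */
#define NOHZ_BALANCE_KICK 0x1u
#define NOHZ_STATS_KICK   0x2u
#define NOHZ_KICK_MASK    (NOHZ_BALANCE_KICK | NOHZ_STATS_KICK)

/* Hypothetical stand-in for struct rq, reduced to what the sketch needs. */
struct toy_rq {
	bool has_blocked_load;
	unsigned long next_balance;
	int stats_updates;	/* counts blocked load updates */
	int balances;		/* counts simulated rebalances */
};

struct toy_result {
	bool has_blocked_load;
	bool update_next_balance;
	unsigned long next_balance;
};

/* Wrap-safe comparisons, same arithmetic as the jiffies helpers. */
static bool toy_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

static bool toy_time_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

static struct toy_result toy_nohz_pass(struct toy_rq *rqs, size_t n,
				       unsigned int flags, unsigned long now,
				       unsigned long next_balance)
{
	struct toy_result res = { false, false, next_balance };

	for (size_t i = 0; i < n; i++) {
		struct toy_rq *rq = &rqs[i];

		/* The stats update runs for every kick type. */
		rq->stats_updates++;
		res.has_blocked_load |= rq->has_blocked_load;

		/* Stats-only kick: skip everything tied to next_balance. */
		if ((flags & NOHZ_KICK_MASK) == NOHZ_STATS_KICK)
			continue;

		if (toy_time_after_eq(now, rq->next_balance) &&
		    (flags & NOHZ_BALANCE_KICK)) {
			rq->balances++;
			rq->next_balance = now + 10;	/* arbitrary interval */
		}

		/* Track the earliest next_balance seen during this pass. */
		if (toy_time_after(res.next_balance, rq->next_balance)) {
			res.next_balance = rq->next_balance;
			res.update_next_balance = true;
		}
	}

	return res;
}
```

With this shape a stats-only pass leaves next_balance exactly as it was
passed in, which is the stricter reading; keeping the final time_after()
check reachable for stats kicks would be the "still update it" alternative.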

>> That would also take care of not doing the load balance.
>>>>
>>>> 	/*
>>>> 	 * This cpu doesn't have any remaining blocked load, skip it.
>>>> 	 * It's sane to do this because this flag is raised in
>>>> 	 * nohz_balance_enter_idle()
>>>> 	 */
>>>> 	if ((flags & NOHZ_KICK_MASK) == NOHZ_STATS_KICK &&
>>>> 	    !rq->has_blocked_load)
>>>> 		continue;
>
> Then, it's worth keeping the call to cpu_load_update_idle(rq); which
> updates the cpu_load[] array which is still used at some level
>

Is that something we would want to have in update_nohz_stats() to also
cover the idle_balance -> load_balance update scenario ?
From a quick glance I would've said it shouldn't be needed since the CPU doing
the updates wouldn't have been nohz previously, but we're currently calling
it when going through nohz_newidle_balance() so I might have gotten that wrong.

>>>>
>>>>> + update_blocked_averages(rq->cpu);
>>>>> + has_blocked_load |= rq->has_blocked_load;
>>>>> +