Re: [PATCH v5 1/3] sched: Stop nohz stats when decayed

From: Valentin Schneider
Date: Fri Feb 16 2018 - 14:24:00 EST


On 02/16/2018 05:02 PM, Vincent Guittot wrote:
> On 16 February 2018 at 13:53, Valentin Schneider
> <valentin.schneider@xxxxxxx> wrote:
>> On 02/14/2018 03:26 PM, Vincent Guittot wrote:
>>> Stop the periodic update of blocked load when all idle CPUs have fully
>>> decayed. We introduce a new nohz.has_blocked that reflects whether some idle
>>> CPUs have blocked load that has to be periodically updated. nohz.has_blocked
>>> is set every time an idle CPU can have blocked load and it is then cleared
>>> when no more blocked load has been detected during an update. We don't need
>>> atomic operations but only to make sure of the right ordering when updating
>>> nohz.idle_cpus_mask and nohz.has_blocked.
>>>
>>> Suggested-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
>>> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
>>> ---
>>> kernel/sched/fair.c | 122 ++++++++++++++++++++++++++++++++++++++++++---------
>>> kernel/sched/sched.h | 1 +
>>> 2 files changed, 102 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 7af1fa9..5a6835e 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>>
>>> [...]
>>>
>>> -static void update_nohz_stats(struct rq *rq)
>>> +static bool update_nohz_stats(struct rq *rq)
>>>  {
>>>  #ifdef CONFIG_NO_HZ_COMMON
>>>  	unsigned int cpu = rq->cpu;
>>>
>>> +	if (!rq->has_blocked_load)
>>> +		return false;
>>> +
>>>  	if (!cpumask_test_cpu(cpu, nohz.idle_cpus_mask))
>>> -		return;
>>> +		return false;
>>>
>>>  	if (!time_after(jiffies, rq->last_blocked_load_update_tick))
>>> -		return;
>>> +		return true;
>>>
>>>  	update_blocked_averages(cpu);
>>> +
>>> +	return rq->has_blocked_load;
>>> +#else
>>> +	return false;
>>>  #endif
>>>  }
>>>
>>
>> (Wrongly thought that this bit was in a different patch, comment should have
>> been squashed in previous reply...)
>>
>> I've been thinking about this, and it's a messy one if we want to
>> skip CPUs in idle_balance() / clear the nohz.has_blocked flag.
>>
>> AFAICT, the rq->has_blocked_load flag can be wrongly cleared: if we're
>> calling update_nohz_stats() for CPU A, but CPU A got out/in of
>> idle really quickly in that same timeframe, I'm not sure you can guarantee
>> the clearing of rq->has_blocked_load done in update_blocked_averages() will
>> always end up in memory before the setting of the flag in
>> nohz_balance_enter_idle().
>
> Not sure it's a problem in this case because the clear done in
> update_blocked_averages() only happens if there is no load on the rq
> and new load can't be added in the meantime.
>

You're right, and that's why there's that comment:
>> 	/*
>> 	 * Can be set safely without rq->lock held
>> 	 * If a clear happens, it will have evaluated last additions because
>> 	 * rq->lock is held during the check and the clear
>> 	 */
>> 	rq->has_blocked_load = 1;

Even though it's clearly written there, my brain wouldn't process the fact
that the flag is cleared with the rq lock held. So yeah, we can't wrongly
clear rq->has_blocked_load. The only mishap that can happen is that it is
re-raised even though we just went through an update_nohz_stats(), which would
lead to a useless stats update in the future, but that's not as bad.
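
To make that last point concrete, here is a rough userspace sketch of the
"useless update" case. It is only a model under stated assumptions, not the
kernel code: struct model_rq, the model_*() helpers, the halving "decay" and
the nr_updates counter are all made up for illustration, and since the model
is single-threaded the rq->lock sections are only marked by comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant bits of struct rq. */
struct model_rq {
	unsigned long blocked_load;	/* stand-in for the decaying blocked load */
	int has_blocked_load;		/* models rq->has_blocked_load */
	unsigned int nr_updates;	/* counts stats updates, illustration only */
};

/*
 * Models the check-and-clear done in update_blocked_averages(). In the
 * kernel the check and the clear both happen with rq->lock held; here the
 * lock is only marked by comments because the model is single-threaded.
 */
static bool model_update_blocked(struct model_rq *rq)
{
	assert(rq != NULL);

	/* rq->lock would be taken here */
	rq->nr_updates++;
	rq->blocked_load /= 2;		/* crude decay, not real PELT */
	if (!rq->blocked_load)
		rq->has_blocked_load = 0;
	/* rq->lock would be released here */

	return rq->has_blocked_load;
}

/* Models the lockless set done in nohz_balance_enter_idle(). */
static void model_enter_idle(struct model_rq *rq)
{
	assert(rq != NULL);
	rq->has_blocked_load = 1;
}

/* Models only the early exit and the return value of update_nohz_stats(). */
static bool model_update_nohz_stats(struct model_rq *rq)
{
	assert(rq != NULL);

	if (!rq->has_blocked_load)
		return false;

	return model_update_blocked(rq);
}
```

If model_enter_idle() runs right after an update has cleared the flag, the
next model_update_nohz_stats() call no longer takes the early exit: it does
one extra update, finds no load and clears the flag again. That is the whole
cost of the spurious re-raise in this model.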