Re: [PATCH v2 2/2] sched/fair: Trigger nohz.next_balance updates when a CPU goes NOHZ-idle

From: Valentin Schneider
Date: Mon Jul 19 2021 - 14:12:57 EST


On 19/07/21 17:24, Dietmar Eggemann wrote:
> On 19/07/2021 12:31, Valentin Schneider wrote:
>
> [...]
>
>> @@ -10351,6 +10352,9 @@ static void nohz_balancer_kick(struct rq *rq)
>> unlock:
>> rcu_read_unlock();
>> out:
>> + if (READ_ONCE(nohz.needs_update))
>> + flags |= NOHZ_NEXT_KICK;
>> +
>
> Since NOHZ_NEXT_KICK is part of NOHZ_KICK_MASK, some conditions above
> will already set it in flags. Is this intended?

So if no kick would otherwise be issued (e.g. flags == 0 because
nohz.next_balance is still in the future), then this does the right thing
and issues a NOHZ_NEXT_KICK-only kick.

However, you're right to point out that even if nohz.needs_update is false,
we can still set NOHZ_NEXT_KICK in the ilb rq's NOHZ flags because it is
included in NOHZ_KICK_MASK, which I think is a mistake. Looking at it now,
it shouldn't be part of NOHZ_KICK_MASK.
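
To make the two cases above concrete, here is a minimal userspace sketch of
the tail of nohz_balancer_kick(). The bit values, the two KICK_MASK_* names
and final_kick_flags() are made up for illustration and are not the kernel's
definitions; only the flag names come from the patch.

```c
#include <stdbool.h>

/* Hypothetical bit values, for illustration only. */
#define NOHZ_BALANCE_KICK (1u << 0)
#define NOHZ_STATS_KICK   (1u << 1)
#define NOHZ_NEXT_KICK    (1u << 3)

/* Mask with NOHZ_NEXT_KICK included, as Dietmar describes it. */
#define KICK_MASK_WITH_NEXT \
	(NOHZ_BALANCE_KICK | NOHZ_STATS_KICK | NOHZ_NEXT_KICK)
/* Mask with NOHZ_NEXT_KICK left out, as suggested in this reply. */
#define KICK_MASK_WITHOUT_NEXT \
	(NOHZ_BALANCE_KICK | NOHZ_STATS_KICK)

/*
 * Model of the "out:" path: 'flags' is whatever the earlier conditions
 * picked, 'needs_update' stands in for READ_ONCE(nohz.needs_update).
 */
static unsigned int final_kick_flags(unsigned int flags, bool needs_update)
{
	if (needs_update)
		flags |= NOHZ_NEXT_KICK;
	return flags;
}
```

With flags == 0 and needs_update set, the result is NOHZ_NEXT_KICK alone.
With needs_update clear, NOHZ_NEXT_KICK only shows up when the earlier
conditions picked the mask that contains it.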

>
>> if (flags)
>> kick_ilb(flags);
>> }
>> @@ -10447,12 +10451,13 @@ void nohz_balance_enter_idle(int cpu)
>> /*
>> * Ensures that if nohz_idle_balance() fails to observe our
>> * @idle_cpus_mask store, it must observe the @has_blocked
>> - * store.
>> + * and @needs_update stores.
>> */
>> smp_mb__after_atomic();
>>
>> set_cpu_sd_state_idle(cpu);
>>
>> + WRITE_ONCE(nohz.needs_update, 1);
>> out:
>> /*
>> * Each time a cpu enter idle, we assume that it has blocked load and
>> @@ -10501,13 +10506,17 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
>
> function header would need update to incorporate the new 'update
> nohz.next_balance' functionality. It only talks about 'update of blocked
> load' and 'complete load balance' so far.
>
>> /*
>> * We assume there will be no idle load after this update and clear
>> * the has_blocked flag. If a cpu enters idle in the mean time, it will
>> - * set the has_blocked flag and trig another update of idle load.
>> + * set the has_blocked flag and trigger another update of idle load.
>> * Because a cpu that becomes idle, is added to idle_cpus_mask before
>> * setting the flag, we are sure to not clear the state and not
>> * check the load of an idle cpu.
>> + *
>> + * Same applies to idle_cpus_mask vs needs_update.
>> */
>> if (flags & NOHZ_STATS_KICK)
>> WRITE_ONCE(nohz.has_blocked, 0);
>> + if (flags & NOHZ_NEXT_KICK)
>> + WRITE_ONCE(nohz.needs_update, 0);
>>
>> /*
>> * Ensures that if we miss the CPU, we must see the has_blocked
>> @@ -10531,6 +10540,8 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
>> if (need_resched()) {
>> if (flags & NOHZ_STATS_KICK)
>> has_blocked_load = true;
>
> This looks weird now? 'has_blocked_load = true' vs
> 'WRITE_ONCE(nohz.needs_update, 1)'.
>

Well, has_blocked_load lets us factor out the nohz.has_blocked write
(one is needed either when aborting or at the tail of the cpumask
iteration), whereas there is just a single write for nohz.needs_update
(when aborting).
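
As a rough illustration of that asymmetry, here is a simplified userspace
model of the flow. struct nohz_model, idle_balance_model(), its parameters
and the bit values are all hypothetical, and the per-CPU work is reduced to
a boolean array; it is a sketch of the idea, not the actual function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values, for illustration only. */
#define NOHZ_STATS_KICK (1u << 1)
#define NOHZ_NEXT_KICK  (1u << 3)

struct nohz_model {
	int has_blocked;
	int needs_update;
};

/*
 * cpu_has_blocked[i]: CPU i still has blocked load after its update.
 * resched_at: iteration at which need_resched() would fire; pass ncpus
 * to model a walk that is never interrupted.
 */
static void idle_balance_model(struct nohz_model *nohz, unsigned int flags,
			       const bool *cpu_has_blocked, size_t ncpus,
			       size_t resched_at)
{
	bool has_blocked_load = false;
	size_t i;

	/* Optimistically clear both flags before walking the idle CPUs. */
	if (flags & NOHZ_STATS_KICK)
		nohz->has_blocked = 0;
	if (flags & NOHZ_NEXT_KICK)
		nohz->needs_update = 0;

	for (i = 0; i < ncpus; i++) {
		if (i == resched_at) {
			/* Aborting: remaining CPUs were not looked at. */
			if (flags & NOHZ_STATS_KICK)
				has_blocked_load = true;
			/* The only place needs_update is set again. */
			if (flags & NOHZ_NEXT_KICK)
				nohz->needs_update = 1;
			goto abort;
		}
		if (flags & NOHZ_STATS_KICK)
			has_blocked_load |= cpu_has_blocked[i];
	}

abort:
	/* Single has_blocked write, shared by the abort and normal paths. */
	if (has_blocked_load)
		nohz->has_blocked = 1;
}
```

The local variable exists because has_blocked can be re-armed from two
places (abort, or any CPU reporting blocked load), while needs_update is
only re-armed on abort and so can be written directly.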

>> + if (flags & NOHZ_NEXT_KICK)
>> + WRITE_ONCE(nohz.needs_update, 1);
>> goto abort;
>> }
>>
>>