Re: [PATCH v2 1/2] sched: force update of blocked load of idle cpus

From: Vincent Guittot
Date: Thu Dec 21 2017 - 02:59:46 EST


Le Wednesday 20 Dec 2017 à 16:01:10 (+0100), Peter Zijlstra a écrit :
> On Wed, Dec 20, 2017 at 03:23:01PM +0100, Vincent Guittot wrote:
> > On 20 December 2017 at 15:03, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > > On Fri, Dec 01, 2017 at 06:01:56PM +0000, Brendan Jackman wrote:
> > >> @@ -8955,8 +8964,20 @@ static void nohz_balancer_kick(void)
> > >> if (ilb_cpu >= nr_cpu_ids)
> > >> return;
> > >>
> > >> - if (test_and_set_bit(NOHZ_BALANCE_KICK, nohz_flags(ilb_cpu)))
> > >> + if (test_and_set_bit(NOHZ_BALANCE_KICK, nohz_flags(ilb_cpu))) {
> > >> + if (!only_update) {
> > >> + /*
> > >> + * There's a pending/ongoing nohz kick/balance. If it's
> > >> + * just for stats, convert it to a proper load balance.
> > >> + */
> > >> + clear_bit(NOHZ_STATS_KICK, nohz_flags(ilb_cpu));
> > >> + }
> > >> return;
> > >> + }
> > >> +
> > >> + if (only_update)
> > >> + set_bit(NOHZ_STATS_KICK, nohz_flags(ilb_cpu));
> > >> +
> > >> /*
> > >> * Use smp_send_reschedule() instead of resched_cpu().
> > >> * This way we generate a sched IPI on the target cpu which
> > >
> > > This looks racy.. if its not we don't need atomic ops, if it is but is
> > > still fine it needs a comment.
> >
> > NOHZ_STATS_KICK modification is protected by test_and_set_bit(NOHZ_BALANCE_KICK.
> > You're right that we don't need atomics ops and __set_bit() is enough
>
> Well, you shouldn't mix atomic and non-atomic ops to the same word,
> that's asking for trouble.

In fact, the atomic operation is needed, I forgot that set/clear of NOHZ_TICK_STOPPED
can happen simultaneously on the ilb_cpu so we must keep
set_bit(NOHZ_STATS_KICK, nohz_flags(ilb_cpu));

As comment we can add:
/*
* Update of NOHZ_STATS_KICK itself is protected by test_and_set_biti but
* clear/set of NOHZ_TICK_STOPPED can happen simultaneously so we need
* atomic operation
*/

>
> But why don't you do something like:
>
> nohz_kick()
>
> flags = NOHZ_STAT;
> if (!only_update)
> flags |= NOHZ_BALANCE;
>
> atomic_long_or(flags, &nohz_cpu(cpu));
>
>
> nohz_idle_balance()
>
> unsigned long do_flags = atomic_long_fetch_andnot(NOHZ_BALANCE|NOHZ_STAT, &nohz_flags(cpu));
>
> if (do_flags & NOHZ_STAT)
> update_blocked_stuff();
>
> if (do_flags & NOHZ_BALANCE)
> rebalance_domains();
>
> That way its far more readable.
>