Re: [PATCH v2 3/3] sched/fair: fix unnecessary increase of balance interval

From: Vincent Guittot
Date: Tue Dec 18 2018 - 04:32:53 EST


Hi Valentin,

On Mon, 17 Dec 2018 at 17:56, Valentin Schneider
<valentin.schneider@xxxxxxx> wrote:
>
> Hi Vincent,
>
> About time I had a look at this one...
>
> On 14/12/2018 16:01, Vincent Guittot wrote:
> > In case of active balance, we increase the balance interval to cover
> > pinned-task cases not covered by the all_pinned logic.
>
> AFAIUI the balance interval increase is there to give us plenty of time
> to stop the task before going through another load_balance().
>
> Seeing as there is a cpus_allowed check that leads to
>
> 		env.flags |= LBF_ALL_PINNED;
> 		goto out_one_pinned;
>
> in the active balancing part of load_balance(), the condition you're
> changing should never be hit when we have pinned tasks. So you may want
> to rephrase that bit.

In this asym packing case, it has nothing to do with pinned tasks, and
that's the root cause of the problem: the active balance triggered by
asym packing is wrongly assumed to be an active balance due to pinned
task(s), and the load balance interval is increased for no reason.
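
To make that concrete, here is a minimal sketch of the tail of
load_balance() as it stands today (simplified, with the original comment
replaced by one describing the problem; the full context is in the diff
quoted below):

-----8<-----
	if (likely(!active_balance)) {
		/* We were unbalanced, so reset the balancing interval */
		sd->balance_interval = sd->min_interval;
	} else {
		/*
		 * Every active balance lands here, including the asym
		 * packing one where no task is pinned, so the interval
		 * keeps doubling up to sd->max_interval and the next
		 * balance attempt can be delayed by hundreds of ms.
		 */
		if (sd->balance_interval < sd->max_interval)
			sd->balance_interval *= 2;
	}
----->8-----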

>
> > Nevertheless, the active
> > migration triggered by asym packing should be treated as the normal
> > unbalanced case and reset the interval to the default value; otherwise,
> > active migration for asym_packing can easily be delayed for hundreds
> > of ms because of this all_pinned detection mechanism.
>
> Mmm so it's not exactly clear why we need this change. If we double the
> interval because of a pinned task we wanted to active balance, well, it's
> just a regular pinned-task issue and we can't do much about it.

As explained above, this is not a pinned-task case.

>
> The only scenario I can think of is a workload where you want to do an
> active balance in several successive load_balance() calls, in which case
> you would keep increasing the interval even though you do migrate a task
> every time (which would harm every subsequent active_balance).
>
> In that case every active_balance "user" (pressured CPU, misfit) is
> affected, so maybe what we want is something like this:
>
> -----8<-----
> @@ -9136,13 +9149,11 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>  		sd->balance_interval = sd->min_interval;
>  	} else {
>  		/*
> -		 * If we've begun active balancing, start to back off. This
> -		 * case may not be covered by the all_pinned logic if there
> -		 * is only 1 task on the busy runqueue (because we don't call
> -		 * detach_tasks).
> +		 * If we've begun active balancing, start to back off.
> +		 * Don't increase too much in case we have more active balances
> +		 * coming up.
>  		 */
> -		if (sd->balance_interval < sd->max_interval)
> -			sd->balance_interval *= 2;
> +		sd->balance_interval = 2 * sd->min_interval;
>  	}
>
>  	goto out;
> ----->8-----
>
> Maybe we want a larger threshold - truth be told, it all depends on how
> long the cpu stopper can take and on whether that delay increase is still
> relevant nowadays.

Hmm, the increase of the balance interval is not linked to the cpu
stopper; the point is to back off the load balance interval when we know
that there is no possible load balance to perform.
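
For reference, the backoff I'm talking about is the one done on the
out_one_pinned path of load_balance() (a slightly simplified sketch of
the current code):

-----8<-----
out_one_pinned:
	/*
	 * All tasks were pinned (or the imbalance could not be solved),
	 * so back off rather than retry a hopeless balance right away.
	 */
	if (((env.flags & LBF_ALL_PINNED) &&
	     sd->balance_interval < MAX_PINNED_INTERVAL) ||
	    (sd->balance_interval < sd->max_interval))
		sd->balance_interval *= 2;
----->8-----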

Regards,
Vincent
>
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > ---
> >  kernel/sched/fair.c | 27 +++++++++++++++------------
> >  1 file changed, 15 insertions(+), 12 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 9591e7a..487287e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8857,21 +8857,24 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> >   */
> >  #define MAX_PINNED_INTERVAL	512
> >
> > +static inline bool
> > +asym_active_balance(struct lb_env *env)
> > +{
> > +	/*
> > +	 * ASYM_PACKING needs to force migrate tasks from busy but
> > +	 * lower priority CPUs in order to pack all tasks in the
> > +	 * highest priority CPUs.
> > +	 */
> > +	return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) &&
> > +	       sched_asym_prefer(env->dst_cpu, env->src_cpu);
> > +}
> > +
> >  static int need_active_balance(struct lb_env *env)
> >  {
> >  	struct sched_domain *sd = env->sd;
> >
> > -	if (env->idle != CPU_NOT_IDLE) {
> > -
> > -		/*
> > -		 * ASYM_PACKING needs to force migrate tasks from busy but
> > -		 * lower priority CPUs in order to pack all tasks in the
> > -		 * highest priority CPUs.
> > -		 */
> > -		if ((sd->flags & SD_ASYM_PACKING) &&
> > -		    sched_asym_prefer(env->dst_cpu, env->src_cpu))
> > -			return 1;
> > -	}
> > +	if (asym_active_balance(env))
> > +		return 1;
> >
> >  	/*
> >  	 * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task.
> > @@ -9150,7 +9153,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> >  	} else
> >  		sd->nr_balance_failed = 0;
> >
> > -	if (likely(!active_balance)) {
> > +	if (likely(!active_balance) || asym_active_balance(&env)) {
>
> AFAICT this makes us reset the interval whenever we do an asym packing
> active balance (if the task is pinned we would goto out_one_pinned).
> This goes against the logic I had understood so far and that I explained
> higher up.
>
> >  		/* We were unbalanced, so reset the balancing interval */
> >  		sd->balance_interval = sd->min_interval;
> >  	} else {
> >