Re: [PATCH v2 3/3] sched/fair: fix unnecessary increase of balance interval

From: Vincent Guittot
Date: Wed Dec 19 2018 - 03:27:43 EST


On Tue, 18 Dec 2018 at 15:09, Valentin Schneider
<valentin.schneider@xxxxxxx> wrote:
>
> On 18/12/2018 13:23, Vincent Guittot wrote:
> [...]
> >> Ah, I think I get it: you're saying that this balance_interval increase
> >> is done because it is always assumed we do an active balance with
> >> busiest->nr_running > 1 && pinned tasks, and that all that is left to
> >> migrate after the active_balance is pinned tasks that we can't actually
> >> migrate.
> >>
> >> Maybe that's what we should target (as always, totally untested):
> >>
> >> -----8<-----
> >> @@ -9131,18 +9144,16 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> >> } else
> >> sd->nr_balance_failed = 0;
> >>
> >> - if (likely(!active_balance)) {
> >> - /* We were unbalanced, so reset the balancing interval */
> >> - sd->balance_interval = sd->min_interval;
> >> - } else {
> >> + if (unlikely(active_balance && (env.flags & LBF_ALL_PINNED))) {
> >
> > So it's not exactly LBF_ALL_PINNED; it also concerns cases where only some tasks are pinned
> >
>
> Wouldn't LBF_ALL_PINNED cover all relevant cases? It's set at the very top
> of the 'if (busiest->nr_running > 1)' block and cleared whenever we find
> at least one task we could pull, even if we don't pull it because of
> other reasons in can_migrate_task() (e.g. cache hotness).
>
> If we have LBF_SOME_PINNED but not LBF_ALL_PINNED, we currently don't
> increase the balance_interval, which is what we would want to maintain.

But there are several other use cases where we do an active migration and want to
increase the interval, e.g. when all tasks except the running one are pinned.

>
> > but IIUC, you would like to extend the reset of balance interval to
> > all the "good" reasons to trig active load balance
>
> Right
>
> > In fact, these are the conditions used in need_active_balance(), except
> > the last remaining one. Good reasons are:
> > - asym packing
> > - /*
> > * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task.
> > * It's worth migrating the task if the src_cpu's capacity is reduced
> > * because of other sched_class or IRQs if more capacity stays
> > * available on dst_cpu.
> > */
> > - misfit task
> >
> > I can extend the test for these conditions
>
> So that's all the need_active_balance() cases except the last
> sd->nr_balance_failed one. I'd argue this could also be counted as a
> "good" reason to active balance which shouldn't lead to a balance_interval
> increase. Plus, it keeps to the logic of increasing the balance_interval
> only when task affinity gets in the way.

But this is some kind of affinity: cache hotness is a way for the
scheduler to temporarily pin a task to a CPU to take advantage of its
cache-hot data.

I would prefer to be conservative and only reset the interval when we
are sure that an active load balance is really what we want to do.
Asym packing is one such case; we can add the misfit case and moving a
task to a CPU with more available capacity as well. For the other case,
it's less obvious, and I would prefer to keep the current behavior.