Re: [PATCH v2 3/3] sched/fair: fix unnecessary increase of balance interval

From: Vincent Guittot
Date: Wed Dec 19 2018 - 08:29:25 EST


On Wed, 19 Dec 2018 at 12:16, Valentin Schneider
<valentin.schneider@xxxxxxx> wrote:
>
> On 19/12/2018 08:27, Vincent Guittot wrote:
> [...]
> >> Wouldn't LBF_ALL_PINNED cover all relevant cases? It's set at the very top
> >> of the 'if (busiest->nr_running > 1)' block and cleared whenever we find
> >> at least one task we could pull, even if we don't pull it because of
> >> other reasons in can_migrate_task() (e.g. cache hotness).
> >>
> >> If we have LBF_SOME_PINNED but not LBF_ALL_PINNED, we currently don't
> >> increase the balance_interval, which is what we would want to maintain.
> >
> > But there are several other UC to do active migration and increase the interval
> > like all except running tasks are pinned
> >
>
> My point is that AFAICT the LBF_ALL_PINNED flag would cover all the cases
> we care about, although the one you're mentioning is the only one I can
> think of. In that case LBF_ALL_PINNED would never be cleared, so when we do
> the active balance we'd know it's because all other tasks were pinned so
> we should probably increase the interval (see last snippet I sent).

There are probably several other UC than the one mentioned below as
tasks can be discarded for several reasons.
So instead of changing for all UC by default, i would prefer only
change for those for which we know it's safe

>
> [...]
> >> So that's all the need_active_balance() cases except the last
> >> sd->nr_balance_failed one. I'd argue this could also be counted as a
> >> "good" reason to active balance which shouldn't lead to a balance_interval
> >> increase. Plus, it keeps to the logic of increasing the balance_interval
> >> only when task affinity gets in the way.
> >
> > But this is some kind of affinity, the hotness is a way for the
> > scheduler to temporarily pinned the task on a cpu to take advantage of
> > cache hotness.
> >
> > I would prefer to be conservative and only reset the interval when we
> > are sure that active load balance is really what we want to do.
> > Asym packing is one, we can add the misfit case and the move task on
> > cpu with more available capacity as well. For the other case, it's
> > less obvious and I would prefer to keep current behavior
> >
>
> Mmm ok so this one is kinda about semantics on what do we really consider
> as "pinning".
>
> If we look at the regular load_balance() path, if all tasks are cache hot
> (so we clear LBF_ALL_PINNED but don't pass can_migrate_task()) we won't
> increase the balance_interval. Actually, if we have !active_balance we'll
> reset it.
>
> Seeing as the duration of a task's cache hotness (default .5ms) is small
> compared to any balance_interval (1ms * sd_weight), IMO it would make sense
> to reset the interval for all active balance cases. On top of that, we
> would keep to the current logic of increasing the balance_interval *only*
> when task->cpus_allowed gets in the way.