Re: [PATCH v2 3/3] sched/fair: fix unnecessary increase of balance interval

From: Vincent Guittot
Date: Tue Dec 18 2018 - 08:23:50 EST


On Tue, 18 Dec 2018 at 12:46, Valentin Schneider
<valentin.schneider@xxxxxxx> wrote:
>
> On 18/12/2018 09:32, Vincent Guittot wrote:
> [...]
> > In this asym packing case, It has nothing to do with pinned tasks and
> > that's the root cause of the problem:
> > the active balance triggered by asym packing is wrongly assumed to be
> > an active balance due to pinned task(s) and the load balance interval
> > is increased without any reason
> >
> [...]> hmm the increase of balance interval is not linked to cpu stopper but
> > to increase the load balance interval when we know that there is no
> > possible load balance to perform
> >
> > Regards,
> > Vincent
>
> Ah, I think I get it: you're saying that this balance_interval increase
> is done because it is always assumed we do an active balance with
> busiest->nr_running > 1 && pinned tasks, and that all that is left to
> migrate after the active_balance is pinned tasks that we can't actually
> migrate.
>
> Maybe that's what we should target (as always, totally untested):
>
> -----8<-----
> @@ -9131,18 +9144,16 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> } else
> sd->nr_balance_failed = 0;
>
> - if (likely(!active_balance)) {
> - /* We were unbalanced, so reset the balancing interval */
> - sd->balance_interval = sd->min_interval;
> - } else {
> + if (unlikely(active_balance && (env.flags & LBF_ALL_PINNED))) {

So it's not exactly LBF_ALL_PINNED but also some pinned tasks

but IIUC, you would like to extend the reset of balance interval to
all the "good" reasons to trig active load balance
In fact the condition used in need_active_balance except the last
remaining one. Good reasons are:
- asym packing
- /*
* The dst_cpu is idle and the src_cpu CPU has only 1 CFS task.
* It's worth migrating the task if the src_cpu's capacity is reduced
* because of other sched_class or IRQs if more capacity stays
* available on dst_cpu.
*/
- misfit task

I can extend the test for these conditions


> /*
> - * If we've begun active balancing, start to back off. This
> - * case may not be covered by the all_pinned logic if there
> - * is only 1 task on the busy runqueue (because we don't call
> - * detach_tasks).
> + * We did an active balance as a last resort (all other tasks
> + * were pinned), start to back off.
> */
> if (sd->balance_interval < sd->max_interval)
> sd->balance_interval *= 2;
> + } else {
> + /* We were unbalanced, so reset the balancing interval */
> + sd->balance_interval = sd->min_interval;
> }
>
> goto out;
> ----->8-----
>
> can_migrate_task() called by detach_tasks() in the
> 'if (busiest->nr_running > 1)' block should clear that LBF_ALL_PINNED flag
> if there is any migratable task (even if we don't end up migrating it),
> and it's not set if (busiest->nr_running == 1), so that *should* work.
>
> I believe the argument that this applies to all kinds of active balances
> is still valid - this shouldn't be changed just for asym packing active
> balance.