Re: [PATCH] sched/fair: Don't increase sd->balance_interval on newidle balance

From: Vincent Guittot
Date: Wed Sep 26 2018 - 04:13:42 EST


On Tue, 25 Sep 2018 at 19:38, Valentin Schneider
<valentin.schneider@xxxxxxx> wrote:
>
> When load_balance() fails to move some load because of task affinity,
> we end up increasing sd->balance_interval to delay the next periodic
> balance in the hopes that next time we look, that annoying pinned
> task(s) will be gone.

It's not about hope but to minimize useless periodic load balance as
there is no possibility to balance anything

>
> However, idle_balance() pays no attention to sd->balance_interval, yet
> it will still lead to an increase in balance_interval in case of
> pinned tasks.

I would say that this makes sense as there is no way to pull the
pinned task on this rq so running periodic load balance (busy or idle)
is a waste of time and resources

>
> If we're going through several newidle balances (e.g. we have a
> periodic task), this can lead to a huge increase of the

This only happen when overload is set which means that there are tasks
on another CPU but this one can't pull one of them.

Can you give us details about the use case that you care about ?

Also, If the interval reaches a value that becomes a problem for you
why don't you reduce the max_interval with sysfs ? This max interval
can be reach in several other occasions

> balance_interval in a very small amount of time.
>
> To prevent that, don't increase the balance interval when going
> through a newidle balance.
>
> Signed-off-by: Valentin Schneider <valentin.schneider@xxxxxxx>
> ---
> kernel/sched/fair.c | 19 ++++++++++++++-----
> 1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6bd142d..620910d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8782,13 +8782,22 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> sd->nr_balance_failed = 0;
>
> out_one_pinned:
> + ld_moved = 0;
> +
> + /*
> + * idle_balance() disregards balance intervals, so we could repeatedly
> + * reach this code, which would lead to balance_interval skyrocketting
> + * in a short amount of time. Skip the balance_interval increase logic
> + * to avoid that.
> + */
> + if (env.idle == CPU_NEWLY_IDLE)
> + goto out;
> +
> /* tune up the balancing interval */
> - if (((env.flags & LBF_ALL_PINNED) &&
> - sd->balance_interval < MAX_PINNED_INTERVAL) ||
> - (sd->balance_interval < sd->max_interval))
> + if ((env.flags & LBF_ALL_PINNED &&
> + sd->balance_interval < MAX_PINNED_INTERVAL) ||
> + (sd->balance_interval < sd->max_interval))

This looks like a code cleanup that is not related to the subject

> sd->balance_interval *= 2;
> -
> - ld_moved = 0;
> out:
> return ld_moved;
> }
> --
> 2.7.4
>