Re: [PATCH v2] sched/fair: reduce long-tail newly idle balance cost

From: Vincent Guittot
Date: Tue Mar 23 2021 - 09:46:28 EST


Hi Aubrey,

On Tue, 16 Mar 2021 at 05:27, Li, Aubrey <aubrey.li@xxxxxxxxxxxxxxx> wrote:
>
> On 2021/2/24 16:15, Aubrey Li wrote:
> > A long-tail load balance cost is observed on the newly idle path.
> > It is caused by a race window between the first nr_running check
> > of the busiest runqueue and its nr_running recheck in detach_tasks.
> >
> > Before the busiest runqueue is locked, its tasks could be pulled by
> > other CPUs, so its nr_running drops to 1, or even to 0 if the running
> > task becomes idle. detach_tasks then breaks out with the LBF_ALL_PINNED
> > flag still set, which triggers a load_balance redo at the same
> > sched_domain level.
> >
> > In order to find the new busiest sched_group and CPU, load balance will
> > recompute and update the various load statistics, which eventually leads
> > to the long-tail load balance cost.
> >
> > This patch clears the LBF_ALL_PINNED flag for this race condition,
> > and hence reduces the long-tail cost of newly idle balance.
>
> Ping...

Reviewed-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>

>
> >
> > Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> > Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx>
> > Cc: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> > Cc: Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > Signed-off-by: Aubrey Li <aubrey.li@xxxxxxxxxxxxxxx>
> > ---
> > kernel/sched/fair.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 04a3ce2..5c67804 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7675,6 +7675,15 @@ static int detach_tasks(struct lb_env *env)
> >
> > lockdep_assert_held(&env->src_rq->lock);
> >
> > + /*
> > + * Source run queue has been emptied by another CPU, clear
> > + * LBF_ALL_PINNED flag as we will not test any task.
> > + */
> > + if (env->src_rq->nr_running <= 1) {
> > + env->flags &= ~LBF_ALL_PINNED;
> > + return 0;
> > + }
> > +
> > if (env->imbalance <= 0)
> > return 0;
> >
> >
>