Re: [PATCH 2/3 v2] sched/fair: don't set LBF_ALL_PINNED unnecessarily

From: Vincent Guittot
Date: Thu Jan 07 2021 - 11:03:49 EST


On Thu, 7 Jan 2021 at 16:08, Tao Zhou <ouwen210@xxxxxxxxxxx> wrote:
>
> Hi Vincent,
>
> On Thu, Jan 07, 2021 at 11:33:24AM +0100, Vincent Guittot wrote:
> > Setting LBF_ALL_PINNED during active load balance is only valid when there
> > is only 1 running task on the rq. Otherwise, this ends up increasing the
> > balance interval even though other tasks could migrate after the next
> > interval, for example once they become cache-cold.
> >
> > The LBF_ALL_PINNED flag is now always set by default. It is then cleared
> > when we find one task that can be pulled when calling detach_tasks() or
> > during active migration.
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > ---
> > kernel/sched/fair.c | 7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 5428b8723e61..a3515dea1afc 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9626,6 +9626,8 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> > env.src_rq = busiest;
> >
> > ld_moved = 0;
> > + /* Clear this flag as soon as we find a pullable task */
> > + env.flags |= LBF_ALL_PINNED;
> > if (busiest->nr_running > 1) {
> > /*
> > * Attempt to move tasks. If find_busiest_group has found
> > @@ -9633,7 +9635,6 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> > * still unbalanced. ld_moved simply stays zero, so it is
> > * correctly treated as an imbalance.
> > */
> > - env.flags |= LBF_ALL_PINNED;
> > env.loop_max = min(sysctl_sched_nr_migrate, busiest->nr_running);
> >
> > more_balance:
> > @@ -9759,10 +9760,12 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> > if (!cpumask_test_cpu(this_cpu, busiest->curr->cpus_ptr)) {
> > raw_spin_unlock_irqrestore(&busiest->lock,
> > flags);
> > - env.flags |= LBF_ALL_PINNED;
>
> With busiest->nr_running > 1, LBF_ALL_PINNED can be cleared while !ld_moved,
> and we get here. This is not consistent with the tip sched code, because the
> original code on this path set LBF_ALL_PINNED unconditionally. Or is this
> intentional, to avoid increasing the balance interval so that other tasks
> can still migrate?
>
> In v1, there was a condition here so that LBF_ALL_PINNED was set only when
> one task is running on the rq. But in v2, when busiest->nr_running > 1 and
> !ld_moved, LBF_ALL_PINNED may not have been cleared when we get here, which
> increases the balance interval. That is not consistent with v1. If I am not
> wrong, a condition like this is needed:
>
> if (busiest->nr_running != 1 /* && env.flags & LBF_ALL_PINNED */)
> env.flags &= ~LBF_ALL_PINNED;

If busiest->nr_running > 1, LBF_ALL_PINNED can't still be set when we
reach the active migration path (if (!ld_moved) { ...): if the flag is
set and we have tried all CPUs of the sched_group, we go to
out_all_pinned instead.
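
The argument above can be checked with a small user-space model. This is
only a sketch of the control flow under discussion, not the code in
kernel/sched/fair.c: model_load_balance(), struct lb_result, enum lb_path
and count_violations() are hypothetical names, the flag value is
illustrative, and the model assumes that every CPU of the sched_group has
already been tried and that active migration is always attempted when
nothing was moved.

```c
#include <assert.h>
#include <stdbool.h>

/* Same name as the kernel flag; the value is only illustrative. */
#define LBF_ALL_PINNED 0x01u

/* Hypothetical labels for where the model ends up. */
enum lb_path {
	LB_MOVED,		/* detach_tasks() moved at least one task */
	LB_OUT_ALL_PINNED,	/* no task on busiest may run on this_cpu */
	LB_OUT_ONE_PINNED,	/* busiest->curr may not run on this_cpu */
	LB_ACTIVE_BALANCE,	/* active migration would be attempted */
};

struct lb_result {
	enum lb_path path;
	unsigned int flags;	/* flags when the model returns */
	bool pinned_at_active;	/* flag state on entry to if (!ld_moved) */
};

static struct lb_result model_load_balance(int nr_running, bool any_affine,
					   int ld_moved, bool curr_allowed)
{
	struct lb_result r = { LB_MOVED, 0, false };

	assert(nr_running >= 1);

	/* v2 behaviour: set by default, cleared once a pullable task is found. */
	r.flags |= LBF_ALL_PINNED;

	if (nr_running > 1) {
		/*
		 * Models detach_tasks(): an affinity match clears the flag
		 * even if the task is not moved (e.g. it is still cache-hot).
		 */
		if (any_affine)
			r.flags &= ~LBF_ALL_PINNED;
		else
			ld_moved = 0;

		/* Assumption: all CPUs of the sched_group were tried. */
		if (r.flags & LBF_ALL_PINNED) {
			r.path = LB_OUT_ALL_PINNED;
			return r;
		}
	} else {
		ld_moved = 0;	/* detach_tasks() is not called */
	}

	if (!ld_moved) {
		r.pinned_at_active = (r.flags & LBF_ALL_PINNED) != 0;
		if (!curr_allowed) {
			r.path = LB_OUT_ONE_PINNED;
			return r;
		}
		/* Found at least one task that could run on this_cpu. */
		r.flags &= ~LBF_ALL_PINNED;
		r.path = LB_ACTIVE_BALANCE;
	}
	return r;
}

/* Count nr_running > 1 cases that reach active migration with the flag set. */
static int count_violations(void)
{
	int bad = 0;

	for (int affine = 0; affine <= 1; affine++)
		for (int moved = 0; moved <= 1; moved++)
			for (int allowed = 0; allowed <= 1; allowed++) {
				struct lb_result r =
					model_load_balance(2, affine, moved, allowed);
				bool active = r.path == LB_OUT_ONE_PINNED ||
					      r.path == LB_ACTIVE_BALANCE;

				if (active && r.pinned_at_active)
					bad++;
			}
	return bad;
}
```

In this model, count_violations() enumerates every input combination for
a runqueue with two tasks and reports how many of them enter the
!ld_moved block with LBF_ALL_PINNED still set. With a single running
task and curr not allowed on this_cpu, the model ends at
LB_OUT_ONE_PINNED with the flag still set, which is the only case where
the flag is meant to survive.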

>
> I hope this is not noise in this new thread.
>
> Thanks,
> Tao
>
> > goto out_one_pinned;
> > }
> >
> > + /* Record that we found at least one task that could run on this_cpu */
> > + env.flags &= ~LBF_ALL_PINNED;
> > +
> > /*
> > * ->active_balance synchronizes accesses to
> > * ->active_balance_work. Once set, it's cleared
> > --
> > 2.17.1
> >