Re: [PATCH 2/3] sched/fair: don't set LBF_ALL_PINNED unnecessarily

From: Vincent Guittot
Date: Wed Jan 06 2021 - 11:05:15 EST


On Wed, 6 Jan 2021 at 16:13, Valentin Schneider
<valentin.schneider@xxxxxxx> wrote:
>
> On 06/01/21 14:34, Vincent Guittot wrote:
> > Setting LBF_ALL_PINNED during active load balance is only valid when there
> > is only 1 running task on the rq. Otherwise, this ends up increasing the
> > balance interval even though other tasks could migrate after the next
> > interval, for example once they become cache-cold.
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > ---
> > kernel/sched/fair.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 5428b8723e61..69a455113b10 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9759,7 +9759,8 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> >  			if (!cpumask_test_cpu(this_cpu, busiest->curr->cpus_ptr)) {
> >  				raw_spin_unlock_irqrestore(&busiest->lock,
> >  							    flags);
> > -				env.flags |= LBF_ALL_PINNED;
> > +				if (busiest->nr_running == 1)
> > +					env.flags |= LBF_ALL_PINNED;
>
> So LBF_ALL_PINNED *can* be set if busiest->nr_running > 1, because
> before we get there we have:
>
>     if (nr_running > 1) {
>         env.flags |= LBF_ALL_PINNED;
>         detach_tasks(&env); // Removes LBF_ALL_PINNED if > 0 tasks can be pulled
>         ...
>     }
>
> What about following the logic used by detach_tasks() and only clearing the
> flag? Something like the below: if nr_running > 1, then we'll have gone
> through detach_tasks(), which will have cleared the flag (if possible).
> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04a3ce20da67..211c86ba3f5b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9623,6 +9623,8 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>  	env.src_rq = busiest;
>
>  	ld_moved = 0;
> +	/* Clear this as soon as we find a single pullable task */
> +	env.flags |= LBF_ALL_PINNED;
>  	if (busiest->nr_running > 1) {
>  		/*
>  		 * Attempt to move tasks. If find_busiest_group has found
> @@ -9630,7 +9632,6 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>  		 * still unbalanced. ld_moved simply stays zero, so it is
>  		 * correctly treated as an imbalance.
>  		 */
> -		env.flags |= LBF_ALL_PINNED;
>  		env.loop_max = min(sysctl_sched_nr_migrate, busiest->nr_running);
>
>  more_balance:
> @@ -9756,10 +9757,11 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>  			if (!cpumask_test_cpu(this_cpu, busiest->curr->cpus_ptr)) {
>  				raw_spin_unlock_irqrestore(&busiest->lock,
>  							    flags);
> -				env.flags |= LBF_ALL_PINNED;
>  				goto out_one_pinned;
>  			}
>
> +			env.flags &= ~LBF_ALL_PINNED;

Yes, that looks easier to read.
I will make the change in the next version.

> +
>  			/*
>  			 * ->active_balance synchronizes accesses to
>  			 * ->active_balance_work. Once set, it's cleared
> ---
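
For readers following the thread, here is a minimal userspace sketch of the
"set the flag up front, clear it as soon as one pullable task is found" flow
proposed above. It is not kernel code: the toy_* types and functions are
hypothetical stand-ins, affinity is modelled as a plain bitmask, the
LBF_ALL_PINNED value is illustrative, and the sketch assumes the case where no
task was actually moved (for example because candidates were cache-hot), so
that the active balance path is reached.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same name as the kernel flag; the value here is illustrative only. */
#define LBF_ALL_PINNED 0x01u

/* Hypothetical, simplified stand-in for a task: only affinity is modelled. */
struct toy_task {
	unsigned long cpus_allowed;	/* bit N set => task may run on CPU N */
};

/* Hypothetical stand-in for the busiest runqueue. */
struct toy_rq {
	const struct toy_task *tasks;	/* tasks[0] plays the role of ->curr */
	size_t nr_running;
};

/* Assumes cpu is smaller than the number of bits in unsigned long. */
static bool toy_can_run_on(const struct toy_task *p, int cpu)
{
	return (p->cpus_allowed >> cpu) & 1UL;
}

/*
 * Returns true if LBF_ALL_PINNED is still set after a balance attempt in
 * which nothing was moved, i.e. no task on the busiest rq may run on dst_cpu.
 */
bool toy_all_pinned_after_balance(const struct toy_rq *busiest, int dst_cpu)
{
	unsigned int flags = 0;
	size_t i;

	/* Set up front; cleared as soon as we find a single pullable task. */
	flags |= LBF_ALL_PINNED;

	if (busiest->nr_running > 1) {
		/* Stand-in for detach_tasks(): only the affinity check is modelled. */
		for (i = 0; i < busiest->nr_running; i++)
			if (toy_can_run_on(&busiest->tasks[i], dst_cpu))
				flags &= ~LBF_ALL_PINNED;
	}

	/* Stand-in for the active balance path: curr is allowed on dst_cpu. */
	if (busiest->nr_running > 0 &&
	    toy_can_run_on(&busiest->tasks[0], dst_cpu))
		flags &= ~LBF_ALL_PINNED;

	return (flags & LBF_ALL_PINNED) != 0;
}
```

With a runqueue whose running task is pinned to CPU 0 while a second task may
run on CPUs 0 and 1, calling toy_all_pinned_after_balance() with dst_cpu = 1
leaves the flag cleared, which is the case the original patch was trying to
stop from increasing the balance interval. With a single task pinned to CPU 0,
the flag stays set.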