Re: [PATCH v2] sched/fair: Introduce priority load balance to reduce interference from IDLE tasks

From: Vincent Guittot
Date: Fri Aug 19 2022 - 08:35:55 EST


Hi Zhang,

On Fri, 19 Aug 2022 at 12:54, zhangsong (J) <zhangsong34@xxxxxxxxxx> wrote:
>
>
> On 2022/8/18 16:31, Vincent Guittot wrote:
> > On Thursday 18 Aug 2022 at 10:46:55 (+0800), Abel Wu wrote:
> >> On 8/17/22 8:58 PM, Vincent Guittot wrote:
> >>> On Tue, 16 Aug 2022 at 04:53, zhangsong (J) <zhangsong34@xxxxxxxxxx> wrote:
> >>>>
> > ...
> >
> >>>> Yes, this is usually a corner case, but suppose that some non-idle tasks
> >>>> are bound to CPUs 1-2 and idle tasks are bound to CPUs 0-1. CPU 1 may then
> >>>> have many idle tasks and some non-idle tasks, while the idle tasks on CPU 1
> >>>> cannot be pulled to CPU 2. When load balance triggers and CPU 2 should pull
> >>>> some tasks from CPU 1, the bad result is that the idle tasks of CPU 1 cannot
> >>>> be migrated, and the non-idle tasks cannot be migrated either because of the
> >>>> env->loop_max constraint.
> >>> env->loop_max adds a break, but load_balance will continue with the next
> >>> tasks, so it also tries to pull your non-idle tasks at the end after
> >>> several breaks.
> >> The loop will be terminated without LBF_NEED_BREAK if env->loop exceeds loop_max :)
> > Argh yes, my brain is not yet back from vacation.
> > I have been confused by loop_max and loop_break being set to the same value, 32.
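> >
> > To recap, the relevant logic is roughly the following (fair.c; both
> > sysctl_sched_nr_migrate and sched_nr_migrate_break default to 32):
> >
> >         env.loop_max = min(sysctl_sched_nr_migrate, busiest->nr_running);
> >         ...
> >         /* in detach_tasks() */
> >         if (env->loop > env->loop_max)
> >                 break;
> >
> >         /* take a breather every nr_migrate tasks */
> >         if (env->loop > env->loop_break) {
> >                 env->loop_break += sched_nr_migrate_break;
> >                 env->flags |= LBF_NEED_BREAK;
> >                 break;
> >         }
> >
> > So with the default values the loop_max check fires first and the
> > LBF_NEED_BREAK path is never actually taken.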
> >
> > Zhang Song, could you try the patch below? If it works, I will prepare a
> > clean patch with all the tags.
> >
> >
> >
> > sched/fair: make sure to try to detach at least one movable task
> >
> > During load balance, we try at most env->loop_max times to move a task. But
> > it can happen that the LRU tasks (i.e. the tail of the cfs_tasks list) can't
> > be moved to dst_cpu because of affinity. In this case, loop through the list
> > until we find at least one movable task.
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > ---
> > kernel/sched/fair.c | 12 +++++++++---
> > 1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index da388657d5ac..02b7b808e186 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8052,8 +8052,12 @@ static int detach_tasks(struct lb_env *env)
> > p = list_last_entry(tasks, struct task_struct, se.group_node);
> >
> > env->loop++;
> > - /* We've more or less seen every task there is, call it quits */
> > - if (env->loop > env->loop_max)
> > + /*
> > + * We've more or less seen every task there is, call it quits
> > + * unless we haven't found any movable task yet.
> > + */
> > + if (env->loop > env->loop_max &&
> > + !(env->flags & LBF_ALL_PINNED))
> > break;
> >
> > /* take a breather every nr_migrate tasks */
> > @@ -10182,7 +10186,9 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> >
> > if (env.flags & LBF_NEED_BREAK) {
> > env.flags &= ~LBF_NEED_BREAK;
> > - goto more_balance;
> > + /* Stop if we tried all running tasks */
> > + if (env.loop < busiest->nr_running)
> > + goto more_balance;
> > }
> >
> > /*
> > --
> > 2.17.1
>
> Thanks for your reply.
> I have tried your patch and ran a comparison test with it; it seems that
> the patch you provided has no effect.
> The test results are below (1000 idle tasks bound to CPUs 0-1 and 10
> normal tasks bound to CPUs 1-2):
>
> =================================================================
>
> Without patch:
>
>
>          6,777.37 msec cpu-clock         #    1.355 CPUs utilized
>            20,812      context-switches  #    0.003 M/sec
>                 0      cpu-migrations    #    0.000 K/sec
>                 0      page-faults       #    0.000 K/sec
>    13,333,983,148      cycles            #    1.967 GHz
>     6,457,930,305      instructions      #    0.48  insn per cycle
>     2,125,644,649      branches          #  313.639 M/sec
>         1,690,587      branch-misses     #    0.08% of all branches
>
>       5.001931983 seconds time elapsed
>
> With your patch:
>
>
>          6,791.46 msec cpu-clock         #    1.358 CPUs utilized
>            20,996      context-switches  #    0.003 M/sec
>                 0      cpu-migrations    #    0.000 K/sec
>                 0      page-faults       #    0.000 K/sec
>    13,467,573,052      cycles            #    1.983 GHz
>     6,516,989,062      instructions      #    0.48  insn per cycle
>     2,145,139,220      branches          #  315.858 M/sec
>         1,751,454      branch-misses     #    0.08% of all branches
>
>       5.002274267 seconds time elapsed
>
> With my patch:
>
>
>          7,495.14 msec cpu-clock         #    1.499 CPUs utilized
>            23,176      context-switches  #    0.003 M/sec
>               309      cpu-migrations    #    0.041 K/sec
>                 0      page-faults       #    0.000 K/sec
>    14,849,083,489      cycles            #    1.981 GHz
>     7,180,832,268      instructions      #    0.48  insn per cycle
>     2,363,300,644      branches          #  315.311 M/sec
>         1,964,169      branch-misses     #    0.08% of all branches
>
>       5.001713352 seconds time elapsed
> ===============================================================
>
> Obviously, with your patch applied, the cpu-migrations count of the normal
> tasks is still 0, and their CPU utilization shows no improvement compared
> with no patch applied.
> With my patch applied, both the cpu-migrations and the CPU utilization of
> the normal tasks improve.
> I cannot explain the result with your patch; you can also test it yourself.

Do you have more details about the test that you are running?

Do CPUs 0-2 share their cache?
Which kind of tasks are the normal and idle tasks? Always-running tasks?

I'm going to try to reproduce your problem locally.
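
Something like the sketch below (untested) is what I have in mind:
always-running SCHED_IDLE tasks pinned to CPUs 0-1 and always-running
SCHED_OTHER tasks pinned to CPUs 1-2. The task counts and CPU ids are
taken from your description; please correct me if your real test differs:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Pin the calling task to the given CPUs, switch it to the given
 * scheduling policy and spin forever. */
static void pin_and_spin(int policy, const int *cpus, int nr_cpus)
{
	struct sched_param sp = { .sched_priority = 0 };
	cpu_set_t set;
	int i;

	CPU_ZERO(&set);
	for (i = 0; i < nr_cpus; i++)
		CPU_SET(cpus[i], &set);

	if (sched_setaffinity(0, sizeof(set), &set) ||
	    sched_setscheduler(0, policy, &sp))
		perror("setup");

	for (;;)		/* always running */
		;
}

int main(void)
{
	const int idle_cpus[] = { 0, 1 };
	const int normal_cpus[] = { 1, 2 };
	int i;

	for (i = 0; i < 1000; i++)	/* 1000 idle tasks on CPUs 0-1 */
		if (!fork())
			pin_and_spin(SCHED_IDLE, idle_cpus, 2);

	for (i = 0; i < 10; i++)	/* 10 normal tasks on CPUs 1-2 */
		if (!fork())
			pin_and_spin(SCHED_OTHER, normal_cpus, 2);

	pause();
	return 0;
}

and then, I guess, to measure the normal tasks, something like:

perf stat -e cpu-clock,context-switches,cpu-migrations,page-faults,cycles,instructions,branches,branch-misses -p <pids of the normal tasks> -- sleep 5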

Regards,
Vincent

>
> Best,
> Zhang Song
>
> >
> >>>> This will prevent non-idle tasks from achieving more CPU utilization.
> >>> Your problem is not linked to IDLE vs NORMAL tasks but to the large
> >>> number of pinned tasks that can't migrate to CPU 2. You can end up with
> >>> the same behavior without using IDLE tasks, only NORMAL tasks.
> >> I feel the same way.
> >>
> >> Best,
> >> Abel