On Fri, 19 Aug 2022 at 14:35, Vincent Guittot
<vincent.guittot@xxxxxxxxxx> wrote:
Hi Zhang,
On Fri, 19 Aug 2022 at 12:54, zhangsong (J) <zhangsong34@xxxxxxxxxx> wrote:
On 2022/8/18 16:31, Vincent Guittot wrote:
Le jeudi 18 août 2022 à 10:46:55 (+0800), Abel Wu a écrit :
On 8/17/22 8:58 PM, Vincent Guittot Wrote:...
On Tue, 16 Aug 2022 at 04:53, zhangsong (J) <zhangsong34@xxxxxxxxxx> wrote:
Argh yes, my brain is not yet back from vacationLoop will be terminated without LBF_NEED_BREAK if exceeds loop_max :)Yes, this is usually a corner case, but suppose that some non-idle tasks bounds to CPU 1-2env->loop_max adds a break but load_balance will continue with next
and idle tasks bounds to CPU 0-1, so CPU 1 may has many idle tasks and some non-idle
tasks while idle tasks on CPU 1 can not be pulled to CPU 2, when trigger load balance if
CPU 2 should pull some tasks from CPU 1, the bad result is idle tasks of CPU 1 cannot be
migrated and non-idle tasks also cannot be migrated in case of env->loop_max constraint.
tasks so it also tries to pull your non idle task at the end after
several breaks.
I have been confused by loop_max and loop_break being set to the same value 32
Zhang Song, Could you try the patch below ? If it works, I will prepare a
clean patch with all tags
sched/fair: make sure to try to detach at least one movable task
During load balance we try at most env->loop_max time to move a task. But
it can happen that the LRU tasks (ie tail of the cfs_tasks list) can't
be moved to dst_cpu because of affinity. In this case, loop in the list
until we found at least one.
Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
---
kernel/sched/fair.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da388657d5ac..02b7b808e186 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8052,8 +8052,12 @@ static int detach_tasks(struct lb_env *env)
p = list_last_entry(tasks, struct task_struct, se.group_node);
env->loop++;
- /* We've more or less seen every task there is, call it quits */
- if (env->loop > env->loop_max)
+ /*
+ * We've more or less seen every task there is, call it quits
+ * unless we haven't found any movable task yet.
+ */
+ if (env->loop > env->loop_max &&
+ !(env->flags & LBF_ALL_PINNED))
break;
/* take a breather every nr_migrate tasks */
@@ -10182,7 +10186,9 @@ static int load_balance(int this_cpu, struct rq *this_rq,
if (env.flags & LBF_NEED_BREAK) {
env.flags &= ~LBF_NEED_BREAK;
- goto more_balance;
+ /* Stop if we tried all running tasks */
+ if (env.loop < busiest->nr_running)
+ goto more_balance;
}
/*
--
2.17.1
Thanks for your reply.
I have tried your patch and run test compared with it, it seems that the
patch you provide makes no sense.
The test result is below(1000 idle tasks bounds to CPU 0-1 and 10 normal
tasks bounds to CPU 1-2):
=================================================================
Without patch:
6,777.37 msec cpu-clock # 1.355 CPUs utilized
20,812 context-switches # 0.003 M/sec
0 cpu-migrations # 0.000 K/sec
0 page-faults # 0.000 K/sec
13,333,983,148 cycles # 1.967 GHz
6,457,930,305 instructions # 0.48 insn per cycle
2,125,644,649 branches # 313.639 M/sec
1,690,587 branch-misses # 0.08% of all
branches
5.001931983 seconds time elapsed
With your patch:
6,791.46 msec cpu-clock # 1.358 CPUs utilized
20,996 context-switches # 0.003 M/sec
0 cpu-migrations # 0.000 K/sec
0 page-faults # 0.000 K/sec
13,467,573,052 cycles # 1.983 GHz
6,516,989,062 instructions # 0.48 insn per cycle
2,145,139,220 branches # 315.858 M/sec
1,751,454 branch-misses # 0.08% of all
branches
5.002274267 seconds time elapsed
With my patch:
7,495.14 msec cpu-clock # 1.499 CPUs utilized
23,176 context-switches # 0.003 M/sec
309 cpu-migrations # 0.041 K/sec
0 page-faults # 0.000 K/sec
14,849,083,489 cycles # 1.981 GHz
7,180,832,268 instructions # 0.48 insn per cycle
2,363,300,644 branches # 315.311 M/sec
1,964,169 branch-misses # 0.08% of all
branches
5.001713352 seconds time elapsed
===============================================================
Obviously, when your patch is applied, the cpu-migrations of normal
tasks is still 0 and the
CPU ulization of normal tasks have no improvement compared with no patch
applied.
When apply my patch, the cpu-migrations and CPU ulization of normal
tasks can both improve.
I cannot explain the result with your patch, you also can test it by
yourself.
Do you have more details about the test that your are running ?
Do cpu0-2 share their cache ?
Which kingd of task are the normal and idle tasks ? always running tasks ?
I'm going to try to reproduce your problem locally
Some details of your UC are missing. I have tried to reproduce your
example above:
1000 idle tasks bounds to CPU 0-1 and 10 normal tasks bounds to CPU 1-2
Let assume that for any reason, the 10 normal tasks wake up on CPU 1.
Then, the thousand of idle tasks are moved to CPU0 by load balance and
only normal tasks stay on CPU1. Then load balance will move some
normal tasks to CPU2.
My only way to reproduce something similar to your example, is to pin
the 1000 idle tasks on CPU1 so they can't move to CPU0. Then I can see
that load balance reaches loop_max limit and gets hard time moving
normal tasks on CPU2. But in this later case, my patch helps to move
normal tasks on CPU2. Something is missing in the description of your
UC.
Sidenote, I have the same kind of problem with 1000 normal task with
low priority so it's not a matter of idle vs normal tasks
Regards,
Vincent