[RFC PATCH] sched/fair: Fix impossible migrate_util scenario in load balance

From: Qais Yousef
Date: Sat Jul 15 2023 - 21:42:39 EST


We've seen cases while running geekbench that an idle little core never
pulls a task from a bigger overloaded cluster for 100s of ms and
sometimes over a second.

It turned out that the load balance identifies this as a migrate_util
type since the local group (little cluster) has a spare capacity and
will try to pull a task. But the little cluster capacity is very small
nowadays (around 200 or less) and if two busy tasks are stuck on a mid
core which has a capacity of over 700, this means the util of each tasks
will be around 350+ range. Which is always bigger than the spare
capacity of the little group with a single idle core.

When trying to detach_tasks() we bail out then because of the comparison
of:

if (util > env->imbalance)
goto next;

In calculate_imbalance() we convert a migrate_util into migrate_task
type if the CPU trying to do the pull is idle. But we only do this if
env->imbalance is 0; which I can't understand. AFAICT env->imbalance
contains the local group's spare capacity. If it is 0, this means it's
fully busy.

Removing this condition fixes the problem, but since I can't fully
understand why it checks for 0, sending this as RFC. It could be a typo
and meant to check for

env->imbalance != 0

instead?

Signed-off-by: Qais Yousef (Google) <qyousef@xxxxxxxxxxx>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a80a73909dc2..682d9d6a8691 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10288,7 +10288,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
* waiting task in this overloaded busiest group. Let's
* try to pull it.
*/
- if (env->idle != CPU_NOT_IDLE && env->imbalance == 0) {
+ if (env->idle != CPU_NOT_IDLE) {
env->migration_type = migrate_task;
env->imbalance = 1;
}
--
2.25.1