Re: [PATCH v3 1/7] sched/fair: Ignore percpu threads for imbalance pulls

From: Chandra Sekhar Lingutla
Date: Tue Mar 16 2021 - 12:04:37 EST


Hi Dietmar,

On 3/16/2021 9:19 PM, Dietmar Eggemann wrote:
On 11/03/2021 13:05, Valentin Schneider wrote:
From: Lingutla Chandrasekhar <clingutla@xxxxxxxxxxxxxx>

In load balancing, when balancing group is unable to pull task
due to ->cpus_ptr constraints from busy group, then it sets
LBF_SOME_PINNED to lb env flags, as a consequence, sgc->imbalance
is set for its parent domain level. which makes the group
classified as imbalance to get help from another balancing cpu.

Consider a 4-CPU big.LITTLE system with CPUs 0-1 as LITTLEs and
Does it have to be a big.LITTLE system? I assume this issue also happens
on an SMP system.

Yah, issue can happen on SMP system as well.  I will let Valentin update the commit text on
his next version of this series.

CPUs 2-3 as Bigs with below scenario:
- CPU0 doing newly_idle balancing
- CPU1 running percpu kworker and RT task (small tasks)
What's the role of the small RT task here in the story?
This is to satisfy 'busiest->nr_running > 1' checks.
- CPU2 running 2 big tasks
- CPU3 running 1 medium task

While CPU0 is doing newly_idle load balance at MC level, it fails to
pull percpu kworker from CPU1 and sets LBF_SOME_PINNED to lb env flag
and set sgc->imbalance at DIE level domain. As LBF_ALL_PINNED not cleared,
it tries to redo the balancing by clearing CPU1 in env cpus, but it don't
find other busiest_group, so CPU0 stops balacing at MC level without
clearing 'sgc->imbalance' and restart the load balacing at DIE level.

And CPU0 (balancing cpu) finds LITTLE's group as busiest_group with group
type as imbalance, and Bigs that classified the level below imbalance type
would be ignored to pick as busiest, and the balancing would be aborted
without pulling any tasks (by the time, CPU1 might not have running tasks).

It is suboptimal decision to classify the group as imbalance due to
percpu threads. So don't use LBF_SOME_PINNED for per cpu threads.
This sentence mentioned per-cpu threads (and so does the patch name) but
the implementation (only) deals with per-cpu kernel threads. IMHO, it
would be good to align this.
I will let Valentin update this on next version.
Signed-off-by: Lingutla Chandrasekhar <clingutla@xxxxxxxxxxxxxx>
[Use kthread_is_per_cpu() rather than p->nr_cpus_allowed]
Signed-off-by: Valentin Schneider <valentin.schneider@xxxxxxx>
---
kernel/sched/fair.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2e2ab1e00ef9..83aea97fbf22 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7565,6 +7565,10 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
return 0;
+ /* Disregard pcpu kthreads; they are where they need to be. */
+ if ((p->flags & PF_KTHREAD) && kthread_is_per_cpu(p))
+ return 0;
+
if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) {
int cpu;