[PATCH] sched/fair: scatter nohz idle balance target cpus
From: Jianyong Wu
Date: Mon Mar 17 2025 - 23:00:35 EST
Currently, the CPU selection logic for nohz idle balance keeps no history,
so cpu0 is always chosen whenever it is in the nohz cpu mask. That is
unfair to tasks residing on NUMA node 0, and it is worse on machines with
a large number of CPUs, where nohz idle balance may be very heavy.
To address this, add a member to "nohz" that records which CPU was chosen
last time, and start the search after it for each round of nohz idle
balance.
Signed-off-by: Jianyong Wu <wujianyong@xxxxxxxx>
---
kernel/sched/fair.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c798d2795243..ba6930c79e25 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7197,6 +7197,7 @@ static struct {
atomic_t nr_cpus;
int has_blocked; /* Idle CPUS has blocked load */
int needs_update; /* Newly idle CPUs need their next_balance collated */
+ int last_cpu; /* Last cpu chosen to do nohz idle balance */
unsigned long next_balance; /* in jiffy units */
unsigned long next_blocked; /* Next update of blocked load in jiffies */
} nohz ____cacheline_aligned;
@@ -12266,13 +12267,15 @@ static inline int find_new_ilb(void)
hk_mask = housekeeping_cpumask(HK_TYPE_KERNEL_NOISE);
- for_each_cpu_and(ilb_cpu, nohz.idle_cpus_mask, hk_mask) {
+ for_each_cpu_wrap(ilb_cpu, nohz.idle_cpus_mask, nohz.last_cpu + 1) {
- if (ilb_cpu == smp_processor_id())
+ if (ilb_cpu == smp_processor_id() || !cpumask_test_cpu(ilb_cpu, hk_mask))
continue;
- if (idle_cpu(ilb_cpu))
+ if (idle_cpu(ilb_cpu)) {
+ nohz.last_cpu = ilb_cpu;
return ilb_cpu;
+ }
}
return -1;
--
2.43.0