[PATCH 0/2] sched/fair: Reduce nohz_idle_balance CPU overhead on large systems
From: Imran Khan
Date: Tue Apr 21 2026 - 01:07:02 EST
On large systems (700+ CPUs, 350+ CPUs in root sched_domain/cpuset),
nohz idle balancing can consume a significant amount of CPU time due to
two independent problems.
First, with so many CPUs, nohz.next_balance is very likely to be equal
or very close to the current jiffies value, so nohz idle balance work
runs on almost every tick.
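To illustrate why the idle CPU count matters here, below is a minimal
userspace model, not kernel code. It assumes every idle CPU rebalances
at a fixed interval with its first deadline staggered by CPU number, and
that the global deadline is the earliest per-CPU deadline. The name
count_ilb_ticks() and the fixed interval are assumptions made only for
this sketch, and jiffies wraparound is ignored.

```c
#include <limits.h>

#define MAX_CPUS 1024

/*
 * Userspace model, not kernel code: each idle CPU rebalances every
 * `interval` ticks, with its first deadline staggered by CPU number.
 * Returns how many of `ticks` ticks had to run idle balance work.
 */
static unsigned long count_ilb_ticks(int nr_idle, unsigned long interval,
				     unsigned long ticks)
{
	unsigned long next[MAX_CPUS];
	unsigned long nohz_next_balance = 0;
	unsigned long busy_ticks = 0;

	if (nr_idle <= 0 || nr_idle > MAX_CPUS || interval == 0)
		return 0;

	for (int cpu = 0; cpu < nr_idle; cpu++)
		next[cpu] = (unsigned long)cpu % interval;

	for (unsigned long now = 0; now < ticks; now++) {
		unsigned long earliest = ULONG_MAX;

		if (now < nohz_next_balance)
			continue;	/* nothing is due on this tick */

		busy_ticks++;
		for (int cpu = 0; cpu < nr_idle; cpu++) {
			if (next[cpu] <= now)
				next[cpu] = now + interval;
			if (next[cpu] < earliest)
				earliest = next[cpu];
		}
		/* The global deadline is the earliest per-CPU deadline. */
		nohz_next_balance = earliest;
	}
	return busy_ticks;
}
```

In this model, 4 idle CPUs with a 64-tick interval do work on 40 of 640
ticks, while 350 idle CPUs do work on all 640, because some CPU's
deadline falls on every tick.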
Second, find_new_ilb() uses for_each_cpu_and() to iterate idle_cpus_mask
from the lowest bit, so the lowest-numbered idle CPU in the cpuset bears
the full burden of nohz ILB work, and most of the time it is the same
CPU. On large systems this work is significant and unfairly consumes the
cycles of that one CPU.
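The effect of a lowest-bit-first scan can be shown with a small
userspace model, not kernel code. It assumes a 64-bit mask stands in for
idle_cpus_mask and that the set of idle CPUs does not change between
selections. pick_lowest_idle() and tally_picks() are names invented for
this sketch.

```c
#include <stdint.h>

/*
 * Userspace model, not kernel code: bit n of idle_mask set means
 * "CPU n is idle". Scanning from bit 0 mirrors a lowest-bit-first walk.
 */
static int pick_lowest_idle(uint64_t idle_mask, int self)
{
	for (int cpu = 0; cpu < 64; cpu++) {
		if (cpu == self)
			continue;	/* the kicking CPU skips itself */
		if (idle_mask & (UINT64_C(1) << cpu))
			return cpu;
	}
	return -1;	/* no eligible idle CPU */
}

/* Count how often each CPU is picked over `rounds` selections. */
static void tally_picks(uint64_t idle_mask, int self, int rounds,
			unsigned int hits[64])
{
	for (int i = 0; i < 64; i++)
		hits[i] = 0;

	for (int r = 0; r < rounds; r++) {
		int cpu = pick_lowest_idle(idle_mask, self);

		if (cpu >= 0)
			hits[cpu]++;
	}
}
```

With CPUs 4-7 idle (mask 0xF0) and CPU 0 doing the kicking, every
selection in this model lands on CPU 4 and CPUs 5-7 are never chosen.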
Patch 1 addresses the first issue by advancing nohz.next_balance based on
the number of idle CPUs and patch 2 addresses the second issue by
distributing the nohz ILB work across eligible idle CPUs.
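This cover letter does not include the patch bodies, so the following is
only a hypothetical userspace sketch of what "advance the deadline based
on the number of idle CPUs" and "distribute the work across idle CPUs"
could look like. It is not the code from either patch. The functions
scaled_next_balance() and pick_next_idle(), the cpus_per_tick parameter,
and the 64-bit mask are all assumptions made for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical: push the next global deadline out by one extra tick per
 * `cpus_per_tick` idle CPUs, so larger idle sets are scanned less often.
 */
static unsigned long scaled_next_balance(unsigned long now,
					 unsigned int nr_idle,
					 unsigned int cpus_per_tick)
{
	if (cpus_per_tick == 0)
		return now + 1;
	return now + 1 + nr_idle / cpus_per_tick;
}

/*
 * Hypothetical: start the scan just after the previously chosen CPU and
 * wrap around, so successive picks rotate through the idle set.
 * `last` must be in the range -1..63; pass -1 when nothing was chosen yet.
 */
static int pick_next_idle(uint64_t idle_mask, int self, int last)
{
	for (int i = 1; i <= 64; i++) {
		int cpu = (last + i) % 64;

		if (cpu == self)
			continue;	/* the kicking CPU skips itself */
		if (idle_mask & (UINT64_C(1) << cpu))
			return cpu;
	}
	return -1;	/* no eligible idle CPU */
}
```

In this sketch, feeding each result back in as `last` walks CPUs 4, 5, 6,
7 and then wraps to 4 again for mask 0xF0, instead of always returning
CPU 4.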
Imran Khan (2):
sched/fair: scale nohz.next_balance according to number of idle CPUs.
sched/fair: distribute nohz ILB work across idle CPUs.
kernel/sched/fair.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
base-commit: 591cd656a1bf5ea94a222af5ef2ee76df029c1d2
--
2.34.1