Re: [PATCH v3 3/3] sched/fair: Remove nohz.nr_cpus and use weight of cpumask instead
From: Shrikanth Hegde
Date: Fri Jan 09 2026 - 10:19:34 EST
Hi Valentin. Thanks for going through this.
On 1/9/26 8:14 PM, Valentin Schneider wrote:
On 07/01/26 12:21, Shrikanth Hegde wrote:
nohz.nr_cpus was observed to be a contended cacheline when running
an enterprise workload on large systems.
The fundamental scalability challenge with nohz.idle_cpus_mask
and nohz.nr_cpus is the following:
(1) nohz_balancer_kick() observes (reads) nohz.nr_cpus
(or nohz.idle_cpus_mask) and nohz.has_blocked to see whether there's
any nohz balancing work to do, in every scheduler tick.
(2) nohz_balance_enter_idle() and nohz_balance_exit_idle()
(through nohz_balancer_kick() via sched_tick()) modify (write)
nohz.nr_cpus (and/or nohz.idle_cpus_mask) and nohz.has_blocked.
My first reaction on reading the whole changelog was: "but .nr_cpus and
.idle_cpus_mask are in the same cacheline?!", which as Ingo pointed out
somewhere down [1] isn't true for CPUMASK_OFFSTACK, so this change
effectively gets rid of the dirtying of one extra cacheline during idle
entry/exit.
[1]: http://lore.kernel.org/r/aS3za7X9BLS5rg65@xxxxxxxxx
I'd suggest adding something like so in this part of the changelog:
"""
Note that nohz.idle_cpus_mask and nohz.nr_cpus reside in the same
cacheline, however under CONFIG_CPUMASK_OFFSTACK the backing storage for
nohz.idle_cpus_mask will be elsewhere. This implies two separate cachelines
being dirtied upon idle entry / exit.
"""
ok. Will do that. Thanks.
Even with CONFIG_CPUMASK_OFFSTACK=n, the usual NR_CPUS configuration is
512/1024/2048 or higher.
With a 64-byte cacheline, one cacheline holds the mask bits for 512 CPUs.
So idle_cpus_mask and the rest of the nohz fields, including nr_cpus, will be
in different cachelines.
Even on powerpc (128-byte cacheline), where CONFIG_CPUMASK_OFFSTACK=n, the
default is NR_CPUS=2048. That means idle_cpus_mask takes 2 cachelines (2048 bits
= 256 bytes) and the rest of the nohz fields land in a third cacheline.
So in most cases, this implies dirtying one less cacheline.
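To make the arithmetic above easy to check, here is a rough userspace sketch, not kernel code. The helper names (mask_bytes, mask_cachelines, next_field_cacheline) and struct nohz_offstack_sketch are made up for illustration. It assumes the mask starts at a cacheline-aligned address and that nr_cpus directly follows it, which only approximates the real nohz layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration only; none of these exist in the kernel. */

/* Bytes needed for a bitmap with one bit per CPU, rounded up to whole longs. */
static size_t mask_bytes(size_t nr_cpus)
{
	size_t bits_per_long = 8 * sizeof(unsigned long);

	return ((nr_cpus + bits_per_long - 1) / bits_per_long) * sizeof(unsigned long);
}

/* Cachelines covered by an inline mask that starts on a cacheline boundary. */
static size_t mask_cachelines(size_t nr_cpus, size_t line_size)
{
	return (mask_bytes(nr_cpus) + line_size - 1) / line_size;
}

/*
 * Zero-based index of the cacheline holding the first field placed right
 * after an inline mask (assumed here to be nr_cpus). Index 0 means it
 * shares a cacheline with the start of the mask.
 */
static size_t next_field_cacheline(size_t nr_cpus, size_t line_size)
{
	return mask_bytes(nr_cpus) / line_size;
}

/*
 * Assumed shape of the CONFIG_CPUMASK_OFFSTACK=y case: the struct holds only
 * a pointer, so the pointer and nr_cpus share a cacheline while the mask bits
 * live in separately allocated memory.
 */
struct nohz_offstack_sketch {
	unsigned long *idle_cpus_mask;	/* pointer only; bits are elsewhere */
	int nr_cpus;
	int has_blocked;
};
```

With these, mask_cachelines(2048, 128) evaluates to 2 and next_field_cacheline(2048, 128) to 2 (i.e. the third cacheline), matching the powerpc case. next_field_cacheline(64, 64) evaluates to 0, which is the small-NR_CPUS case where the mask and nr_cpus would still share a cacheline.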
Data points with CONFIG_CPUMASK_OFFSTACK=y/n: see [1].
[1]: https://lore.kernel.org/all/fdb378e7-7797-4aeb-a79f-12af4cb1b81a@xxxxxxxxxxxxx/