Re: [PATCH v3 3/3] sched/fair: Remove nohz.nr_cpus and use weight of cpumask instead
From: Valentin Schneider
Date: Fri Jan 09 2026 - 09:46:33 EST

On 07/01/26 12:21, Shrikanth Hegde wrote:
> nohz.nr_cpus was observed as contended cacheline when running
> enterprise workload on large systems.
>
> Fundamental scalability challenge with nohz.idle_cpus_mask
> and nohz.nr_cpus is the following:
>
> (1) nohz_balancer_kick() observes (reads) nohz.nr_cpus
> (or nohz.idle_cpus_mask) and nohz.has_blocked to see whether there's
> any nohz balancing work to do, in every scheduler tick.
>
> (2) nohz_balance_enter_idle() and nohz_balance_exit_idle()
> (through nohz_balancer_kick() via sched_tick()) modify (write)
> nohz.nr_cpus (and/or nohz.idle_cpus_mask) and nohz.has_blocked.
>

My first reaction on reading the whole changelog was: "but .nr_cpus and
.idle_cpus_mask are in the same cacheline?!", which, as Ingo pointed out
somewhere down [1], isn't true with CONFIG_CPUMASK_OFFSTACK. So this change
effectively gets rid of the dirtying of one extra cacheline during idle
entry/exit.

[1]: http://lore.kernel.org/r/aS3za7X9BLS5rg65@xxxxxxxxx

I'd suggest adding something like so in this part of the changelog:
"""
Note that nohz.idle_cpus_mask and nohz.nr_cpus reside in the same
cacheline, however under CONFIG_CPUMASK_OFFSTACK the backing storage for
nohz.idle_cpus_mask will be elsewhere. This implies two separate cachelines
being dirtied upon idle entry / exit.
"""
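
To make the layout point concrete, here is a rough userspace sketch of what
I mean. It is not kernel code: struct nohz_model, struct cpumask_model,
MODEL_NR_CPUS, nohz_model_init() and same_cacheline() are all made up for
illustration, and I'm assuming 64-byte cachelines and a heap allocation
standing in for what zalloc_cpumask_var() does under CONFIG_CPUMASK_OFFSTACK.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHELINE_BYTES 64	/* assumption: 64-byte cachelines */
#define MODEL_NR_CPUS	2048	/* assumption: some large-system NR_CPUS */
#define MASK_LONGS	(MODEL_NR_CPUS / (8 * sizeof(unsigned long)))

/* Hypothetical stand-in for struct cpumask. */
struct cpumask_model {
	unsigned long bits[MASK_LONGS];
};

/*
 * Simplified model of the nohz struct with CONFIG_CPUMASK_OFFSTACK=y:
 * cpumask_var_t is then a pointer, so only the pointer sits next to
 * nr_cpus; the bits themselves are allocated separately.
 */
struct nohz_model {
	alignas(CACHELINE_BYTES) struct cpumask_model *idle_cpus_mask;
	int nr_cpus;		/* atomic_t in the kernel */
	int has_blocked;
};

static struct nohz_model nohz_model;

/* Stand-in for zalloc_cpumask_var(): zeroed backing storage from the heap. */
static void nohz_model_init(void)
{
	if (!nohz_model.idle_cpus_mask) {
		nohz_model.idle_cpus_mask = calloc(1, sizeof(struct cpumask_model));
		assert(nohz_model.idle_cpus_mask != NULL);
	}
}

/* Returns 1 if both addresses fall in the same (assumed) cacheline. */
static int same_cacheline(const void *a, const void *b)
{
	return (uintptr_t)a / CACHELINE_BYTES == (uintptr_t)b / CACHELINE_BYTES;
}
```

After nohz_model_init(), same_cacheline(&nohz_model.idle_cpus_mask,
&nohz_model.nr_cpus) holds, but the same check against
&nohz_model.idle_cpus_mask->bits[0] does not. In this model, setting a bit
in the mask and bumping nr_cpus therefore touch two different lines, which
is the extra dirtying the patch removes.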