Re: [RFC PATCH 18/19] sched/fair: Optimize global "nohz.nr_cpus" tracking
From: Shrikanth Hegde
Date: Wed Sep 24 2025 - 16:03:24 EST
On 9/4/25 9:45 AM, K Prateek Nayak wrote:
> Optimize "nohz.nr_cpus" by tracking the number of "sd_nohz->shared" with
> non-zero "nr_idle_cpus" count via "nohz.nr_doms", and only updating it at
> the boundary of "sd_nohz->shared->nr_idle_cpus" going from 0 -> 1 and
> back from 1 -> 0.
>
> This also introduces a chance of double accounting when a nohz idle
> entry or the tick races with hotplug or cpuset, as described in
> __nohz_exit_idle_tracking().
>
> __nohz_exit_idle_tracking(), called when the sched_domain_shared nodes
> tracking idle CPUs are freed, is used to correct any potential double
> accounting, which could otherwise trigger nohz idle balances unnecessarily
> even when all the CPUs have the tick enabled.
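A minimal userspace sketch of the boundary-only update described in the quoted changelog, assuming C11 atomics in place of the kernel's atomic_t. `struct shared_dom`, `nr_doms`, and the helper names are hypothetical stand-ins for sd_nohz->shared and nohz.nr_doms, not the patch's code. The hotplug/cpuset double-accounting correction done in __nohz_exit_idle_tracking() is not modeled.

```c
#include <stdatomic.h>

/*
 * Hypothetical userspace model, not kernel code:
 *   struct shared_dom stands in for sd_nohz->shared,
 *   nr_doms stands in for nohz.nr_doms.
 */
struct shared_dom {
	atomic_int nr_idle_cpus;	/* CPUs of this domain in nohz idle */
};

/* Number of domains whose nr_idle_cpus is non-zero. */
static atomic_int nr_doms;

/* A CPU of @sd enters nohz idle: only the 0 -> 1 transition touches nr_doms. */
static void dom_cpu_enter_idle(struct shared_dom *sd)
{
	if (atomic_fetch_add(&sd->nr_idle_cpus, 1) == 0)
		atomic_fetch_add(&nr_doms, 1);
}

/* A CPU of @sd leaves nohz idle: only the 1 -> 0 transition touches nr_doms. */
static void dom_cpu_exit_idle(struct shared_dom *sd)
{
	if (atomic_fetch_sub(&sd->nr_idle_cpus, 1) == 1)
		atomic_fetch_sub(&nr_doms, 1);
}

/* Balance decision: is any domain reporting nohz-idle CPUs? */
static int nohz_balance_needed_doms(void)
{
	return atomic_load(&nr_doms) != 0;
}
```

With three idle entries spread over two domains, the global counter is written only twice, which is the reduction in global cacheline traffic the patch is after.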
Is it possible to get rid of this nr_cpus or nr_doms altogether?
The reason being: with the current code, one updates nohz.idle_cpus_mask and
then increments/decrements nr_cpus.
Its only use is to decide whether to do periodic idle balancing or not.
Couldn't we instead use a cpumask_empty(nohz.idle_cpus_mask) check?
It may not be accurate on every tick, but that may be OK.
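To illustrate the suggestion, a minimal userspace sketch, assuming a single 64-bit word stands in for nohz.idle_cpus_mask (so at most 64 CPUs) and C11 atomics stand in for the kernel's cpumask helpers. The helper names are hypothetical. The point is that the "any CPU in nohz idle?" decision can be read off the mask itself, as a cpumask_empty() check would, with no separate counter to keep in sync.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for nohz.idle_cpus_mask; bit N set = CPU N is nohz idle. */
static _Atomic uint64_t idle_cpus_mask;

/* CPU enters nohz idle: set its bit (assumes 0 <= cpu < 64). */
static void nohz_mask_enter(int cpu)
{
	atomic_fetch_or(&idle_cpus_mask, UINT64_C(1) << cpu);
}

/* CPU leaves nohz idle: clear its bit (assumes 0 <= cpu < 64). */
static void nohz_mask_exit(int cpu)
{
	atomic_fetch_and(&idle_cpus_mask, ~(UINT64_C(1) << cpu));
}

/*
 * Analogue of !cpumask_empty(nohz.idle_cpus_mask): the mask alone answers
 * whether idle balancing is worth considering, so no nr_cpus is needed.
 */
static bool nohz_balance_needed_mask(void)
{
	return atomic_load(&idle_cpus_mask) != 0;
}
```

A reader racing with an enter/exit may see a mask that is one update stale, which is the "not accurate on every tick" trade-off mentioned above.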
I haven't gone through your series in detail yet, but a similar thing is doable
there: check whether the list is empty or not.
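Applied to the series, the same idea might look like this, assuming a global list that holds each domain while its nr_idle_cpus count is non-zero. The list, `struct idle_dom`, and all helpers are hypothetical (a minimal list_head-style list, single-threaded); real kernel code would need locking or RCU around the list.

```c
#include <stdbool.h>

/* Minimal circular doubly-linked list modeled on the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *h) { h->next = h->prev = h; }
static bool list_is_empty(const struct list_node *h) { return h->next == h; }

/* Insert @n right after head @h. */
static void list_add_node(struct list_node *n, struct list_node *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

/* Unlink @n and leave it pointing at itself. */
static void list_del_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical per-domain node, standing in for sd_nohz->shared. */
struct idle_dom {
	int nr_idle_cpus;
	struct list_node link;
};

/* Hypothetical global list of domains with at least one nohz-idle CPU. */
static struct list_node idle_doms = { &idle_doms, &idle_doms };

/* First idle CPU in @d puts the domain on the list. */
static void dom_enter_idle(struct idle_dom *d)
{
	if (d->nr_idle_cpus++ == 0)
		list_add_node(&d->link, &idle_doms);
}

/* Last idle CPU leaving @d takes the domain off the list. */
static void dom_exit_idle(struct idle_dom *d)
{
	if (--d->nr_idle_cpus == 0)
		list_del_node(&d->link);
}

/* The balance decision needs no nr_doms counter: just test the list. */
static bool nohz_balance_needed_list(void)
{
	return !list_is_empty(&idle_doms);
}
```

The list already encodes "at least one domain has idle CPUs", so a separate counter duplicates state that has to be kept consistent on every transition.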