Re: [PATCH] sched/fair: Avoid false sharing in nohz struct
From: Guo, Wangyang
Date: Sun Dec 21 2025 - 21:21:49 EST
On 12/21/2025 9:05 PM, Shrikanth Hegde wrote:
Hi Wangyang,This patch is mainly target for idle_cpus_mask as a pointer, which is default for many distro OS.
On 12/11/25 11:26 AM, Wangyang Guo wrote:
There are two potential false sharing issue in nohz struct:
1. idle_cpus_mask is a read-mostly field, but share the same cacheline
with frequently updated nr_cpus.
Updates to idle_cpus_mask is not same cacheline. it is updated alongside nr_cpus.
with CPUMASK_OFFSTACK=y, idle_cpus_mask is a pointer to the actual mask.
Updates to it happen in another cacheline.
with CPUMASK_OFFSTACK=n, idle_cpus_mask is on the stack and its length
depends on NR_CPUS. typical value being 512/2048/8192 it can span a few
cachelines. So updates to it likely in different cacheline compared to nr_cpus.
see https://lore.kernel.org/all/aS6bK4ad-wO2fsoo@xxxxxxxxx/
Likely in your case, nr_cpus updates are the costly ones.
Try below and see if it helps to fix your issue too.
https://lore.kernel.org/all/20251201183146.74443-1-sshegde@xxxxxxxxxxxxx/
I Should send out new version soon.
2. Data followed by nohz still share the same cacheline and has
potential false sharing issue.
How does your patch handle this?
I don't see any other struct apart from nohz being changed.
The data follow by nohz is implicit and determined by compiler.
For example, this is the layout from /proc/kallsyms in my machine:
ffffffff88600d40 b nohz
ffffffff88600d68 B arch_needs_tick_broadcast
ffffffff88600d6c b __key.264
ffffffff88600d6c b __key.265
ffffffff88600d70 b dl_generation
ffffffff88600d78 b sched_clock_irqtime
What we can do is placing read-mostly `idle_cpus_mask` pointer in a new cacheline, so data followed by nohz would not be affected by nr_cpus.
This patch tries to resolve the above two problems by isolating the
frequently updated fields in a single cacheline.
Reported-by: Benjamin Lei <benjamin.lei@xxxxxxxxx>
Reviewed-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
Reviewed-by: Tianyou Li <tianyou.li@xxxxxxxxx>
Signed-off-by: Wangyang Guo <wangyang.guo@xxxxxxxxx>
---
kernel/sched/fair.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5b752324270b..bcc2766b7986 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7193,13 +7193,14 @@ static DEFINE_PER_CPU(cpumask_var_t, should_we_balance_tmpmask);
#ifdef CONFIG_NO_HZ_COMMON
static struct {
- cpumask_var_t idle_cpus_mask;
- atomic_t nr_cpus;
+ /* Isolate frequently updated fields in a cacheline to avoid false sharing issue. */
+ atomic_t nr_cpus ____cacheline_aligned;
int has_blocked; /* Idle CPUS has blocked load */
int needs_update; /* Newly idle CPUs need their next_balance collated */
unsigned long next_balance; /* in jiffy units */
unsigned long next_blocked; /* Next update of blocked load in jiffies */
-} nohz ____cacheline_aligned;
+ cpumask_var_t idle_cpus_mask ____cacheline_aligned;
+} nohz;
This can cause a lot of space wastage.
for exp: powerpc has 128 byte cacheline.
nohz is global, only one exists. The size inflating is minimal, less than 1 cacheline.
BR
Wangyang