Re: [PATCH] sched/fair: Avoid false sharing in nohz struct

From: Shrikanth Hegde

Date: Tue Dec 23 2025 - 02:28:09 EST




On 12/22/25 7:51 AM, Guo, Wangyang wrote:
On 12/21/2025 9:05 PM, Shrikanth Hegde wrote:
Hi Wangyang,

On 12/11/25 11:26 AM, Wangyang Guo wrote:
There are two potential false sharing issues in the nohz struct:
1. idle_cpus_mask is a read-mostly field, but shares the same cacheline
    with the frequently updated nr_cpus.

Updates to idle_cpus_mask are not in the same cacheline, even though it is
updated alongside nr_cpus.

With CPUMASK_OFFSTACK=y, idle_cpus_mask is a pointer to the actual mask.
Updates to the mask happen in another cacheline.

With CPUMASK_OFFSTACK=n, idle_cpus_mask is embedded in the struct and its
length depends on NR_CPUS. With typical values of 512/2048/8192, it can span
a few cachelines, so updates to it are likely in a different cacheline than
nr_cpus.

see  https://lore.kernel.org/all/aS6bK4ad-wO2fsoo@xxxxxxxxx/
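
The two configurations can be sketched in userspace C. This is a mock, not
the kernel definition: the struct names, the reduced field set, NR_CPUS=2048
and the 64-byte cacheline size are assumptions for illustration, and a plain
int stands in for atomic_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NR_CPUS         2048  /* assumed build-time value */
#define CACHELINE_BYTES 64    /* assumed cacheline size */
#define BITS_PER_LONG   (8 * sizeof(unsigned long))
#define MASK_LONGS      (NR_CPUS / BITS_PER_LONG)

static_assert(NR_CPUS % BITS_PER_LONG == 0, "mask must be whole longs");

/* Mock of CPUMASK_OFFSTACK=y: the struct holds only a pointer to the mask. */
struct nohz_offstack {
	unsigned long *idle_cpus_mask;
	int nr_cpus;              /* stands in for atomic_t */
};

/* Mock of CPUMASK_OFFSTACK=n: the mask bits live inside the struct. */
struct nohz_inline {
	unsigned long idle_cpus_mask[MASK_LONGS];
	int nr_cpus;              /* stands in for atomic_t */
};

/* Index of the cacheline that a given byte offset falls into. */
static size_t cacheline_of(size_t offset)
{
	return offset / CACHELINE_BYTES;
}

/* Print which cacheline each field starts in for both layouts. */
static void report_layout(void)
{
	printf("offstack: mask ptr line %zu, nr_cpus line %zu\n",
	       cacheline_of(offsetof(struct nohz_offstack, idle_cpus_mask)),
	       cacheline_of(offsetof(struct nohz_offstack, nr_cpus)));
	printf("inline:   mask starts line %zu, nr_cpus line %zu\n",
	       cacheline_of(offsetof(struct nohz_inline, idle_cpus_mask)),
	       cacheline_of(offsetof(struct nohz_inline, nr_cpus)));
}
```

Under these assumptions the pointer and nr_cpus sit in the same cacheline in
the offstack layout (only the pointer, not the mask bits it points to), while
the inline mask occupies 256 bytes and pushes nr_cpus four cachelines away
from the start of the struct.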

This patch mainly targets the case where idle_cpus_mask is a pointer, which is the default for many distro OSes.


Not all archs.


Likely in your case, nr_cpus updates are the costly ones.
Try below and see if it helps to fix your issue too.
https://lore.kernel.org/all/20251201183146.74443-1-sshegde@xxxxxxxxxxxxx/
I should send out a new version soon.

2. Data following nohz still shares the same cacheline and has a
    potential false sharing issue.


How does your patch handle this?
I don't see any other struct apart from nohz being changed.

The data following nohz is implicit and determined by the compiler.
For example, this is the layout from /proc/kallsyms on my machine:
ffffffff88600d40 b nohz
ffffffff88600d68 B arch_needs_tick_broadcast
ffffffff88600d6c b __key.264
ffffffff88600d6c b __key.265
ffffffff88600d70 b dl_generation
ffffffff88600d78 b sched_clock_irqtime
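
Which of these symbols share a cacheline with nohz can be checked by masking
off the low address bits. A small userspace sketch, assuming 64-byte
cachelines; the addresses are the ones listed above, the helper names are
made up for illustration:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define CACHELINE_BYTES 64u  /* assumption: 64-byte cachelines */

static_assert((CACHELINE_BYTES & (CACHELINE_BYTES - 1)) == 0,
	      "cacheline size must be a power of two");

struct sym {
	uint64_t addr;
	const char *name;
};

/* Addresses copied from the kallsyms listing above; nohz is entry 0. */
static const struct sym syms[] = {
	{ 0xffffffff88600d40ULL, "nohz" },
	{ 0xffffffff88600d68ULL, "arch_needs_tick_broadcast" },
	{ 0xffffffff88600d6cULL, "__key.264" },
	{ 0xffffffff88600d6cULL, "__key.265" },
	{ 0xffffffff88600d70ULL, "dl_generation" },
	{ 0xffffffff88600d78ULL, "sched_clock_irqtime" },
};

#define NR_SYMS (sizeof(syms) / sizeof(syms[0]))

/* Start address of the cacheline containing addr. */
static uint64_t cacheline_base(uint64_t addr)
{
	return addr & ~(uint64_t)(CACHELINE_BYTES - 1);
}

/* Nonzero if symbol i starts in the same cacheline as nohz. */
static int shares_line_with_nohz(size_t i)
{
	return cacheline_base(syms[i].addr) == cacheline_base(syms[0].addr);
}

/* Print the cacheline base of every symbol in the table. */
static void print_sharing(void)
{
	for (size_t i = 0; i < NR_SYMS; i++)
		printf("%016" PRIx64 " base %016" PRIx64 " %s%s\n",
		       syms[i].addr, cacheline_base(syms[i].addr),
		       syms[i].name,
		       shares_line_with_nohz(i) ? " (same line as nohz)" : "");
}
```

With 64-byte lines, every address in this listing resolves to the base
ffffffff88600d40, so all of these symbols start in the same cacheline as nohz.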

What we can do is place the read-mostly `idle_cpus_mask` pointer in a new cacheline, so data following nohz would not be affected by nr_cpus.
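
A rough userspace sketch of that idea, not the actual patch: the struct name,
the field subset and the 64-byte line size are assumptions, and C11 _Alignas
stands in for the kernel's ____cacheline_aligned annotation.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE_BYTES 64  /* assumption: 64-byte cachelines */

/* Hypothetical layout: hot fields first, read-mostly pointer on its own line. */
struct nohz_split {
	/* Frequently written fields stay in the first cacheline. */
	int nr_cpus;              /* stands in for atomic_t */
	int has_blocked;
	unsigned long next_balance;

	/* Read-mostly pointer starts a new cacheline. */
	_Alignas(CACHELINE_BYTES) unsigned long *idle_cpus_mask;
};

/* Index of the cacheline that a given byte offset falls into. */
static size_t split_line_of(size_t offset)
{
	return offset / CACHELINE_BYTES;
}

/* The aligned member forces both its offset and the struct size to
 * multiples of the cacheline size. */
static_assert(offsetof(struct nohz_split, idle_cpus_mask) % CACHELINE_BYTES == 0,
	      "mask pointer must start a cacheline");
static_assert(sizeof(struct nohz_split) % CACHELINE_BYTES == 0,
	      "struct must end on a cacheline boundary");
```

In this sketch nr_cpus and the pointer end up in different cachelines, and
because the struct size is padded to a multiple of the line size, whatever
the linker places after it cannot land in the line that holds nr_cpus.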


That's a concern. If it is compiler dependent, then sometimes it helps, sometimes it won't.

It should be done the other way around rather than changing nohz.
If there is a structure which sees a lot of reads/updates, it should go into
its own cacheline instead.

i.e. in your case sched_clock_irqtime should go into its own cacheline.

---
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 4f97896887ec..29f9438f9f03 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -25,7 +25,7 @@
  */
 DEFINE_PER_CPU(struct irqtime, cpu_irqtime);
-int sched_clock_irqtime;
+int sched_clock_irqtime __cacheline_aligned;
 void enable_sched_clock_irqtime(void)
 {