Re: [PATCH v7] sched/clock: Avoid false sharing for sched_clock_irqtime

From: Guo, Wangyang

Date: Tue Jan 27 2026 - 21:19:36 EST

On 1/27/2026 7:04 PM, Shrikanth Hegde wrote:

Hi Wangyang, Prateek.

On 1/27/26 12:55 PM, Wangyang Guo wrote:

Read-mostly sched_clock_irqtime may share the same cacheline with
frequently updated nohz struct. Make it as static_key to avoid
false sharing issue.

The only user of disable_sched_clock_irqtime()
is tsc_.*mark_unstable() which may be invoked under atomic context
and require a workqueue to disable static_key. But both of them
calls clear_sched_clock_stable() just before doing
disable_sched_clock_irqtime(). We can reuse
"sched_clock_work" to also disable sched_clock_irqtime().

One additional case need to handle is if the tsc is marked unstable
before late_initcall() phase, sched_clock_work will not be invoked
and sched_clock_irqtime will stay enabled although clock is unstable:
   tsc_init()
     enable_sched_clock_irqtime() # irqtime accounting is enabled here
     ...
     if (unsynchronized_tsc()) # true
       mark_tsc_unstable()
         clear_sched_clock_stable()
           __sched_clock_stable_early = 0;
           ...
           if (static_key_count(&sched_clock_running.key) == 2)
             # Only happens at sched_clock_init_late()
             __clear_sched_clock_stable(); # Never executed
   ...

   # late_initcall() phase
   sched_clock_init_late()
     if (__sched_clock_stable_early) # Already false
       __set_sched_clock_stable(); # sched_clock is never marked stable
   # TSC unstable, but sched_clock_work won't run to disable irqtime

So we need to disable_sched_clock_irqtime() in sched_clock_init_late()
if clock is unstable.

Do you this as a valid case? have you tested with CONFIG_PARAVIRT?

Lets say you have a non native sched clock such as kvm_sched_clock_read.
tsc_init -> sets enable_sched_clock_irqtime()
         ->mark_tsc_unstable -> if using_native_sched_clock -> clear_sched_clock_stable

In this case, since clear_sched_clock_stable won't be called you may not disable the
sched clock irqtime since __sched_clock_stable_early is reset only in clear_sched_clock_stable

For hypervisor, I see this path may call clear_sched_clock_stable when clock is unstable at init:

kvm_init_platform() ->
kvmclock_init() -> kvm_sched_clock_init(stable):
if (!stable) clear_sched_clock_stable()
paravirt_set_sched_clock(kvm_sched_clock_read)

Bigger concern(maybe) is clock source marked as stable still, though called mark_tsc_unstable in
non native sched clock?

Disclaimer: (just curious, seeing this x86 code for first time, so may not know all paths)

Yes, when clock mark unstable through tsc_.*mark_unstable() with non-native_sched_clock, clear_sched_clock_stable won't be called, thus sched_clock_irqtime still keep enabled.

Maybe the dedicated workqueue for sched_clock_irqtime is still needed considering this case.