Re: NO_HZ_FULL vs NO_HZ_IDLE: ~300ns cyclictest latency regression from context tracking overhead

From: Valentin Schneider

Date: Fri Mar 27 2026 - 12:15:31 EST


On 26/03/26 18:57, Ionut Nechita wrote:
> Hi,
>
> I'm seeing a consistent ~300ns average latency increase in cyclictest when
> switching from CONFIG_NO_HZ_IDLE=y to CONFIG_NO_HZ_FULL=y on a PREEMPT_RT
> kernel. I'd like to understand whether this is expected behavior or if there's
> room for improvement in the context tracking overhead on x86.
>
> Hardware:
> - Intel Xeon Gold 6338N @ 2.20GHz (Ice Lake SP)
> - 32 cores / 64 threads, single socket
> - L3 cache: 48 MiB
>
> Kernel: 6.12.78-rt (PREEMPT_RT)
>
> Boot parameters (identical for both tests except nohz_full/isolcpus):
> nohz_full=1-16,33-48 isolcpus=nohz,domain,managed_irq,1-16,33-48
> rcu_nocbs=1-31,33-63 kthread_cpus=0,32 irqaffinity=17-31,49-63
> intel_pstate=none nopti nospectre_v2 nospectre_v1 psi=0
>
> Cyclictest command:
> cyclictest --priority 95 --nsecs --histofall 40000 --smi \
> --duration 7200 --affinity 1-15,33-47 --threads 30 --mainaffinity 0
>
> Config difference:
> NO_HZ_IDLE build:
> CONFIG_NO_HZ_IDLE=y
> # CONFIG_NO_HZ_FULL is not set
> # CONFIG_TICK_CPU_ACCOUNTING is not set
>
> NO_HZ_FULL build:
> CONFIG_NO_HZ_FULL=y
> CONFIG_VIRT_CPU_ACCOUNTING=y
> CONFIG_VIRT_CPU_ACCOUNTING_GEN=y
>
> Results summary (30 threads, 2-hour run, values in nanoseconds):
>
>                 NO_HZ_IDLE       NO_HZ_FULL
> Min (range):    1697 - 1783      2026 - 2182
> Avg (range):    1876 - 2000      2131 - 2333
> Max (range):    2017 - 14491     6832 - 11539
> SMI:            0                0
>
> The average latency shifts from ~1930ns to ~2270ns, a consistent ~300-350ns
> increase across all 30 threads. The minimum latency floor also rises by
> roughly 400ns.
>
> My understanding is that CONFIG_NO_HZ_FULL forces
> CONFIG_VIRT_CPU_ACCOUNTING_GEN=y on x86, which adds context tracking overhead
> on every kernel entry/exit. Since cyclictest does a clock_nanosleep() syscall
> every 1ms, this overhead is hit on every cycle.
>
> Questions:
> 1. Is ~300ns of additional overhead from context tracking considered
> acceptable/expected on Ice Lake with PREEMPT_RT?
> 2. Has there been any work on reducing the context tracking cost on x86,
> or making VIRT_CPU_ACCOUNTING optional with NO_HZ_FULL?
> 3. For RT workloads that do periodic syscalls (not pure userspace polling),
> is the recommendation to simply stay on NO_HZ_IDLE?
>

I can't answer for 2., but for 1. & 3.: NO_HZ_FULL is about squeezing as
much as you can out of a CPU running a pure userspace application. Any
kernel entry is an anomaly / interference, IOW "you lose". If you know your
workload is going to periodically enter the kernel, put it on a CPU that
isn't nohz_full (but maybe isolated).

> I can provide full histogram data or run additional tests if helpful.
>
> Thanks,
> Ionut