Re: [PATCH RESEND 1/2] tick/nohz: Fix wrong NOHZ idle CPU state

From: Shubhang Kaushik

Date: Wed Feb 11 2026 - 18:20:07 EST


Hi Frederic,
On Thu, 4 Sep 2025, Frederic Weisbecker wrote:

> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index c527b421c865..b900a120ab54 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -1229,8 +1229,9 @@ void tick_nohz_idle_stop_tick(void)
> >  		ts->idle_sleeps++;
> >  		ts->idle_expires = expires;
> >
> > -		if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
> > -			ts->idle_jiffies = ts->last_jiffies;
> > +		if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
> > +			if (!was_stopped)
> > +				ts->idle_jiffies = ts->last_jiffies;
> >  			nohz_balance_enter_idle(cpu);
>
> The current state is indeed broken and some people have already tried to fix it.
> The thing is nohz_full don't want dynamic isolation because it is deemed to run a
> single task. Therefore those tasks must be placed manually in order not to break
> isolation guarantees by accident.
>
> In fact nohz_full doesn't make much sense without isolcpus (or isolated cpuset
> v2 partitions) and I even intend to make nohz_full depend on domain isolation
> in the long term.
>
> Thanks.

Following up on the isolation concerns raised previously, I've posted an updated patch [1] that provides a clearer justification and performance data from an Ampere Altra system.

The core issue is that on high-core-count systems, nohz_full CPUs often become stranded in idle because they are missing from nohz.idle_cpus_mask. While I understand the intent behind manual isolation, our testing shows that the current behavior leads to significant underutilization.

- LLM Workloads: ~14% throughput improvement in llama-batched-bench.
- Scheduler Jitter: ~26% improvement in hackbench multi-process tests.

The patch decouples the tick-stop accounting (which should only happen once) from the balancer registration. Because nohz_balance_enter_idle() is idempotent, it can safely be called on every idle entry with the tick stopped, so an idle CPU becomes visible to the balancer without breaking the isolation of other cores that are actually running tasks.
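
The difference between the two conditions in the quoted hunk can be sketched with a small userspace model. This is only an illustration, not kernel code: every `model_`-prefixed name and the `cpu_model` struct are hypothetical, and the scenario (a CPU entering idle with its tick already stopped) is an assumption about how the CPU ends up outside nohz.idle_cpus_mask.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one CPU; none of this is kernel code. */
struct cpu_model {
	bool tick_stopped;        /* stands in for TS_FLAG_STOPPED */
	bool in_idle_mask;        /* stands in for nohz.idle_cpus_mask membership */
	unsigned long last_jiffies;
	unsigned long idle_jiffies;
	int idle_jiffies_updates; /* counts how often the accounting ran */
};

/* Idempotent registration: a repeated call leaves the state unchanged. */
static void model_balance_enter_idle(struct cpu_model *c)
{
	if (c->in_idle_mask)
		return;
	c->in_idle_mask = true;
}

/* Old condition: account and register only on the stopped transition. */
static void model_idle_stop_tick_old(struct cpu_model *c)
{
	bool was_stopped = c->tick_stopped;

	c->tick_stopped = true; /* assumption: stopping the tick always succeeds */
	if (!was_stopped && c->tick_stopped) {
		c->idle_jiffies = c->last_jiffies;
		c->idle_jiffies_updates++;
		model_balance_enter_idle(c);
	}
}

/* New condition: account once per transition, register on every idle entry. */
static void model_idle_stop_tick_new(struct cpu_model *c)
{
	bool was_stopped = c->tick_stopped;

	c->tick_stopped = true; /* assumption: stopping the tick always succeeds */
	if (c->tick_stopped) {
		if (!was_stopped) {
			c->idle_jiffies = c->last_jiffies;
			c->idle_jiffies_updates++;
		}
		model_balance_enter_idle(c);
	}
}

/* Assumed scenario: the CPU enters idle with its tick already stopped. */
static void model_demo(void)
{
	struct cpu_model old_cpu = { .tick_stopped = true, .last_jiffies = 100 };
	struct cpu_model new_cpu = { .tick_stopped = true, .last_jiffies = 100 };

	model_idle_stop_tick_old(&old_cpu);
	model_idle_stop_tick_new(&new_cpu);
	model_idle_stop_tick_new(&new_cpu); /* second idle entry, same result */

	assert(!old_cpu.in_idle_mask);              /* stranded under the old check */
	assert(new_cpu.in_idle_mask);               /* visible under the new check */
	assert(new_cpu.idle_jiffies_updates == 0);  /* no transition, no accounting */
}
```

In this model the old check never registers the CPU, while the new check registers it without touching idle_jiffies a second time.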

If a CPU has entered do_idle(), it is no longer running an isolated workload. Keeping it invisible to the balancer at that point causes a performance regression rather than providing an isolation guarantee.

You can find the updated patch and full performance breakdown here:
[1] https://lkml.org/lkml/2026/2/3/2119

Regards,
Shubhang Kaushik

> >  		}
> >  	} else {
> > --
> > 2.34.1
>
> --
> Frederic Weisbecker
> SUSE Labs