Re: [RESEND PATCH] tick/nohz: Fix wrong NOHZ idle CPU state

From: Frederic Weisbecker

Date: Fri Feb 13 2026 - 07:56:16 EST


On Thu, Feb 12, 2026 at 11:36:06AM -0800, Shubhang Kaushik wrote:
> Hi Frederic,
>
> On Thu, 12 Feb 2026, Frederic Weisbecker wrote:
>
> > >
> > > Tested on Ampere Altra on 6.19.0-rc8 with CONFIG_NO_HZ_FULL enabled:
> > > - This change improves load distribution by ensuring that tickless idle
> > > CPUs are visible to NOHZ idle load balancing. In llama-batched-bench,
> > > throughput improves by up to ~14% across multiple thread counts.
> > > - Hackbench single-process results improve by 5% and multi-process
> > > results improve by up to ~26%, consistent with reduced scheduler
> > > jitter and earlier utilization of fully idle cores.
> > > No regressions observed.
> >
> > Because you rely on dynamic placement of isolated tasks throughout isolated
> > CPUs by the scheduler.
> >
> > But nohz_full is designed for running only one task per isolated CPU without
> > any disturbance. And migration is a significant disturbance. This is why
> > nohz_full tries not to be too smart and assumes that task placement is entirely
> > within the hands of the user.
> >
> > So I have to ask, what prevents you from using static task placement in your
> > workload?
>
> Actually, the llama-batched-bench results I shared already included static
> affinity testing via numactl -C.
>
> Even with static placement, we observe this ~14% throughput improvement.
> This suggests that the issue isn't about the scheduler trying to be smart
> with task migration, but rather about the side effects of an idle CPU being
> absent from nohz.idle_cpus_mask.
>
> When nohz_full CPUs enter idle but aren't correctly accounted for in the
> idle mask, it appears to cause unnecessary overhead or interference in the
> NOHZ load balancing logic for the CPUs that are still running tasks. By
> ensuring the idle state is correctly tracked, we're not encouraging
> migration, but rather ensuring the scheduler's global state accurately
> reflects reality.

Then there seems to be something else going on that we don't fully understand,
because isolated CPUs run one pinned task per CPU and the only housekeeping CPU
is CPU 0. So there is nothing to balance here.

Perhaps some CPUs spend too much time scanning through all isolated CPUs to
see if there is balancing to do. I don't know, this needs further investigation.
But if the nohz_full CPUs are correctly domain isolated, as they should be
(through isolcpus=domain or cpuset isolated partitions), they should be
invisible to ilb anyway.
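
One rough way to check that from user space is to compare the nohz_full set
against the domain-isolated set. The sketch below assumes the sysfs files
/sys/devices/system/cpu/nohz_full and /sys/devices/system/cpu/isolated are
present and that the latter reflects isolcpus= domain isolation; it does not
look at cpuset isolated partitions. The helper names are made up for
illustration.

```python
from pathlib import Path

# Assumed sysfs locations; adjust if your kernel exposes them elsewhere.
NOHZ_FULL_PATH = "/sys/devices/system/cpu/nohz_full"
ISOLATED_PATH = "/sys/devices/system/cpu/isolated"


def parse_cpulist(text):
    """Parse a kernel cpulist string such as "1-3,5" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a file containing only a newline
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_cpulist(path):
    """Read a cpulist file; treat a missing or unparsable file as empty."""
    try:
        return parse_cpulist(Path(path).read_text())
    except (OSError, ValueError):
        return set()


def not_domain_isolated(nohz_full, isolated):
    """Return the nohz_full CPUs that are absent from the isolated set."""
    return sorted(nohz_full - isolated)


if __name__ == "__main__":
    nohz_full = read_cpulist(NOHZ_FULL_PATH)
    isolated = read_cpulist(ISOLATED_PATH)
    leftover = not_domain_isolated(nohz_full, isolated)
    print("nohz_full CPUs:", sorted(nohz_full))
    print("isolated CPUs: ", sorted(isolated))
    print("nohz_full but not domain isolated:", leftover)
```

An empty last line would suggest every nohz_full CPU is also domain isolated
via isolcpus=; a non-empty one only says those CPUs are not covered by the
boot parameter, so cpuset partitions would still need to be checked separately.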

Thanks.

--
Frederic Weisbecker
SUSE Labs