Re: [RESEND PATCH] tick/nohz: Fix wrong NOHZ idle CPU state

From: Frederic Weisbecker

Date: Wed Mar 11 2026 - 07:11:16 EST


On Fri, Feb 13, 2026 at 10:15:15AM -0800, Christoph Lameter (Ampere) wrote:
> On Fri, 13 Feb 2026, Frederic Weisbecker wrote:
>
> > Then there seems to be something else going on that we don't fully understand
> > because isolated CPUs run 1 pinned task per CPU and the only housekeeping CPU
> > is CPU 0. So there is nothing to balance here.
> >
> > Perhaps some CPUs spend too much time scanning through all isolated CPUs to
> > see if there is balancing to do. I don't know, this needs further investigation.
> > But if the nohz_full CPUs are correctly domain isolated as they should
> > (through isolcpus=domain or cpuset isolated partitions), they should be
> > invisible to ilb anyway.
>
>
> "balancing" would mean moving tasks from busy cpus (that are not in
> NOHZ_FULL state) to idle cpus that can then be in NOHZ_FULL state.
>
> If the move from a busy cpu to an idle cpu succeeds then both cpus may
> only run one process and be able to enter NOHZ_FULL.
>
> This is e.g. the case with threadpools used by certain AI apps. Before
> the app starts numactl is used to set up a group of cpus that the app can use.
>
> One may optimize and allow NOHZ_FULL for these cpus.
>
> The app will then create a number of threads during its startup phase.
> These should be all placed on idle cpus in the allowed cpu range.
>
> If this is configured the right way then each thread is on a different cpu
> and there is one thread per cpu so that we can use NOHZ_FULL.
>
> This is sometimes broken because not all idle cpus are used. Instead some
> cpus get two threads and other cpus stay idle. That is why idle load
> balancing is needed.

Which means you guys eventually rely on load balancing...
So I can only repeat what I said there:

https://lore.kernel.org/lkml/aY3k1_JJjPFUhPd4@localhost.localdomain/

> There is no cpu isolation/cgroups or other black magic involved here.

Too bad, static task placement would fix your issue and domain isolation
would improve your workload.
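
As a rough sketch of what static placement could look like from userspace,
assuming a Linux system and a thread pool written in Python: each worker pins
itself to exactly one CPU at startup, so the one-thread-per-CPU layout no longer
depends on the load balancer. The helper names run_pinned() and
report_affinity() are hypothetical and only serve the illustration; the CPU
list is taken from whatever mask the launcher (e.g. numactl) left in place.

```python
import os
import threading


def run_pinned(cpus, work):
    """Start one thread per CPU in `cpus`, each pinned to its own CPU.

    Hypothetical helper for illustration: returns {cpu: work(cpu)}.
    """
    results = {}

    def worker(cpu):
        # On Linux, pid 0 makes sched_setaffinity(2) act on the calling
        # thread, so only this worker is restricted to `cpu`.
        os.sched_setaffinity(0, {cpu})
        results[cpu] = work(cpu)

    threads = [threading.Thread(target=worker, args=(cpu,)) for cpu in cpus]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def report_affinity(cpu):
    # Stand-in for the real workload: report the mask this thread runs under.
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Use the CPUs this process is already allowed to run on.
    allowed = sorted(os.sched_getaffinity(0))
    for cpu, mask in sorted(run_pinned(allowed, report_affinity).items()):
        print(cpu, sorted(mask))
```

This only covers the static placement part. Domain isolation is configured
separately (isolcpus=domain or cpuset isolated partitions) and is not shown
here.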

Thanks.

--
Frederic Weisbecker
SUSE Labs