Re: [PATCH] sched/nohz: Fix NOHZ imbalance by adding options for ILB CPU

From: Adam Li
Date: Thu Aug 21 2025 - 07:23:53 EST


On 8/20/2025 7:46 PM, Valentin Schneider wrote:
>
> I'd say resend the whole series with the right folks cc'd.
>
OK, I have resent the patch series.
Please refer to: https://lore.kernel.org/all/20250821042707.62993-1-adamli@xxxxxxxxxxxxxxxxxxxxxx/

>> 'nohz_full' option is supposed to benefit performance by reducing kernel
>> noise I think. Could you please give more detail on
>> 'NOHZ_FULL context switch overhead'?
>>
>
> The doc briefly touches on that:
>
> https://docs.kernel.org/timers/no_hz.html#omit-scheduling-clock-ticks-for-cpus-with-only-one-runnable-task
>
> The longer story is have a look at kernel/context_tracking.c; every
> transition into and out of the kernel to and from user or idle requires
> additional atomic operations and synchronization.
>
> It would be worth quantifying how much these processes sleep/context
> switch; it could be that keeping the tick enabled incurs a lower
> throughput penalty than the NO_HZ_FULL overheads.
>

Thanks for the information.
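
As a rough way to quantify the sleep/context-switch rate from inside a
process, here is a minimal sketch using getrusage(2). The struct and
function names are only illustrative (they are not from the patch or the
kernel), and it measures the calling process only.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Illustrative container: names are hypothetical, not kernel API. */
struct ctxsw_counts {
	long voluntary;   /* task gave up the CPU (sleep, blocking I/O) */
	long involuntary; /* task was preempted by the scheduler */
};

/* Read the counters for the calling process; 0 on success, -1 on error. */
int read_ctxsw_counts(struct ctxsw_counts *out)
{
	struct rusage ru;

	assert(out != NULL);
	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return -1;
	out->voluntary = ru.ru_nvcsw;
	out->involuntary = ru.ru_nivcsw;
	return 0;
}

/* Print how many switches happened between two samples. */
void print_ctxsw_delta(const struct ctxsw_counts *before,
		       const struct ctxsw_counts *after)
{
	printf("voluntary: %ld, involuntary: %ld\n",
	       after->voluntary - before->voluntary,
	       after->involuntary - before->involuntary);
}
```

Sampling before and after a fixed interval of the workload gives a switch
rate. For an already running process, the voluntary_ctxt_switches and
nonvoluntary_ctxt_switches fields in /proc/<pid>/status carry the same
information.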

>>> As for the actual balancing, yeah if you have idle NOHZ_FULL CPUs they
>>> won't do the periodic balance; the residual 1Hz remote tick doesn't do that
>>> either. But they should still do the newidle balance to pull work before
>>> going tickless idle, and wakeup balance should help as well, albeit that
>>> also depends on your topology.
>>>
>>
>> I think the newidle balance and wakeup balance do not help in this case
>> because the workload has few sleeps and wakeups.
>>
>
> Right. So other than the NO_HZ_FULL vs NO_HZ_IDLE considerations above, you
> could manually affine the threads of the workload. Depending on how much
> control you have over how many threads it spawns, you could either pin one
> thread per CPU, or just spawn the workload into a cpuset covering the
> NO_HZ_FULL CPUs.
>

Yes, binding the threads to CPUs can work around the performance
issue caused by load imbalance. Should we document that 'nohz_full' may
prevent scheduler load balancing from working well, and that setting CPU
affinity is preferred?
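
For illustration, pinning the calling thread to one CPU could look
roughly like the sketch below. It uses sched_getaffinity(2) and
sched_setaffinity(2); the helper names are hypothetical, and a real
workload would pick CPUs from its nohz_full set rather than the first
allowed one.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for cpu_set_t and the CPU_* macros */
#endif
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: lowest CPU in the caller's current affinity mask. */
int first_allowed_cpu(void)
{
	cpu_set_t set;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (CPU_ISSET(cpu, &set))
			return cpu;
	}
	return -1;
}

/* Hypothetical helper: restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on error (errno is set by the syscall). */
int pin_self_to_cpu(int cpu)
{
	cpu_set_t set;

	if (cpu < 0 || cpu >= CPU_SETSIZE)
		return -1;
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	/* pid 0 means "the calling thread" */
	return sched_setaffinity(0, sizeof(set), &set);
}
```

Each worker thread would call pin_self_to_cpu() with its own CPU number
right after it starts, so the scheduler never has to balance them.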

> Having the scheduler do the balancing is a bit of a precarious
> situation. Your single housekeeping CPU is pretty much going to be always
> running things, does it make sense to have it run the NOHZ idle balance
> when there are available idle NOHZ_FULL CPUs? And in the same sense, does
> it make sense to disturb an idle NOHZ_FULL CPU to get it to spread load on
> other NOHZ_FULL CPUs? Admins that manually affine their threads will
> probably say no.
>

I think when the NOHZ_FULL CPU is added to nohz.idle_cpus_mask and
its tick is stopped, the CPU is 'very' idle. We can safely assign some work to it.

> 9b019acb72e4 ("sched/nohz: Run NOHZ idle load balancer on HK_FLAG_MISC CPUs")
> also mentions SMT being an issue.
>

From the commit message of 9b019acb72e4:
"The problem was observed with increased jitter on an application
running on CPU0, caused by NOHZ idle load balancing being run on
CPU1 (an SMT sibling)."

Can we say that, if there is *no* SMT, it is safe to run NOHZ idle load
balancing on a CPU in nohz.idle_cpus_mask? My patch checks
'!sched_smt_active()' when searching nohz.idle_cpus_mask.
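
To make the idea concrete, here is a user-space model of the selection
logic, not the patch itself. It assumes at most 64 CPUs represented as
bitmasks, and it assumes the nohz.idle_cpus_mask search is a fallback
used only when no idle housekeeping CPU is found; the function and
parameter names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Return the lowest set bit index in mask, or -1 if mask is empty. */
static int lowest_cpu(uint64_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++) {
		if (mask & (1ULL << cpu))
			return cpu;
	}
	return -1;
}

/*
 * Model of picking a CPU to run the NOHZ idle load balance (ILB).
 *   hk_mask        - housekeeping CPUs
 *   nohz_idle_mask - stands in for nohz.idle_cpus_mask
 *   this_cpu       - the CPU doing the search; never selected
 *   smt_active     - stands in for sched_smt_active()
 * Returns the chosen CPU, or -1 if none qualifies.
 */
int pick_ilb_cpu(uint64_t hk_mask, uint64_t nohz_idle_mask,
		 int this_cpu, int smt_active)
{
	uint64_t not_self;
	uint64_t candidates;

	assert(this_cpu >= 0 && this_cpu < 64);
	not_self = ~(1ULL << this_cpu);

	/* First choice: an idle housekeeping CPU. */
	candidates = hk_mask & nohz_idle_mask & not_self;

	/* Assumed fallback: any tickless-idle CPU, but only without SMT,
	 * so a busy sibling is not disturbed. */
	if (!candidates && !smt_active)
		candidates = nohz_idle_mask & not_self;

	return lowest_cpu(candidates);
}
```

With a single busy housekeeping CPU 0 and idle NOHZ_FULL CPUs 1-3, the
model picks CPU 1 when SMT is off and nothing when SMT is on.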

Thanks,
-adam