Re: [RFC PATCH 0/7] Optimization to reduce the cost of newidle balance

From: Peter Zijlstra
Date: Wed Jul 17 2024 - 08:18:12 EST


On Thu, Jul 27, 2023 at 10:33:58PM +0800, Chen Yu wrote:
> Hi,
>
> This is the second version of the newidle balance optimization[1].
> It aims to reduce the cost of newidle balance which is found to
> occupy noticeable CPU cycles on some high-core count systems.
>
> For example, when running sqlite on Intel Sapphire Rapids, which has
> 2 x 56C/112T = 224 CPUs:
>
> 6.69% 0.09% sqlite3 [kernel.kallsyms] [k] newidle_balance
> 5.39% 4.71% sqlite3 [kernel.kallsyms] [k] update_sd_lb_stats
>
> To mitigate this cost, the optimization is inspired by the question
> raised by Tim:
> Do we always have to find the busiest group and pull from it? Would
> a relatively busy group be enough?

So doesn't this basically boil down to recognising that new-idle might
not be the same as regular load-balancing -- we need any task, fast,
rather than needing to equalise load?
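
As a rough illustration of that distinction, here is a minimal userspace
sketch. It is not kernel code: struct group_stat, pick_busiest(),
pick_busy_enough() and the threshold parameter are all hypothetical names
made up for this example, and the real path works on sched_domain /
sched_group statistics rather than a flat array. It only shows that "find
the busiest" has to visit every group, while "find something busy enough"
can stop at the first hit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-group summary; NOT the kernel's struct sg_lb_stats. */
struct group_stat {
	unsigned int nr_running;	/* runnable tasks in this group */
};

/*
 * Regular balance flavour: scan every group and return the index of the
 * one with the most runnable tasks, or -1 if no group has any.
 * *visited reports how many groups were inspected (always n).
 */
int pick_busiest(const struct group_stat *g, size_t n, size_t *visited)
{
	unsigned int best_load = 0;
	int best = -1;
	size_t i;

	assert(g != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (g[i].nr_running > best_load) {
			best_load = g[i].nr_running;
			best = (int)i;
		}
	}
	*visited = i;
	return best;
}

/*
 * New-idle flavour (assumed policy): return the first group with at
 * least 'threshold' runnable tasks and stop scanning right there.
 * Returns -1 if no group qualifies. *visited reports how many groups
 * were inspected before the scan ended.
 */
int pick_busy_enough(const struct group_stat *g, size_t n,
		     unsigned int threshold, size_t *visited)
{
	size_t i;

	assert(g != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (g[i].nr_running >= threshold) {
			*visited = i + 1;
			return (int)i;
		}
	}
	*visited = n;
	return -1;
}
```

With group loads of { 1, 3, 0, 7, 2 } and a threshold of 2, the first
function inspects all five groups and picks index 3, while the second
stops after two groups and picks index 1. The pulled task is not from the
most loaded group, but the idle CPU gets work sooner, which is the
trade-off being argued for.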

David's shared runqueue patches did the same; they re-imagined this very
path.

Now, David's thing went sideways because of some regression that wasn't
further investigated.

But it occurs to me this might be the same thing that Prateek chased
down here:

https://lkml.kernel.org/r/20240710090210.41856-1-kprateek.nayak@xxxxxxx

Hmm?

Supposing that is indeed the case, I think it makes more sense to
proceed with that approach. That is, completely redo the sub-NUMA
new-idle balance.