Re: [RFC PATCH v5 0/9] steal tasks to improve CPU utilization
From: chenjinghuang
Date: Mon Apr 20 2026 - 00:02:23 EST
On 3/28/2026 10:48 AM, chenjinghuang wrote:
> On 3/20/2026 4:53 PM, Peter Zijlstra wrote:
>> On Fri, Mar 20, 2026 at 05:59:11AM +0000, Chen Jinghuang wrote:
>>> When a CPU has no more CFS tasks to run, and idle_balance() fails to
>>> find a task, then attempt to steal a task from an overloaded CPU in the
>>> same LLC. Maintain and use a bitmap of overloaded CPUs to efficiently
>>> identify candidates. To minimize search time, steal the first migratable
>>> task that is found when the bitmap is traversed. For fairness, search
>>> for migratable tasks on an overloaded CPU in order of next to run.
>>
>> This makes no sense. It is the task of newidle to get more work -- if
>> that is failing for you, then we should fix that, not build a second way
>> to get tasks.
>>
> Hi Peter,
>
> That is a very valid point. However, profiling data indicates that
> newidle_balance() still incurs excessive scanning overhead. Because it
> shares the expensive load-balance path with the periodic tick balancer,
> this is becoming an issue on high-core-count systems.
>
> As highlighted in recent ILB threads, profiling sqlite on a 224-CPU
> Sapphire Rapids shows:
>
> https://lore.kernel.org/all/cover.1690273854.git.yu.c.chen@xxxxxxxxx/
>
> 6.69% 0.09% sqlite3 [kernel.kallsyms] [k] newidle_balance
> 5.39% 4.71% sqlite3 [kernel.kallsyms] [k] update_sd_lb_stats
>
> Walking the sd hierarchy in update_sd_lb_stats alone consumes over 5% of
> CPU cycles. If we spend all that time scanning just to pull a tiny,
> short-lived task, the search overhead dwarfs the actual runtime. It's a net
> loss.
>
> To mitigate this cost, I'd like to check if you are open to these two
> directions:
>
> 1. Goal: As Tim asked in the thread above:
>
> "Do we always have to find the busiest group and pull from it? Would a
> relatively busy group be enough?"
>
> Instead of chasing absolute fairness, shouldn't newidle_balance()
> prioritize fast task acquisition? A task-stealing mechanism can be
> effective in this regard.
>
> 2. Refactoring: Instead of a standalone fix, we could track LLC overload in
> sched_domain_shared and add a lightweight fast path atop newidle_balance(),
> gated by rq->avg_idle.
>
> Do you think refactoring along these lines to integrate into the existing
> framework makes sense?
>
> Thanks, Chen Jinghuang
>
Hi,

Gentle ping on this RFC.

I'd appreciate any feedback whenever you get time. Please also let me know
if I should resend or rework anything.
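
In case it helps the discussion, direction 2 quoted above could be modelled
roughly as follows. This is a userspace sketch only, not the patch and not
kernel code: struct llc_shared, struct rq_model, update_overload(),
pick_steal_src() and the min_idle_ns threshold are all hypothetical names
chosen for illustration, and a single 64-bit word stands in for a real
cpumask. Only sched_domain_shared and rq->avg_idle come from the proposal
itself.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 64 /* assumption: one LLC with at most 64 CPUs */

/* Hypothetical stand-in for per-LLC state kept in sched_domain_shared. */
struct llc_shared {
	uint64_t overload_mask; /* bit n set: CPU n has more than one runnable task */
};

/* Hypothetical stand-in for the few runqueue fields the fast path needs. */
struct rq_model {
	int cpu;
	unsigned int nr_running;
	uint64_t avg_idle_ns; /* models rq->avg_idle */
};

/* Keep the LLC-wide overload bit for this CPU in sync with its queue length. */
static void update_overload(struct llc_shared *sds, const struct rq_model *rq)
{
	assert(rq->cpu >= 0 && rq->cpu < NR_CPUS);

	uint64_t bit = UINT64_C(1) << rq->cpu;

	if (rq->nr_running > 1)
		sds->overload_mask |= bit;
	else
		sds->overload_mask &= ~bit;
}

/*
 * Fast path: return the first overloaded CPU in the LLC other than dst,
 * or -1 if dst is not expected to stay idle long enough (avg_idle gate)
 * or if no CPU in the LLC is marked overloaded.
 */
static int pick_steal_src(const struct llc_shared *sds,
			  const struct rq_model *dst, uint64_t min_idle_ns)
{
	assert(dst->cpu >= 0 && dst->cpu < NR_CPUS);

	if (dst->avg_idle_ns < min_idle_ns)
		return -1;

	uint64_t mask = sds->overload_mask & ~(UINT64_C(1) << dst->cpu);

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (mask & (UINT64_C(1) << cpu))
			return cpu; /* first hit wins, no busiest-group search */
	}
	return -1;
}
```

The idea is that a CPU going idle would call pick_steal_src() before falling
back to the full newidle_balance() scan, and each runqueue would call
update_overload() whenever its nr_running changes. Taking the first set bit
trades fairness for search time, which is the trade-off raised in point 1.
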
Regards,
Chen Jinghuang