Re: [RFC PATCH] sched/fair: scale wake_wide() threshold by SMT width

From: Dietmar Eggemann

Date: Wed Apr 22 2026 - 09:34:57 EST

On 16.04.26 09:41, Zhang Qiao wrote:
> Hi Shrikanth,
>
> On 2026/4/8 1:58, Shrikanth Hegde wrote:
>> Hi.
>>
>> On 4/7/26 12:09 PM, Zhang Qiao wrote:
>>> wake_wide() uses sd_llc_size as the spreading threshold to detect wide
>>> waker/wakee relationships and to disable wake_affine() for those cases.
>>>
>>> On SMT systems, sd_llc_size counts logical CPUs rather than physical
>>> cores. This inflates the wake_wide() threshold, allowing wake_affine()
>>> to pack more tasks into one LLC domain than the actual compute capacity
>>> of its physical cores can sustain. The resulting SMT interference may
>>> cost more than the cache-locality benefit wake_affine() intends to gain.
>>>
>>
>> Isn't load balancing supposed to move it out? What does the workload do?
>
> The workload is a producer-consumer model: one producer wakes up ~50
> different consumers, with roughly 10+ consumers running concurrently.
> The total number of tasks is well below the CPU count.

But higher than your MC core count, I believe? Otherwise you wouldn't
care. I assume you have an MC CPU count of 12-24. Do you have more than 2
different MCs?
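
To make the effect of the MC CPU count concrete, here is a user-space
sketch of the wake_wide() comparison. It is not the kernel code and not
the patch itself: wake_wide_model() and its parameters are made-up names,
and dividing the LLC size by the SMT width is only my assumption of what
"scale by SMT width" means here.

```c
/*
 * User-space model of the wake_wide() heuristic, for illustration only.
 * 'factor' stands in for the spreading threshold (sd_llc_size in the
 * kernel). The flip counters model current->wakee_flips (waker) and
 * p->wakee_flips (wakee).
 */
static int wake_wide_model(unsigned int waker_flips,
                           unsigned int wakee_flips,
                           unsigned int factor)
{
    unsigned int master = waker_flips;
    unsigned int slave = wakee_flips;

    /* Keep the larger flip count in 'master'. */
    if (master < slave) {
        unsigned int tmp = master;
        master = slave;
        slave = tmp;
    }

    /* Not wide unless both counts are large relative to 'factor'. */
    if (slave < factor || master < slave * factor)
        return 0;

    return 1; /* wide: the caller would skip wake_affine() */
}
```

With made-up flip counts of 300 (waker) and 15 (wakee), a factor of 24
(24 logical CPUs in the MC) returns 0, while a factor of 12 (assumed
SMT2, i.e. 24 / 2) returns 1. So a smaller threshold only changes which
wakeups are classified as wide.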

> In this scenario, load balancing is largely ineffective. Each consumer
> spends most of its time sleeping, gets woken by the producer, runs
> briefly to process the message, then goes back to sleep. There is
> almost no window where a consumer sits on a CPU runqueue in the runnable
> state waiting to be pulled. Since load balancing can only migrate
> runnable tasks, it simply has no target to act on here.

OK, but since SD_BALANCE_WAKE is not set by default, nobody would
experience a difference in behaviour on an SMT machine in terms of waking
tasks wide, i.e. going through the slow path. As I tried to explain in
the adjacent thread, your wakees would only end up in the slow path if
your sched domains had SD_BALANCE_WAKE set.

Or do you just want to force wakeups for which wake_wide(p) returns 1 to
always take the fast path with 'new_cpu == prev_cpu'? But that wouldn't
be wake wide?
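
Roughly, the path selection I have in mind looks like the user-space
model below. It is a simplified sketch of the domain walk in
select_task_rq_fair(), not the kernel code: struct domain_model,
pick_wake_path() and the two bools are made-up stand-ins for the
SD_WAKE_AFFINE and SD_BALANCE_WAKE flags, and the cpumask checks are
left out.

```c
#include <stdbool.h>

enum wake_path { PATH_FAST, PATH_SLOW };

/* Hypothetical stand-in for one sched domain level. */
struct domain_model {
    bool wake_affine;   /* models SD_WAKE_AFFINE */
    bool balance_wake;  /* models SD_BALANCE_WAKE */
};

/*
 * doms[] is ordered from the lowest to the highest domain level.
 * 'wide' is the result of wake_wide(p) for this wakeup.
 */
static enum wake_path pick_wake_path(const struct domain_model *doms,
                                     int nr, bool wide)
{
    bool want_affine = !wide;
    bool have_sd = false;

    for (int i = 0; i < nr; i++) {
        /* An affine wakeup wins over the balance flag: fast path. */
        if (want_affine && doms[i].wake_affine) {
            have_sd = false;
            break;
        }
        if (doms[i].balance_wake)
            have_sd = true;     /* candidate domain for the slow path */
        else if (!want_affine)
            break;              /* nothing more to find higher up */
    }

    return have_sd ? PATH_SLOW : PATH_FAST;
}
```

In this model a wide wakeup (wide == true) on domains without
balance_wake still returns PATH_FAST. PATH_SLOW is only reached when at
least one level has balance_wake set.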

[...]