Re: [PATCH 0/4] sched/fair: SMT-aware asymmetric CPU capacity

From: Dietmar Eggemann

Date: Fri Apr 03 2026 - 07:47:28 EST


On 01.04.26 14:42, Andrea Righi wrote:
> On Wed, Apr 01, 2026 at 02:08:27PM +0200, Vincent Guittot wrote:
>> On Wed, 1 Apr 2026 at 13:57, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>>>
>>> On 31.03.26 11:04, Andrea Righi wrote:
>>>> Hi Dietmar,
>>>>
>>>> On Tue, Mar 31, 2026 at 12:30:55AM +0200, Dietmar Eggemann wrote:
>>>>> Hi Andrea,
>>>>>
>>>>> On 26.03.26 16:02, Andrea Righi wrote:

[...]

>>>> I did a quick test on Vera using the NVBLAS benchmark, comparing NO
>>>> ASYM_CPUCAPACITY with and without SIS_UTIL, but the difference seems to be
>>>> within error range. I'll also run DCPerf MediaWiki with all the different

Ah, but this benchmark with '#tasks == #cores' is tailored for this
prefer_core thing. And SIS_UTIL shouldn't close the idle CPU search.

>>> I'm not familiar with the NVBLAS benchmark. Does it drive your system
>>> into 'sd->shared->nr_idle_scan = 0' state?
>
> It's something internal, unfortunately... it's just running a single
> CPU-intensive task for each SMT core (in practice, half as many tasks as
> CPUs). I don't think we're hitting sd->shared->nr_idle_scan == 0 in this
> case.

OK.

>>> We just have to understand where this benefit of using sic() instead of
>>> sis() is coming from. I'm doubtful that this is the best_cpu thing after
>>> if (!choose_idle_cpu(cpu, p)) in sic()'s for_each_cpu_wrap(cpu, cpus,
>>> target) loop given that the CPU capacity diffs are so small.
>>>
>>>> configurations to see if I get similar results.
>>>>
>>>> More generally, I agree that for small capacity differences (e.g., within
>>>> ~5%) the benefit of using ASYM_CPUCAPACITY is questionable. And I'm also
>>>> fine to go back to the idea of grouping together CPUs within the 5%
>>>> capacity window, if we think it's a safer approach (results in your case
>>>> are quite evident, and BTW, that means we also shouldn't have
>>>> ASYM_CPUCAPACITY on Grace, so in theory the 5% threshold should also
>>>> improve performance on Grace, which doesn't have SMT).
>>>
>>> There shouldn't be so many machines with these binning-introduced small
>>> CPU capacity diffs out there? In fact, I only know about your Grace
>>> (!smt) and Vera (smt) machines.
>>
>> In any case, it's always better to add the support than to enable asym_packing

Yeah, the question for me is more between existing 'sis() + smt' or this
new 'sic() + smt' with those minor CPU capacity differences.
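
To make the "5% capacity window" from above concrete, here is a minimal
user-space sketch, not kernel code. The margin mirrors the ~5% comparison
used by capacity_greater() in kernel/sched/fair.c; same_capacity_window()
and capacity_window_selfcheck() are hypothetical names used only for
illustration and are not part of any posted patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Is 'cap1' noticeably (more than ~5%) greater than 'cap2'?
 * 1078 / 1024 ~= 1.053, the same margin the scheduler uses when
 * comparing CPU capacities.
 */
static bool capacity_greater(unsigned long cap1, unsigned long cap2)
{
	return cap1 * 1024 > cap2 * 1078;
}

/*
 * Hypothetical helper (assumption, for illustration only): two CPUs
 * fall into the same capacity window if neither capacity is noticeably
 * greater than the other.
 */
static bool same_capacity_window(unsigned long a, unsigned long b)
{
	return !capacity_greater(a, b) && !capacity_greater(b, a);
}

/* Small self-check with made-up capacity values. */
static void capacity_window_selfcheck(void)
{
	/* ~2.3% apart: binning-sized difference, same window. */
	assert(same_capacity_window(1024, 1000));
	/* big.LITTLE-sized difference: clearly different windows. */
	assert(!same_capacity_window(1024, 512));
}
```

With a check like this, binning-sized differences would collapse into one
capacity level, while big.LITTLE-sized differences would still be treated
as asymmetric.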

[...]

>>> IMHO, if we knew where this improvement is coming from using
>>> sic() instead of default sis() (which already has smt support) then
>>> maybe, it's a lot of extra code in the end ... And mobile big.LITTLE
>>> (with larger CPU capacity diffs) doesn't have smt.
>>
>> The last proposal, based on Prateek's proposal in sic(), doesn't seem that large
>
> Exactly, I was referring just to that patch, which would solve most of
> the performance issue. We can ignore the ILB part for now.

OK, I see. It's in the v2 you sent out earlier today, so I will comment
there.