Re: [PATCH 0/4] sched/fair: SMT-aware asymmetric CPU capacity

From: Dietmar Eggemann

Date: Fri Apr 03 2026 - 07:48:03 EST


On 01.04.26 15:12, Andrea Righi wrote:
> On Wed, Apr 01, 2026 at 02:42:34PM +0200, Andrea Righi wrote:
>> On Wed, Apr 01, 2026 at 02:08:27PM +0200, Vincent Guittot wrote:
>>> On Wed, 1 Apr 2026 at 13:57, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>>>>
>>>> On 31.03.26 11:04, Andrea Righi wrote:
>>>>> Hi Dietmar,
>>>>>
>>>>> On Tue, Mar 31, 2026 at 12:30:55AM +0200, Dietmar Eggemann wrote:
>>>>>> Hi Andrea,
>>>>>>
>>>>>> On 26.03.26 16:02, Andrea Righi wrote:

[...]

> Just finished running some tests with DCPerf MediaWiki on Vera as well
> (sorry, it took a while, I did multiple runs to rule out potential flukes):
>
> +---------------------------------+--------+--------+--------+--------+
> | Configuration | rps | p50 | p95 | p99 |

Just to make sure: rps -> "Wrk RPS" and pXX -> "Nginx PXX time" in
run_details.json ?

> +---------------------------------+--------+--------+--------+--------+
> | NO ASYM + SIS_UTIL | 8113 | 0.067 | 0.184 | 0.225 |
> | NO ASYM + NO_SIS_UTIL | 8093 | 0.068 | 0.184 | 0.223 |

Thanks for the test results! Ok, so SIS_UTIL doesn't seem to play a role
here. This workload should have #runnable tasks > #CPUs.

I'm still trying to grasp why 'sic() + smt' is better than 'sis() + smt'
for NVBLAS.

There is a subtle difference in the start CPU for iterating:

sis(): for_each_cpu_wrap(cpu, cpus, target + 1)
^^^
sic(): for_each_cpu_wrap(cpu, cpus, target)

Not sure if this makes all the difference?

> | | | | | |
> | ASYM + SMT + SIS_UTIL | 8129 | 0.076 | 0.149 | 0.188 |
> | ASYM + SMT + NO_SIS_UTIL | 8138 | 0.076 | 0.148 | 0.186 |

This should be the same, right? SIS_UTIL only affects sis(), so when using
sic() this shouldn't differ. Or did you code SIS_UTIL into sic()?

> | | | | | |
> | ASYM + ILB SMT + SIS_UTIL | 8189 | 0.075 | 0.150 | 0.189 |
> | ASYM + SMT + ILB SMT + SIS_UTIL | 8185 | 0.076 | 0.151 | 0.190 |
> +---------------------------------+--------+--------+--------+--------+

So with '#tasks > #CPUs' smt doesn't make a difference.

> Looking at the data:
> - SIS_UTIL doesn't seem relevant in this case (differences are within
> error range),
> - ASYM_CPU_CAPACITY seems to provide a small throughput gain, but it seems
> more beneficial for tail latency reduction,
> - the ILB SMT patch seems to slightly improve throughput, but the biggest
> benefit is still coming from ASYM_CPU_CAPACITY.

> Overall, also in this case it seems beneficial to use ASYM_CPU_CAPACITY
> rather than equalizing the capacities.
>
> That said, I'm still not sure why ASYM is helping. The frequency asymmetry

OK, I would still be more comfortable with this if I knew why this
is :-)

> is really small (~2%), so the latency improvements are unlikely to come
> from prioritizing the faster cores, as that should mainly affect throughput
> rather than tail latency and likely to a smaller extent.

[...]