Re: [PATCH 0/4] sched/fair: SMT-aware asymmetric CPU capacity

From: Andrea Righi

Date: Wed Apr 01 2026 - 09:47:24 EST

On Wed, Apr 01, 2026 at 02:42:34PM +0200, Andrea Righi wrote:
> On Wed, Apr 01, 2026 at 02:08:27PM +0200, Vincent Guittot wrote:
> > On Wed, 1 Apr 2026 at 13:57, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
> > >
> > > On 31.03.26 11:04, Andrea Righi wrote:
> > > > Hi Dietmar,
> > > >
> > > > On Tue, Mar 31, 2026 at 12:30:55AM +0200, Dietmar Eggemann wrote:
> > > >> Hi Andrea,
> > > >>
> > > >> On 26.03.26 16:02, Andrea Righi wrote:
> > >
> > > [...]
> > >
> > > >> So does (2) with NO_SIS_UTIL performs worse than (1) with your smt
> > > >> related add-ons in sic()?
> > > >
> > > > Thanks for running these experiments and sharing the data, this is very
> > > > useful!
> > > >
> > > > I did a quick test on Vera using the NVBLAS benchmark, comparing NO
> > > > ASYM_CPUCAPACITY with and without SIS_UTIL, but the difference seems to be
> > > > within error range. I'll also run DCPerf MediaWiki with all the different
> > >
> > > I'm not familiar with the NVBLAS benchmark. Does it drive your system
> > > into 'sd->shared->nr_idle_scan = 0' state?
>
> It's something internally unfortunately... it's just running a single
> CPU-intensive task for each SMT core (in practice half of the CPUs tasks).
> I don't think we're hitting sd->shared->nr_idle_scan == 0 in this case.

I just finished running some tests with DCPerf MediaWiki on Vera as well
(sorry, it took a while, I did multiple runs to rule out potential flukes):

+---------------------------------+--------+--------+--------+--------+
| Configuration                   |    rps |    p50 |    p95 |    p99 |
+---------------------------------+--------+--------+--------+--------+
| NO ASYM + SIS_UTIL              |   8113 |  0.067 |  0.184 |  0.225 |
| NO ASYM + NO_SIS_UTIL           |   8093 |  0.068 |  0.184 |  0.223 |
|                                 |        |        |        |        |
| ASYM + SMT + SIS_UTIL           |   8129 |  0.076 |  0.149 |  0.188 |
| ASYM + SMT + NO_SIS_UTIL        |   8138 |  0.076 |  0.148 |  0.186 |
|                                 |        |        |        |        |
| ASYM + ILB SMT + SIS_UTIL       |   8189 |  0.075 |  0.150 |  0.189 |
| ASYM + SMT + ILB SMT + SIS_UTIL |   8185 |  0.076 |  0.151 |  0.190 |
+---------------------------------+--------+--------+--------+--------+

Looking at the data:
 - SIS_UTIL doesn't seem relevant in this case (the differences are within
   error range),
 - ASYM_CPUCAPACITY seems to provide a small throughput gain, but its main
   benefit is the reduction in tail latency (p95/p99),
 - the ILB SMT patch seems to slightly improve throughput, but the biggest
   benefit is still coming from ASYM_CPUCAPACITY.
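
For reference, the relative deltas against the NO ASYM + SIS_UTIL baseline
can be computed with a quick script like the one below. This is only a
sketch: the numbers are copied from the table above, the names (RESULTS,
pct_change, deltas) are made up for illustration, and the latency units are
not assumed.

```python
# Numbers copied verbatim from the table above: (rps, p50, p95, p99).
# All identifiers here are hypothetical, chosen only for this sketch.
RESULTS = {
    "NO ASYM + SIS_UTIL":              (8113, 0.067, 0.184, 0.225),
    "NO ASYM + NO_SIS_UTIL":           (8093, 0.068, 0.184, 0.223),
    "ASYM + SMT + SIS_UTIL":           (8129, 0.076, 0.149, 0.188),
    "ASYM + SMT + NO_SIS_UTIL":        (8138, 0.076, 0.148, 0.186),
    "ASYM + ILB SMT + SIS_UTIL":       (8189, 0.075, 0.150, 0.189),
    "ASYM + SMT + ILB SMT + SIS_UTIL": (8185, 0.076, 0.151, 0.190),
}
METRICS = ("rps", "p50", "p95", "p99")
BASELINE = "NO ASYM + SIS_UTIL"


def pct_change(base, value):
    """Relative change of value with respect to base, in percent."""
    return (value - base) / base * 100.0


def deltas(baseline=BASELINE):
    """Per-metric percent change of every configuration vs. the baseline."""
    base = RESULTS[baseline]
    return {
        name: tuple(pct_change(b, v) for b, v in zip(base, row))
        for name, row in RESULTS.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, row in deltas().items():
        cells = "  ".join(f"{m} {d:+6.1f}%" for m, d in zip(METRICS, row))
        print(f"{name:<33} {cells}")
```

With these numbers, ASYM + SMT + SIS_UTIL comes out at roughly +0.2% rps,
-19% p95 and -16% p99 against the baseline, while p50 is about 13% higher.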

Overall, in this case too it seems beneficial to use ASYM_CPUCAPACITY
rather than equalizing the capacities.

That said, I'm still not sure why ASYM is helping. The frequency asymmetry
is really small (~2%), so the latency improvements are unlikely to come
from prioritizing the faster cores: that should mainly affect throughput
rather than tail latency, and likely to a smaller extent.

Thanks,
-Andrea