Re: [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry

From: Andrea Righi

Date: Thu Mar 26 2026 - 05:26:55 EST


On Thu, Mar 26, 2026 at 09:20:45AM +0100, Vincent Guittot wrote:
> On Thu, 26 Mar 2026 at 09:12, Andrea Righi <arighi@xxxxxxxxxx> wrote:
> >
> > Hi Christian,
> >
> > On Wed, Mar 25, 2026 at 06:13:11PM +0000, Christian Loehle wrote:
> > ...
> > > RFT:
> > > Andrea, please give this a try. This should perform better in particular
> > > for single-threaded workloads and workloads that do not utilize all
> > > cores (all the time anyway).
> > > Capacity-aware scheduling wakeup works very differently from the SMP
> > > path used now; some workloads will benefit, some will regress, so it
> > > would be nice to get some test results for these.
> > > We already discussed that DCPerf MediaWiki seems to benefit from
> > > capacity-aware scheduling wakeup behavior, but others (most?) should
> > > benefit from this series.
> > >
> > > I don't know if we can also be clever about ordering amongst SMT siblings.
> > > That would be dependent on the uarch and I don't have a platform to
> > > experiment with this though, so consider this series orthogonal to the
> > > idle-core SMT considerations.
> > > On platforms with SMT though asympacking makes a lot more sense than
> > > capacity-aware scheduling, because arguing about capacity without
> > > considering utilization of the sibling(s) (and the resulting potential
> > > 'stolen' capacity we perceive) isn't theoretically sound.
> >
> > I did some early testing with this patch set. On Vera I'm getting much
> > better performance than SD_ASYM_CPUCAPACITY of course (~1.5x avg speedup),
> > mostly because we avoid using both SMT siblings. It's still not the same
> > improvement that I get equalizing the capacity using the 5% threshold
> > (~1.8x speedup).
>
> IIRC, in the tests that you shared in your patch, you get an additional
> improvement when adding some SMT awareness to SD_ASYM_CPUCAPACITY
> compared to equalizing the capacity

Yes, adding SMT awareness to SD_ASYM_CPUCAPACITY is still the approach
that gives me the best performance so far on Vera (~1.9x avg speedup),
among all those that I've tested.

I'll post the updated patch set that I'm using, so we can elaborate
more on that approach as well.

Thanks,
-Andrea