Re: [PATCH 1/2] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection

From: Vincent Guittot

Date: Tue Apr 21 2026 - 08:28:23 EST


On Tue, 21 Apr 2026 at 11:35, Andrea Righi <arighi@xxxxxxxxxx> wrote:
>
> On Tue, Apr 21, 2026 at 11:01:41AM +0200, Andrea Righi wrote:
> > Hi Prateek,
> >
> > On Mon, Apr 20, 2026 at 11:42:23PM +0200, Andrea Righi wrote:
> > ...
> > > > >> I still have one question: can the first SD_ASYM_CPUCAPACITY_FULL domain be
> > > > >> an SD_NUMA domain?
> > > > >>
> > > > >> We'll need to deal with overlapping domains then, but it seems like it could
> > > > >> be possible with weird cpusets :-(
> > > > >>
> > > > >> But in that case, do we even want to search CPUs outside the local NUMA node
> > > > >> in select_idle_capacity()? I don't think anything stops this currently, but
> > > > >> I might be wrong.
> > > > >
> > > > > My $0.02 on this.
> > > > >
> > > > > In theory it could happen with unusual topologies or constrained cpusets,
> > > > > although it should be quite rare. That said, select_idle_capacity() already
> > > > > operates on the span of sd_asym_cpucapacity, so if that domain crosses NUMA
> > > > > boundaries, we're already scanning across NUMA today. This patch doesn't
> > > > > fundamentally alter this behavior.
> > > >
> > > > Ack! I was just thinking out loud from the topology standpoint, since
> > > > sd->shared is not designed to handle overlapping domains like
> > > > sg->sgc does, but we can probably figure out some way to make it work.
> > > >
> > > > Using the ring topology example from topology.c:
> > > >
> > > > 0 ----- 1
> > > > |       |
> > > > |       |
> > > > |       |
> > > > 3 ----- 2
> > > >
> > > > Consider the case where NUMA-1 below gets the SD_ASYM_CPUCAPACITY_FULL flag:
> > > >
> > > > NUMA-2    0-3             0-3             0-3             0-3
> > > >  groups:  {0-1,3},{1-3}   {0-2},{0,2-3}   {1-3},{0-1,3}   {0,2-3},{0-2}
> > > >
> > > > NUMA-1    0-1,3           0-2             1-3             0,2-3
> > > >  groups:  {0},{1},{3}     {0},{1},{2}     {1},{2},{3}     {0},{2},{3}
> > > >
> > > > NUMA-0    0               1               2               3
> > > >
> > > >
> > > > The "sd->shared" assignments at NUMA-1 will put the first, second, and
> > > > last domains in the same "shared" range by today's logic, since the first
> > > > CPU in their span is the same, even though their spans are slightly
> > > > different.
> > > >
> > > > The third will be standalone since the first CPU of the domain span
> > > > will be different.
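
To make the aliasing concrete, here is a small userspace sketch (not kernel
code) that rebuilds the NUMA-1 spans of the ring example from hop distance
and keys each domain's shared blob by the first CPU of its span, as
described above. ring_distance(), numa1_span(), mask_first() and
shared_key() are hypothetical helpers; mask_first() only mimics what
cpumask_first() returns on a plain bitmask.

```c
#include <assert.h>

/* Hypothetical 4-CPU ring from the example: 0-1-2-3-0. */
#define NR_CPUS 4

/* Number of hops between two CPUs on the ring. */
static int ring_distance(int a, int b)
{
	int d = a > b ? a - b : b - a;

	return d < NR_CPUS - d ? d : NR_CPUS - d;
}

/* NUMA-1 span of @cpu as a bitmask: every CPU at most one hop away. */
static unsigned int numa1_span(int cpu)
{
	unsigned int mask = 0;

	assert(cpu >= 0 && cpu < NR_CPUS);
	for (int j = 0; j < NR_CPUS; j++)
		if (ring_distance(cpu, j) <= 1)
			mask |= 1u << j;
	return mask;
}

/* Lowest set bit of @mask, in the spirit of cpumask_first(). */
static int mask_first(unsigned int mask)
{
	for (int j = 0; j < NR_CPUS; j++)
		if (mask & (1u << j))
			return j;
	return NR_CPUS;
}

/* Which per-CPU "shared" slot a domain would use if keyed by its first CPU. */
static int shared_key(int cpu)
{
	return mask_first(numa1_span(cpu));
}
```

shared_key() yields 0 for CPUs 0, 1 and 3 and 1 for CPU 2, so three domains
with different spans end up on one blob while the third stands alone.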
> > >
> > > Yeah, makes sense. I'm wondering if we should attach the shared blob to
> > > sd_asym_cpucapacity only when asym is a non-overlapping domain, and otherwise
> > > fall back to sd_llc and, in that case, ignore has_idle_cores in
> > > select_idle_capacity(). This might not be the best in terms of efficiency on
> > > those exotic topologies, but it'd eliminate the overlap/aliasing risk while
> > > still being correct. What do you think?
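
The fallback proposed here could look roughly like the userspace sketch
below. It assumes spans are small bitmasks and treats the asym level as
non-overlapping when every CPU inside a span reports that same span.
asym_spans_disjoint(), pick_shared_owner() and enum shared_owner are
made-up names for illustration, not existing scheduler code.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/*
 * span[i] is the asym-capacity domain span of CPU i (hypothetical bitmask).
 * The spans partition the CPUs iff every CPU j inside span[i] has
 * span[j] == span[i]; any shared CPU between two different spans fails this.
 */
static bool asym_spans_disjoint(const unsigned int span[NR_CPUS])
{
	for (int i = 0; i < NR_CPUS; i++) {
		assert(span[i] & (1u << i));	/* a CPU is always in its own span */
		for (int j = 0; j < NR_CPUS; j++)
			if ((span[i] & (1u << j)) && span[j] != span[i])
				return false;
	}
	return true;
}

enum shared_owner { SHARED_ON_ASYM, SHARED_ON_LLC };

/*
 * Proposed policy: hang the shared blob on the asym domain only when it
 * cannot alias; otherwise fall back to the LLC domain and tell the caller
 * that select_idle_capacity() should not trust has_idle_cores.
 */
static enum shared_owner pick_shared_owner(const unsigned int span[NR_CPUS],
					   bool *use_idle_core_hint)
{
	if (asym_spans_disjoint(span)) {
		*use_idle_core_hint = true;
		return SHARED_ON_ASYM;
	}
	*use_idle_core_hint = false;
	return SHARED_ON_LLC;
}
```

With the ring's NUMA-1 spans (0xB, 0x7, 0xE, 0xD) the check fails and the
blob goes to the LLC, while a single flat span or two clean halves pass.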
> >
> > I slightly changed your patch, adding this logic on top. I'll send an updated
> > patch series so it's easier to review and comment on.
>
> Actually... while preparing the series I realized that in select_idle_capacity()
> we may end up clearing the has_idle_cores hint even when the failure is due to
> affinity constraints (no fitting CPU in the allowed cpumask), not only when no
> fully idle core exists in the system. This can lead to false-negative
> has_idle_cores hints.

How is it different from select_idle_cpu(), which does the same afaict?
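
For illustration only, a minimal userspace sketch of the pattern being
discussed, assuming the scan is limited to the task's allowed CPUs while the
hint is domain-wide. scan_and_maybe_clear(), the bitmask layout and the
global has_idle_cores flag are hypothetical stand-ins, not the kernel's
actual helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical 8-CPU system; bit n of a mask stands for CPU n. */
#define NR_CPUS 8

/* Stand-in for the domain-wide has_idle_cores hint. */
static bool has_idle_cores = true;

/*
 * Look for a CPU of a fully idle core among span & allowed only.
 * @idle_cores has a bit set for every CPU whose whole SMT core is idle.
 * On failure the domain-wide hint is cleared, even though the scan
 * only covered the subset the task is allowed to run on.
 */
static int scan_and_maybe_clear(unsigned int span, unsigned int allowed,
				unsigned int idle_cores)
{
	unsigned int cpus = span & allowed;

	assert(span < (1u << NR_CPUS));
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpus & idle_cores & (1u << cpu))
			return cpu;

	has_idle_cores = false;
	return -1;
}
```

With span 0xFF, allowed 0x0F and a fully idle core on CPUs 4-5 (0x30), the
scan fails and clears the hint although an idle core exists outside the
task's affinity, so a later unrestricted wakeup could skip the idle-core
search.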

>
> At this point I'm wondering if it's better to just ignore the has_idle_cores
> hint completely in the smt+asym-cpu-capacity scenario (which would also simplify
> the exotic topology cases).
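
A hint-free variant could be sketched as below: always look for a fully
idle core that fits first, then fall back to any idle CPU that fits. This
assumes SMT-2 with sibling n ^ 1 and uses a plain capacity >= util check
instead of the kernel's capacity margin or best-fit logic; struct cpu_state,
core_fully_idle() and pick_idle_cpu() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical SMT-2 system: CPU n's sibling is n ^ 1 (NR_CPUS must be even). */
#define NR_CPUS 8

struct cpu_state {
	unsigned long capacity;	/* compute capacity of this CPU */
	bool idle;		/* CPU is currently idle */
};

/* Both SMT siblings of @cpu's core are idle. */
static bool core_fully_idle(const struct cpu_state *cpus, int cpu)
{
	assert(cpu >= 0 && (cpu ^ 1) < NR_CPUS);
	return cpus[cpu].idle && cpus[cpu ^ 1].idle;
}

/*
 * No has_idle_cores state is read or written, so nothing can go stale;
 * the price is that every call scans the whole allowed span.
 */
static int pick_idle_cpu(const struct cpu_state *cpus, unsigned int allowed,
			 unsigned long util)
{
	int fallback = -1;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(allowed & (1u << cpu)) || !cpus[cpu].idle)
			continue;
		if (cpus[cpu].capacity < util)	/* task does not fit */
			continue;
		if (core_fully_idle(cpus, cpu))
			return cpu;		/* fully idle core that fits */
		if (fallback < 0)
			fallback = cpu;		/* first idle CPU that fits */
	}
	return fallback;
}
```

The trade-off is that a wakeup pays for the full scan even when no fully
idle core exists, which is the cost the hint normally lets us skip.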
>
> I did some quick tests with this on Vera and I'm getting pretty much the same
> performance results as with the hint. Opinions? Am I missing something?
>
> Thanks,
> -Andrea