Re: [PATCH 0/4] sched/fair: SMT-aware asymmetric CPU capacity
From: Andrea Righi
Date: Tue Apr 07 2026 - 15:16:20 EST
Hi Dietmar,
On Tue, Apr 07, 2026 at 01:50:51PM +0200, Dietmar Eggemann wrote:
> On 03.04.26 22:44, Andrea Righi wrote:
> > On Fri, Apr 03, 2026 at 04:46:03PM +0200, Andrea Righi wrote:
> >> On Fri, Apr 03, 2026 at 01:47:17PM +0200, Dietmar Eggemann wrote:
> > ...
> >>>> Looking at the data:
> >>>> - SIS_UTIL doesn't seem relevant in this case (differences are within
> >>>> error range),
> >>>> - ASYM_CPU_CAPACITY seems to provide a small throughput gain, but it seems
> >>>> more beneficial for tail latency reduction,
> >>>> - the ILB SMT patch seems to slightly improve throughput, but the biggest
> >>>> benefit is still coming from ASYM_CPU_CAPACITY.
> >>>
> >>>> Overall, also in this case it seems beneficial to use ASYM_CPU_CAPACITY
> >>>> rather than equalizing the capacities.
> >>>>
> >>>> That said, I'm still not sure why ASYM is helping. The frequency asymmetry
> >>>
> >>> OK, I still would be more comfortable with this if I knew why
> >>> this is :-)
> >>
> >> Working on this. :)
> >
> > Alright, I think I found something. I tried to make sis() behave more like sic()
> > by adding the same SMT "full idle core" check in the fast path and removing the
> > extra select_idle_smt(prev) hop from the LLC idle path.
> >
> > Essentially this:
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 7bebceb5ed9df..19fffa2df2d36 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7651,29 +7651,6 @@ static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpu
> > return -1;
> > }
> >
> > -/*
> > - * Scan the local SMT mask for idle CPUs.
> > - */
> > -static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> > -{
> > - int cpu;
> > -
> > - for_each_cpu_and(cpu, cpu_smt_mask(target), p->cpus_ptr) {
> > - if (cpu == target)
> > - continue;
> > - /*
> > - * Check if the CPU is in the LLC scheduling domain of @target.
> > - * Due to isolcpus, there is no guarantee that all the siblings are in the domain.
> > - */
> > - if (!cpumask_test_cpu(cpu, sched_domain_span(sd)))
> > - continue;
> > - if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
> > - return cpu;
>
> So it is this returning of CPU from the smt mask rather than the
>
> for_each_cpu_wrap(cpu, cpus, target + 1)
>
> __select_idle_cpu()
>
> if (choose_idle_cpu(cpu, p) && ...)
> return cpu
>
> where cpus is cpumask_and(cpus, sched_domain_span(MC), p->cpus_ptr)
Right, and this is exactly the behavioral difference that I was trying to
eliminate from sis() to make it similar to sic().
>
> I wonder whether this has anything to do with your NVIDIA Spatial
> Multithreading (SMT) versus Traditional (time-shared resources) SMT?
I don't have data to prove or disprove that... it'd be interesting to try the
same approach on a system with traditional SMT.
>
>
> > - }
> > -
> > - return -1;
> > -}
> > -
> > #else /* !CONFIG_SCHED_SMT: */
> >
> > static inline void set_idle_cores(int cpu, int val)
> > @@ -7690,11 +7667,6 @@ static inline int select_idle_core(struct task_struct *p, int core, struct cpuma
> > return __select_idle_cpu(core, p);
> > }
> >
> > -static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> > -{
> > - return -1;
> > -}
> > -
> > #endif /* !CONFIG_SCHED_SMT */
> >
> > /*
> > @@ -7859,7 +7831,7 @@ static inline bool asym_fits_cpu(unsigned long util,
> > (util_fits_cpu(util, util_min, util_max, cpu) > 0);
> > }
> >
> > - return true;
> > + return !sched_smt_active() || is_core_idle(cpu);
> > }
>
> This change seems to be orthogonal to the removal of select_idle_smt()
> for sis()?
Right, essentially this modifies sis() to early-return a CPU only when it is
part of a fully-idle core.
>
> BTW, the is_core_idle() in asym_fits_cpu() (used for those early return
> CPU conditions in sis()) is something we don't have on the NO_ASYM side
> where we only use choose_idle_cpu().
You mean without this change? In that case, yes, because asym_fits_cpu() was
just a no-op there. This is one of the behavioral changes in sis() needed to
make it similar to sic() with SMT awareness.
>
> > /*
> > @@ -7964,16 +7936,9 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> > if (!sd)
> > return target;
> >
> > - if (sched_smt_active()) {
> > + if (sched_smt_active())
> > has_idle_core = test_idle_cores(target);
> >
> > - if (!has_idle_core && cpus_share_cache(prev, target)) {
> > - i = select_idle_smt(p, sd, prev);
> > - if ((unsigned int)i < nr_cpumask_bits)
> > - return i;
> > - }
> > - }
> > -
> > i = select_idle_cpu(p, sd, has_idle_core, target);
> > if ((unsigned)i < nr_cpumask_bits)
> > return i;
> >
> > ---
> >
> > With this applied, I see identical performance between NO_ASYM and ASYM+SMT.
>
> Interesting!
>
> > I'm not suggesting to apply this, but that seems to be the reason why ASYM+SMT
> > performs better in my case.
> >
> > -Andrea
>
Thanks,
-Andrea