Re: [PATCH 0/4] sched/fair: SMT-aware asymmetric CPU capacity
From: Dietmar Eggemann
Date: Tue Apr 07 2026 - 07:52:22 EST
On 03.04.26 22:44, Andrea Righi wrote:
> On Fri, Apr 03, 2026 at 04:46:03PM +0200, Andrea Righi wrote:
>> On Fri, Apr 03, 2026 at 01:47:17PM +0200, Dietmar Eggemann wrote:
> ...
>>>> Looking at the data:
>>>> - SIS_UTIL doesn't seem relevant in this case (differences are within
>>>> error range),
>>>> - ASYM_CPU_CAPACITY seems to provide a small throughput gain, but it seems
>>>> more beneficial for tail latency reduction,
>>>> - the ILB SMT patch seems to slightly improve throughput, but the biggest
>>>> benefit is still coming from ASYM_CPU_CAPACITY.
>>>
>>>> Overall, also in this case it seems beneficial to use ASYM_CPU_CAPACITY
>>>> rather than equalizing the capacities.
>>>>
>>>> That said, I'm still not sure why ASYM is helping. The frequency asymmetry
>>>
>>> OK, I still would be more comfortable with this if I knew why this
>>> is the case :-)
>>
>> Working on this. :)
>
> Alright, I think I found something. I tried to make sis() behave more like sic()
> by adding the same SMT "full idle core" check in the fast path and removing the
> extra select_idle_smt(prev) hop from the LLC idle path.
>
> Essentially this:
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7bebceb5ed9df..19fffa2df2d36 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7651,29 +7651,6 @@ static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpu
> return -1;
> }
>
> -/*
> - * Scan the local SMT mask for idle CPUs.
> - */
> -static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> -{
> - int cpu;
> -
> - for_each_cpu_and(cpu, cpu_smt_mask(target), p->cpus_ptr) {
> - if (cpu == target)
> - continue;
> - /*
> - * Check if the CPU is in the LLC scheduling domain of @target.
> - * Due to isolcpus, there is no guarantee that all the siblings are in the domain.
> - */
> - if (!cpumask_test_cpu(cpu, sched_domain_span(sd)))
> - continue;
> - if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
> - return cpu;
So it is this returning of a CPU from the SMT mask rather than the

  for_each_cpu_wrap(cpu, cpus, target + 1)
      __select_idle_cpu()
          if (choose_idle_cpu(cpu, p) && ...)
              return cpu

where cpus is cpumask_and(cpus, sched_domain_span(MC), p->cpus_ptr).

I wonder whether this has anything to do with your NVIDIA Spatial
Multithreading (SMT) versus traditional (time-shared resources) SMT?
> - }
> -
> - return -1;
> -}
> -
> #else /* !CONFIG_SCHED_SMT: */
>
> static inline void set_idle_cores(int cpu, int val)
> @@ -7690,11 +7667,6 @@ static inline int select_idle_core(struct task_struct *p, int core, struct cpuma
> return __select_idle_cpu(core, p);
> }
>
> -static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> -{
> - return -1;
> -}
> -
> #endif /* !CONFIG_SCHED_SMT */
>
> /*
> @@ -7859,7 +7831,7 @@ static inline bool asym_fits_cpu(unsigned long util,
> (util_fits_cpu(util, util_min, util_max, cpu) > 0);
> }
>
> - return true;
> + return !sched_smt_active() || is_core_idle(cpu);
> }
This change seems to be orthogonal to the removal of select_idle_smt()
for sis()?
BTW, the is_core_idle() check in asym_fits_cpu() (used for the early-return
CPU conditions in sis()) is something we don't have on the NO_ASYM side,
where we only use choose_idle_cpu().
> /*
> @@ -7964,16 +7936,9 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> if (!sd)
> return target;
>
> - if (sched_smt_active()) {
> + if (sched_smt_active())
> has_idle_core = test_idle_cores(target);
>
> - if (!has_idle_core && cpus_share_cache(prev, target)) {
> - i = select_idle_smt(p, sd, prev);
> - if ((unsigned int)i < nr_cpumask_bits)
> - return i;
> - }
> - }
> -
> i = select_idle_cpu(p, sd, has_idle_core, target);
> if ((unsigned)i < nr_cpumask_bits)
> return i;
>
> ---
>
> With this applied, I see identical performance between NO_ASYM and ASYM+SMT.
Interesting!
> I'm not suggesting we apply this, but it seems to be the reason why ASYM+SMT
> performs better in my case.
>
> -Andrea