Re: [PATCH] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
From: Vincent Guittot
Date: Thu Mar 19 2026 - 03:17:29 EST

On Wed, 18 Mar 2026 at 11:31, Andrea Righi <arighi@xxxxxxxxxx> wrote:
>
> Hi Vincent,
>
> On Wed, Mar 18, 2026 at 10:41:15AM +0100, Vincent Guittot wrote:
> > On Wed, 18 Mar 2026 at 10:22, Andrea Righi <arighi@xxxxxxxxxx> wrote:
> > >
> > > On systems with asymmetric CPU capacity (e.g., ACPI/CPPC reporting
> > > different per-core frequencies), the wakeup path uses
> > > select_idle_capacity() and prioritizes idle CPUs with higher capacity
> > > for better task placement. However, when those CPUs belong to SMT cores,
> >
> > Interesting, which kind of system has both SMT and SD_ASYM_CPUCAPACITY
> > ? I thought both were never set simultaneously and SD_ASYM_PACKING was
> > used for system involving SMT like x86
>
> It's an NVIDIA platform (not publicly available yet), where the firmware
> exposes different CPU capacities and has SMT enabled, so both
> SD_ASYM_CPUCAPACITY and SMT are present. I'm not sure whether the final
> firmware release will keep this exact configuration (there's a good chance
> it will), so I'm targeting it to be prepared.

That's probably not the only place where SD_ASYM_CPUCAPACITY will fail
with SMT. Misfit task handling is another example.

>
> >
> > > their effective capacity can be much lower than the nominal capacity
> > > when the sibling thread is busy: SMT siblings compete for shared
> > > resources, so a "high capacity" CPU that is idle but whose sibling is
> > > busy does not deliver its full capacity. This effective capacity
> > > reduction cannot be modeled by the static capacity value alone.
> > >
> > > Introduce SMT awareness in the asym-capacity idle selection policy: when
> > > SMT is active prefer fully-idle SMT cores over partially-idle ones. A
> > > two-phase selection first tries only CPUs on fully idle cores, then
> > > falls back to any idle CPU if none fit.
> > >
> > > Prioritizing fully-idle SMT cores yields better task placement because
> > > the effective capacity of partially-idle SMT cores is reduced; always
> > > preferring them when available leads to more accurate capacity usage on
> > > task wakeup.
> > >
> > > On an SMT system with asymmetric CPU capacities, SMT-aware idle
> > > selection has been shown to improve throughput by around 15-18% for
> > > CPU-bound workloads, running an amount of tasks equal to the amount of
> > > SMT cores.
> > >
> > > Signed-off-by: Andrea Righi <arighi@xxxxxxxxxx>
> > > ---
> > > kernel/sched/fair.c | 24 +++++++++++++++++++++---
> > > 1 file changed, 21 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 0a35a82e47920..0f97c44d4606b 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -7945,9 +7945,13 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> > > * Scan the asym_capacity domain for idle CPUs; pick the first idle one on which
> > > * the task fits. If no CPU is big enough, but there are idle ones, try to
> > > * maximize capacity.
> > > + *
> > > + * When @smt_idle_only is true (asym + SMT), only consider CPUs on cores whose
> > > + * SMT siblings are all idle, to avoid stacking and sharing SMT resources.
> > > */
> > > static int
> > > -select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> > > +select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target,
> > > + bool smt_idle_only)
> > > {
> > > unsigned long task_util, util_min, util_max, best_cap = 0;
> > > int fits, best_fits = 0;
> > > @@ -7967,6 +7971,9 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> > > if (!choose_idle_cpu(cpu, p))
> > > continue;
> > >
> > > + if (smt_idle_only && !is_core_idle(cpu))
> > > + continue;
> > > +
> > > fits = util_fits_cpu(task_util, util_min, util_max, cpu);
> > >
> > > /* This CPU fits with all requirements */
> > > @@ -8102,8 +8109,19 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> > > * capacity path.
> > > */
> > > if (sd) {
> > > - i = select_idle_capacity(p, sd, target);
> > > - return ((unsigned)i < nr_cpumask_bits) ? i : target;
> > > + /*
> > > + * When asym + SMT and the hint says idle cores exist,
> > > + * try idle cores first to avoid stacking on SMT; else
> > > + * scan all idle CPUs.
> > > + */
> > > + if (sched_smt_active() && test_idle_cores(target)) {
> > > + i = select_idle_capacity(p, sd, target, true);
> > > + if ((unsigned int)i >= nr_cpumask_bits)
> > > + i = select_idle_capacity(p, sd, target, false);
> >
> > Can't you make it one pass in select_idle_capacity ?
>
> Oh yes, absolutely, we can select the best-fit CPU in the same pass and use
> it as a fallback if we can't find any fully-idle SMT CPU. I'll change that.
>
> >
> > > + } else {
> > > + i = select_idle_capacity(p, sd, target, false);
> > > + }
> > > + return ((unsigned int)i < nr_cpumask_bits) ? i : target;
> > > }
> > > }
> > >
> > > --
> > > 2.53.0
> > >
>
> Thanks,
> -Andrea
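
For illustration only, here is a minimal userspace sketch of how the
one-pass variant discussed above could rank candidates. It is not the
posted patch nor any follow-up version. The struct toy_cpu type, the
toy_select_idle_capacity() helper and the "capacity >= task_util" fit
test are all hypothetical simplifications; the kernel uses
util_fits_cpu(), cpumasks and is_core_idle() instead. The ordering is
assumed to mirror what the two-pass version would return: a fitting CPU
on a fully idle core first, then the largest CPU on a fully idle core,
then idle CPUs whose sibling is busy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a CPU; not a kernel structure. */
struct toy_cpu {
	unsigned long capacity;	/* static capacity of this CPU */
	bool idle;		/* this hardware thread is idle */
	bool core_idle;		/* all SMT siblings of the core are idle */
};

/*
 * Single scan over all CPUs. Returns the index of the chosen CPU, or -1
 * if no CPU is idle. Preference order (assumed, see lead-in):
 *   1. first idle CPU on a fully idle core that fits the task
 *   2. largest-capacity idle CPU on a fully idle core
 *   3. first idle CPU with a busy sibling that fits the task
 *   4. largest-capacity idle CPU with a busy sibling
 */
int toy_select_idle_capacity(const struct toy_cpu *cpus, size_t nr_cpus,
			     unsigned long task_util)
{
	int best_core = -1;	/* candidate for rule 2 */
	int fit_smt = -1;	/* candidate for rule 3 */
	int best_smt = -1;	/* candidate for rule 4 */

	assert(cpus != NULL || nr_cpus == 0);

	for (size_t i = 0; i < nr_cpus; i++) {
		const struct toy_cpu *c = &cpus[i];
		/* Simplified stand-in for util_fits_cpu(). */
		bool fits = c->capacity >= task_util;

		if (!c->idle)
			continue;

		if (c->core_idle) {
			if (fits)
				return (int)i;	/* rule 1: stop scanning */
			if (best_core < 0 ||
			    c->capacity > cpus[best_core].capacity)
				best_core = (int)i;
		} else if (fits) {
			if (fit_smt < 0)
				fit_smt = (int)i;
		} else if (best_smt < 0 ||
			   c->capacity > cpus[best_smt].capacity) {
			best_smt = (int)i;
		}
	}

	if (best_core >= 0)
		return best_core;
	if (fit_smt >= 0)
		return fit_smt;
	return best_smt;
}
```

The fallback candidates are recorded during the same scan, so no second
walk over the domain is needed when no fully idle core is found.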