Re: [PATCH 3/5] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
From: Andrea Righi
Date: Mon May 11 2026 - 09:49:08 EST
Hi Vincent,
On Mon, May 11, 2026 at 03:07:50PM +0200, Vincent Guittot wrote:
> On Sat, 9 May 2026 at 20:10, Andrea Righi <arighi@xxxxxxxxxx> wrote:
...
> > +/*
> > + * Idle-capacity scan converts util_fits_cpu() outcomes into preference ranks,
> > + * where lower values indicate a better fit - see select_idle_capacity().
> > + *
> > + * A CPU that both fits the task and sits on a fully-idle SMT core is returned
> > + * immediately and is never assigned one of these ranks. On !SMT every CPU is
> > + * its own "core", so the early return covers all fits-and-idle cases and the
> > + * core-tier ranks below become unreachable.
> > + *
> > + * Rank                            Val  Tier    Meaning
> > + * ------------------------------  ---  ------  ---------------------------
> > + * ASYM_IDLE_CORE_UCLAMP_MISFIT     -4  core    Idle core; capacity fits
> > + *                                              util but uclamp_min misses.
> > + * ASYM_IDLE_CORE_COMPLETE_MISFIT   -3  core    Idle core; capacity does
> > + *                                              not fit. Still beats every
> > + *                                              thread-tier rank: a busy
> > + *                                              sibling cuts effective
> > + *                                              capacity more than a
> > + *                                              misfit hurts a quiet core.
> > + * ASYM_IDLE_THREAD_FITS            -2  thread  Busy SMT sibling; capacity
> > + *                                              fits util + uclamp.
> > + * ASYM_IDLE_THREAD_UCLAMP_MISFIT   -1  thread  Busy SMT sibling; capacity
> > + *                                              fits but uclamp_min misses
> > + *                                              (native util_fits_cpu()
> > + *                                              return value).
> > + * ASYM_IDLE_COMPLETE_MISFIT         0  thread  Busy SMT sibling; capacity
> > + *                                              does not fit.
> > + *
> > + * ASYM_IDLE_CORE_BIAS (-3) is an offset, not a state. On an idle core,
> > + * fits += ASYM_IDLE_CORE_BIAS rebases thread-tier ranks into the core tier:
> > + *
> > + * ASYM_IDLE_THREAD_UCLAMP_MISFIT (-1) + BIAS -> CORE_UCLAMP_MISFIT (-4)
> > + * ASYM_IDLE_COMPLETE_MISFIT (0) + BIAS -> CORE_COMPLETE_MISFIT (-3)
> > + *
> > + * ASYM_IDLE_THREAD_FITS (-2) is never rebased because a fully-fitting idle-core
> > + * candidate early-returns from select_idle_capacity().
> > + */
> > +enum asym_fits_state {
> > + ASYM_IDLE_CORE_UCLAMP_MISFIT = -4,
>
> ASYM_IDLE_UCLAMP_MISFIT
> See why in comments for select_idle_capacity()
>
> > + ASYM_IDLE_CORE_COMPLETE_MISFIT,
>
> ASYM_IDLE_COMPLETE_MISFIT,
>
> > + ASYM_IDLE_THREAD_FITS,
> > + ASYM_IDLE_THREAD_UCLAMP_MISFIT,
> > + ASYM_IDLE_COMPLETE_MISFIT,
>
> ASYM_IDLE_THREAD_MISFIT,
>
> > +
> > + /* util_fits_cpu() bias for idle core */
> > + ASYM_IDLE_CORE_BIAS = -3,
> > +};
> > +
> > /*
> > * Scan the asym_capacity domain for idle CPUs; pick the first idle one on which
> > * the task fits. If no CPU is big enough, but there are idle ones, try to
> > @@ -8026,8 +8074,14 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> > static int
> > select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> > {
> > + /*
> > + * On !SMT systems, has_idle_core is always false and preferred_core
> > + * is always true (CPU == core), so the SMT preference logic below
> > + * collapses to the plain capacity scan.
> > + */
> > + bool has_idle_core = sched_smt_active() && test_idle_cores(target);
> > unsigned long task_util, util_min, util_max, best_cap = 0;
> > - int fits, best_fits = 0;
> > + int fits, best_fits = ASYM_IDLE_COMPLETE_MISFIT;
> > int cpu, best_cpu = -1;
> > struct cpumask *cpus;
> >
> > @@ -8039,6 +8093,7 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> > util_max = uclamp_eff_value(p, UCLAMP_MAX);
> >
> > for_each_cpu_wrap(cpu, cpus, target) {
> > + bool preferred_core = !has_idle_core || is_core_idle(cpu);
>
> If sched_smt_active() is true and test_idle_cores(target) is false
> (meaning we have SMT but no idle core), then has_idle_core is false
> and preferred_core is true. We will return immediately if
> util_fits_cpu() reports a fit, and we will use the ASYM_IDLE_CORE_* values otherwise.
> So I think that we should remove the "CORE_" in the naming
>
> ASYM_IDLE_THREAD_* values are only used when has_idle_core is true,
> i.e. SMT is active and test_idle_cores() reports an idle core
Yes, I agree: the CORE_ prefix is misleading, since those ranks can also be
assigned when sched_smt_active() && !test_idle_cores(target), i.e. when SMT is
active but no idle core is available. I'll send an updated patch with your
naming scheme.
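
To make the renaming concrete, here is a minimal userspace sketch of how the
ranks would be assigned with the proposed names. It is not the patch itself:
rank_candidate() is a hypothetical helper, the enum values are the ones from
the comment block above, and mapping a fitting CPU on a non-preferred core to
ASYM_IDLE_THREAD_FITS is an assumption, since that hunk is not quoted in this
thread.

```c
#include <stdbool.h>

/* Ranks with the names proposed in the review; values as in the patch. */
enum asym_fits_state {
	ASYM_IDLE_UCLAMP_MISFIT = -4,
	ASYM_IDLE_COMPLETE_MISFIT,		/* -3 */
	ASYM_IDLE_THREAD_FITS,			/* -2 */
	ASYM_IDLE_THREAD_UCLAMP_MISFIT,		/* -1 */
	ASYM_IDLE_THREAD_MISFIT,		/*  0 */

	/* Offset, not a state: rebases thread-tier ranks. */
	ASYM_IDLE_CORE_BIAS = -3,
};

/*
 * Hypothetical helper modelling one iteration of the scan.
 *
 * @fits:          util_fits_cpu()-style result: 1 fits, 0 does not fit,
 *                 -1 fits util but misses uclamp_min.
 * @has_idle_core: sched_smt_active() && test_idle_cores(target).
 * @core_idle:     is_core_idle(cpu) for the candidate.
 *
 * Returns true if the candidate would be returned immediately; otherwise
 * stores the preference rank in *rank and returns false.
 */
bool rank_candidate(int fits, bool has_idle_core, bool core_idle, int *rank)
{
	bool preferred_core = !has_idle_core || core_idle;

	if (fits > 0) {
		if (preferred_core)
			return true;
		/* Assumption: fitting CPU with a busy sibling. */
		*rank = ASYM_IDLE_THREAD_FITS;
		return false;
	}

	/* fits is 0 or -1 here: start from the native value. */
	*rank = fits;
	if (preferred_core)
		*rank += ASYM_IDLE_CORE_BIAS;

	return false;
}
```

With has_idle_core == false (SMT active, no idle core), a uclamp-misfit
candidate ends up at -1 + (-3) = -4, so the biased ranks are reached without
any idle core being involved, which is why CORE_ does not belong in the name.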
Thanks,
-Andrea