Re: [PATCH v5 5/6] sched/fair: Allow load balancing between CPUs of identical capacity
From: Vincent Guittot
Date: Fri Jun 26 2026 - 11:20:57 EST
On Tue, 23 Jun 2026 at 09:45, Christian Loehle <christian.loehle@xxxxxxx> wrote:
>
> On 6/23/26 08:20, Vincent Guittot wrote:
> > On Tue, 23 Jun 2026 at 01:55, Ricardo Neri
> > <ricardo.neri-calderon@xxxxxxxxxxxxxxx> wrote:
> >>
> >> sched_balance_find_src_rq() avoids selecting a runqueue with a single
> >> running task as busiest if doing so results in migrating the task to a
> >> CPU with less than ~5% of extra capacity. It also unintentionally
> >> prevents migrations between CPUs of identical capacity.
> >>
> >> When CONFIG_SCHED_CLUSTER is enabled, load should be balanced across
> >> clusters of CPUs with the same capacity. Allowing migration between CPUs
> >> of identical capacity is necessary to meet this goal.
> >>
> >> Use arch_scale_cpu_capacity() to reflect architectural capacity, excluding
> >
> > capacity_of() reflects not only RT and irq pressure but also thermal
> > pressure or system frequency capping.
> > If dst cluster is under thermal mitigation but the source cluster is
> > not, we probably shouldn't spread tasks across both clusters.
> > > Have you considered using get_actual_cpu_capacity() instead of
> > > arch_scale_cpu_capacity()?
>
> Replacing arch_scale_cpu_capacity() with get_actual_cpu_capacity()
> would make the == comparison below very unlikely to be true FWIW.
Do you have cpufreq_pressure or hw load_avg in mind?
> I think it's fine like that, I will prepare a follow-up anyway to make
> it work for our "almost equal capacity" cluster systems and then also
> consider switching to get_actual_cpu_capacity() since we include a margin
> anyway.
I would prefer the other way around: keep the current behavior correct
(keep accounting for system pressure) before adding a new feature.
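
To illustrate the trade-off discussed above, here is a minimal user-space
sketch, not kernel code. It assumes that the actual capacity is the
architectural capacity minus the larger of the HW (thermal) pressure and the
cpufreq pressure; struct cpu_cap and all three helpers are hypothetical names
made up for this example.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU inputs discussed in the thread. */
struct cpu_cap {
	unsigned long arch_cap;         /* models arch_scale_cpu_capacity() */
	unsigned long hw_pressure;      /* models hw load_avg (thermal) */
	unsigned long cpufreq_pressure; /* models cpufreq_pressure */
};

/* Assumed model: architectural capacity minus the larger pressure. */
unsigned long actual_capacity(const struct cpu_cap *c)
{
	unsigned long pressure = c->hw_pressure > c->cpufreq_pressure ?
				 c->hw_pressure : c->cpufreq_pressure;

	/* Clamp so the unsigned subtraction cannot wrap around. */
	return pressure >= c->arch_cap ? 0 : c->arch_cap - pressure;
}

/* Equality on architectural capacity ignores any runtime pressure. */
bool same_arch_capacity(const struct cpu_cap *a, const struct cpu_cap *b)
{
	return a->arch_cap == b->arch_cap;
}

/* Equality on actual capacity breaks as soon as the pressures differ. */
bool same_actual_capacity(const struct cpu_cap *a, const struct cpu_cap *b)
{
	return actual_capacity(a) == actual_capacity(b);
}
```

With two CPUs of architectural capacity 1024, one of them under a pressure of
120, same_arch_capacity() still reports a match while same_actual_capacity()
does not. A strict == on the actual capacity therefore rarely holds, whereas
the architectural comparison hides the mitigation on the destination.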
>
> >
> >> runtime reductions due to side activity or thermal pressure. Guard this
> >> check with the sched_cluster_active static key so that systems without
> >> cluster topology are unaffected.
> >>
> >> Tested-by: Christian Loehle <christian.loehle@xxxxxxx>
> >> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@xxxxxxxxxxxxxxx>
> >> ---
> >> Changes in v5:
> >> * Optimized logic to identify same-arch clusters only when needed.
> >> * Added Tested-by tag from Christian. Thanks!
> >>
> >> Changes in v4:
> >> * Implemented the check for cluster with a local variable for improved
> >> readability.
> >>
> >> Changes in v3:
> >> * Reverted the inverted capacity check; the inverted form incorrectly
> >> allows migrations to CPUs of slightly less capacity.
> >> * Guarded the check for architectural capacity with the
> >> sched_cluster_active static key.
> >>
> >> Changes in v2:
> >> * Used arch_scale_cpu_capacity() instead of capacity_of() to ignore
> >> runtime variability.
> >> * Inverted the check for runtime capacity. (Christian)
> >> * Reworded patch description for clarity.
> >> ---
> >> kernel/sched/fair.c | 9 ++++++++-
> >> 1 file changed, 8 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index e55eb019d2c9..f4eb55cad54d 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -12992,13 +12992,20 @@ static struct rq *sched_balance_find_src_rq(struct lb_env *env,
> >>  		 */
> >>  		if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
> >>  		    nr_running == 1) {
> >> +			bool same_arch_cluster = static_branch_unlikely(&sched_cluster_active) &&
> >> +						 (arch_scale_cpu_capacity(env->dst_cpu) ==
> >> +						  arch_scale_cpu_capacity(i));
> >>  			bool smt_degraded_cap = sched_smt_active() && !is_core_idle(i);
> >>
> >>  			/*
> >>  			 * Busy SMT siblings reduce the capacity of CPU @i. Do
> >>  			 * not skip it in this case.
> >> +			 *
> >> +			 * CONFIG_SCHED_CLUSTER requires balancing load across clusters
> >> +			 * of identical capacity. Use architectural capacity to ignore
> >> +			 * runtime variability.
> >>  			 */
> >> -			if (!smt_degraded_cap &&
> >> +			if (!smt_degraded_cap && !same_arch_cluster &&
> >>  			    !capacity_greater(capacity_of(env->dst_cpu), capacity))
> >>  				continue;
> >>  		}
> >>
> >> --
> >> 2.43.0
> >>
>
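
For reference, the skip decision in the quoted hunk can be modeled in user
space as below. This is only a sketch under stated assumptions: the
capacity_greater() margin (1078/1024, about 5%) is reproduced from memory of
kernel/sched/fair.c, and struct candidate, skip_before_patch() and
skip_after_patch() are hypothetical names that do not exist in the kernel.

```c
#include <stdbool.h>

/* Assumed to match fair.c: cap1 must exceed cap2 by roughly 5%. */
#define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078)

/* Hypothetical snapshot of the inputs for one candidate source CPU @i. */
struct candidate {
	unsigned long dst_cap;      /* models capacity_of(env->dst_cpu) */
	unsigned long src_cap;      /* models the runtime capacity of CPU i */
	unsigned long dst_arch_cap; /* models arch_scale_cpu_capacity(dst) */
	unsigned long src_arch_cap; /* models arch_scale_cpu_capacity(i) */
	bool asym_domain;           /* SD_ASYM_CPUCAPACITY is set */
	unsigned int nr_running;    /* tasks running on CPU i */
	bool cluster_active;        /* models sched_cluster_active */
	bool smt_degraded_cap;      /* busy SMT sibling on CPU i */
};

/* Returns true when CPU i is skipped as busiest, without the patch. */
bool skip_before_patch(const struct candidate *c)
{
	if (!c->asym_domain || c->nr_running != 1)
		return false;

	return !c->smt_degraded_cap &&
	       !capacity_greater(c->dst_cap, c->src_cap);
}

/* Same decision with the same_arch_cluster exemption from the patch. */
bool skip_after_patch(const struct candidate *c)
{
	bool same_arch_cluster = c->cluster_active &&
				 c->dst_arch_cap == c->src_arch_cap;

	if (!c->asym_domain || c->nr_running != 1)
		return false;

	return !c->smt_degraded_cap && !same_arch_cluster &&
	       !capacity_greater(c->dst_cap, c->src_cap);
}
```

With dst and src both at capacity 1024, capacity_greater() is false, so
skip_before_patch() skips the runqueue. skip_after_patch() keeps it as a
candidate only when cluster_active is set and the architectural capacities
match; in every other case both functions agree.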