Re: [PATCH v5 5/6] sched/fair: Allow load balancing between CPUs of identical capacity
From: Vincent Guittot
Date: Tue Jun 23 2026 - 03:22:12 EST
On Tue, 23 Jun 2026 at 01:55, Ricardo Neri
<ricardo.neri-calderon@xxxxxxxxxxxxxxx> wrote:
>
> sched_balance_find_src_rq() avoids selecting a runqueue with a single
> running task as busiest if doing so results in migrating the task to a
> CPU with less than ~5% of extra capacity. It also unintentionally
> prevents migrations between CPUs of identical capacity.
>
> When CONFIG_SCHED_CLUSTER is enabled, load should be balanced across
> clusters of CPUs with the same capacity. Allowing migration between CPUs
> of identical capacity is necessary to meet this goal.
>
> Use arch_scale_cpu_capacity() to reflect architectural capacity, excluding

capacity_of() reflects not only RT and irq pressure but also thermal
pressure or system frequency capping.

If the dst cluster is under thermal mitigation but the source cluster is
not, we probably shouldn't spread tasks across both clusters.

Have you considered using get_actual_cpu_capacity() instead of
arch_scale_cpu_capacity()?

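To make the thermal-mitigation scenario concrete, here is a minimal
user-space sketch, not kernel code. It assumes get_actual_cpu_capacity()
is roughly the architectural capacity minus the larger of the HW
(thermal) pressure and the cpufreq capping pressure. struct cpu_model
and the model_*() helpers are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of one CPU; not a kernel structure. */
struct cpu_model {
	unsigned long arch_cap;      /* stands in for arch_scale_cpu_capacity() */
	unsigned long hw_pressure;   /* thermal / HW capping, in capacity units */
	unsigned long freq_pressure; /* cpufreq max-frequency capping */
};

/* Architectural capacity: ignores every runtime reduction. */
static unsigned long model_arch_capacity(const struct cpu_model *c)
{
	return c->arch_cap;
}

/*
 * Assumed shape of get_actual_cpu_capacity(): architectural capacity
 * minus the larger of the two pressures.
 */
static unsigned long model_actual_capacity(const struct cpu_model *c)
{
	unsigned long pressure = c->hw_pressure > c->freq_pressure ?
				 c->hw_pressure : c->freq_pressure;

	assert(pressure <= c->arch_cap); /* avoid unsigned underflow in the model */
	return c->arch_cap - pressure;
}

/* Equality test as posted in the patch (architectural capacity). */
static bool model_same_cap_arch(const struct cpu_model *dst,
				const struct cpu_model *src)
{
	return model_arch_capacity(dst) == model_arch_capacity(src);
}

/* Equality test with the alternative raised above (actual capacity). */
static bool model_same_cap_actual(const struct cpu_model *dst,
				  const struct cpu_model *src)
{
	return model_actual_capacity(dst) == model_actual_capacity(src);
}
```

With src = { 1024, 0, 0 } and dst = { 1024, 300, 0 }, the architectural
test reports the two CPUs as identical, while the actual-capacity test
does not. In this model, only the latter would leave the existing
capacity_greater() check in force for the throttled destination.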
> runtime reductions due to side activity or thermal pressure. Guard this
> check with the sched_cluster_active static key so that systems without
> cluster topology are unaffected.
>
> Tested-by: Christian Loehle <christian.loehle@xxxxxxx>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@xxxxxxxxxxxxxxx>
> ---
> Changes in v5:
> * Optimized logic to identify same-arch clusters only when needed.
> * Added Tested-by tag from Christian. Thanks!
>
> Changes in v4:
> * Implemented the check for cluster with a local variable for improved
> readability.
>
> Changes in v3:
> * Reverted the inverted capacity check; the inverted form incorrectly
> allows migrations to CPUs of slightly less capacity.
> * Guarded the check for architectural capacity with the
> sched_cluster_active static key.
>
> Changes in v2:
> * Used arch_scale_cpu_capacity() instead of capacity_of() to ignore
> runtime variability.
> * Inverted the check for runtime capacity. (Christian)
> * Reworded patch description for clarity.
> ---
> kernel/sched/fair.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e55eb019d2c9..f4eb55cad54d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -12992,13 +12992,20 @@ static struct rq *sched_balance_find_src_rq(struct lb_env *env,
>  		 */
>  		if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
>  		    nr_running == 1) {
> +			bool same_arch_cluster = static_branch_unlikely(&sched_cluster_active) &&
> +						 (arch_scale_cpu_capacity(env->dst_cpu) ==
> +						  arch_scale_cpu_capacity(i));
>  			bool smt_degraded_cap = sched_smt_active() && !is_core_idle(i);
>
>  			/*
>  			 * Busy SMT siblings reduce the capacity of CPU @i. Do
>  			 * not skip it in this case.
> +			 *
> +			 * CONFIG_SCHED_CLUSTER requires balancing load across clusters
> +			 * of identical capacity. Use architectural capacity to ignore
> +			 * runtime variability.
>  			 */
> -			if (!smt_degraded_cap &&
> +			if (!smt_degraded_cap && !same_arch_cluster &&
>  			    !capacity_greater(capacity_of(env->dst_cpu), capacity))
>  				continue;
>  		}
>
> --
> 2.43.0
>
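
The "~5%" margin and the unintended effect on CPUs of identical
capacity, both described in the quoted commit message, can be checked
with a small user-space sketch. It assumes capacity_greater() compares
cap1 * 1024 against cap2 * 1078, which is my reading of
kernel/sched/fair.c and may not match every tree. model_capacity_greater()
and model_skip_rq() are hypothetical names, and the SD_ASYM_CPUCAPACITY
and nr_running == 1 conditions are taken as already true.

```c
#include <stdbool.h>

/* Assumed to mirror capacity_greater(): cap1 must exceed cap2 by roughly 5%. */
static bool model_capacity_greater(unsigned long cap1, unsigned long cap2)
{
	return cap1 * 1024 > cap2 * 1078;
}

/*
 * Models the "continue" decision in the hunk. Returns true when the
 * runqueue would be skipped as a candidate for the busiest runqueue.
 * Passing same_arch_cluster = false reproduces the pre-patch condition.
 */
static bool model_skip_rq(bool smt_degraded_cap, bool same_arch_cluster,
			  unsigned long dst_cap, unsigned long src_cap)
{
	return !smt_degraded_cap && !same_arch_cluster &&
	       !model_capacity_greater(dst_cap, src_cap);
}
```

For dst_cap == src_cap == 1024, model_capacity_greater() is false, so
the pre-patch condition skips the runqueue. Setting same_arch_cluster
to true lifts that skip, which is the behavior change the patch aims
for.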