Re: [PATCH v2 3/4] sched/fair: Allow load balancing between CPUs of identical capacity

From: Ricardo Neri

Date: Fri May 08 2026 - 08:46:27 EST


On Wed, May 06, 2026 at 02:10:22PM +0100, Christian Loehle wrote:
> On 4/29/26 22:19, Ricardo Neri wrote:
> > sched_balance_find_src_rq() avoids selecting a runqueue with a single
> > running task as busiest if doing so results in migrating the task to a
> > CPU with less than ~5% of extra capacity. It also unintentionally
> > prevents migrations between CPUs of identical capacity.
> >
> > When CONFIG_SCHED_CLUSTER is enabled, load should be balanced across
> > clusters of CPUs with the same capacity. Allowing migration between CPUs
> > of identical capacity is necessary to meet this goal.
> >
> > We are interested in the architectural capacity of the involved CPUs,
> > excluding any reductions due to side activity or thermal pressure. Use
> > arch_scale_cpu_capacity().
> >
> > While here, invert the check for runtime capacity for clarity.
> >
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@xxxxxxxxxxxxxxx>
> > ---
> > Changes since v1:
> > * Used arch_scale_cpu_capacity() instead of capacity_of() to ignore
> > runtime variability.
> > * Inverted the check for runtime capacity. (Christian)
> > * Reworded patch description for clarity.
> > ---
> > kernel/sched/fair.c | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 166a5b109e0e..4105717e64fe 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -11816,9 +11816,14 @@ static struct rq *sched_balance_find_src_rq(struct lb_env *env,
> > * eventually lead to active_balancing high->low capacity.
> > * Higher per-CPU capacity is considered better than balancing
> > * average load.
> > + *
> > + * Cluster scheduling requires balancing load across clusters
> > + * of identical capacity. Use architectural capacity to ignore
> > + * runtime variability.
> > */
> > if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
> > - !capacity_greater(capacity_of(env->dst_cpu), capacity) &&
> > + arch_scale_cpu_capacity(env->dst_cpu) != arch_scale_cpu_capacity(i) &&
> > + capacity_greater(capacity, capacity_of(env->dst_cpu)) &&
> > nr_running == 1)
> > continue;
> >
> >
>
> I wonder if we shouldn't use capacity_greater() margin for both, i.e.
> capacity_greater(arch_scale_cpu_capacity(i), arch_scale_cpu_capacity(env->dst_cpu)) &&
>
> For example, the Orion O6 has a cluster with 1024 and one with 984. If we allow balancing
> 984->984 I think it's only consistent to also allow 984->1024.

But that would be a change in the current policy, no? Today we allow a
984->1024 balance based on runtime capacity. The scope of this patchset is to
make SCHED_CLUSTER work as expected for clusters of the same capacity.

Perhaps your proposal of using architectural capacity can be evaluated in a
separate patchset?

By the way, in v3 I will have to undo the inversion of the runtime capacity
check. The original check allowed balance only if dst_cpu had at least ~5%
more capacity than src_cpu. The inverted check also allows balance to CPUs of
less capacity if the difference is less than ~5%.
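
The difference between the two runtime checks can be seen in a small
userspace sketch. The capacity_greater() macro mirrors the ~5% margin defined
in kernel/sched/fair.c. The helpers original_skips() and inverted_skips() are
hypothetical names made up for illustration. They model only the runtime
capacity condition and leave out the SD_ASYM_CPUCAPACITY, architectural
capacity, and nr_running == 1 conditions.

```c
#include <stdbool.h>

/* Same ~5% margin as capacity_greater() in kernel/sched/fair.c. */
#define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078)

/*
 * Hypothetical helper modelling the original check. Returns true when the
 * source runqueue is skipped, i.e. when dst does not have noticeably
 * (~5%) more capacity than src.
 */
static bool original_skips(unsigned long src_cap, unsigned long dst_cap)
{
	return !capacity_greater(dst_cap, src_cap);
}

/*
 * Hypothetical helper modelling the inverted check from v2. Returns true
 * only when src has noticeably (~5%) more capacity than dst.
 */
static bool inverted_skips(unsigned long src_cap, unsigned long dst_cap)
{
	return capacity_greater(src_cap, dst_cap);
}
```

With src_cap = 1024 and dst_cap = 1000 (dst about 2% smaller), the original
check skips the runqueue while the inverted check does not, so the inverted
form lets a lone task move to a slightly smaller CPU. With src_cap = 1024 and
dst_cap = 512 both forms skip, and with src_cap = 512 and dst_cap = 1024
neither does.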