Re: [PATCH 2/2] sched/fair: Balance #Tasks/#CPUs if busiest group has no idle CPU

From: K Prateek Nayak

Date: Fri Feb 06 2026 - 04:47:32 EST


Hello Pierre,

On 2/5/2026 8:38 PM, Pierre Gondois wrote:
> Halving the imbalance currently lead to the following scenario.
> On a Juno with 2 clusters: CLU0: 4 CPUs and CLU1: 2 CPUs, with
> 6 long running tasks:
> - 1 task on the 2-CPUs cluster
> - 5 Tasks run in the 4-CPUs cluster
> Running the load balancer from the idle CPU (in CLU1):
> - Local group: CLU1: idle_cpus=1; nr_running=1; type=group_has_spare
> - Busiest group: CLU0 idle_cpus=0; nr_running=5 type=group_overloaded
> Half of (local->idle_cpus - busiest->idle_cpus) is 0.
> No task is migrated and the task placement persists.
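
The arithmetic in the scenario above can be spelled out with a small
userspace model. This is only a sketch: idle_cpu_imbalance() is a
hypothetical helper, not a kernel function. It assumes the imbalance is the
positive difference in idle CPUs and that it is then halved, as the
changelog describes.

```c
#include <assert.h>

/*
 * Userspace model, not kernel code. Hypothetical helper that mirrors the
 * "even out the number of idle CPUs" branch followed by the halving
 * described in the quoted changelog.
 */
static long idle_cpu_imbalance(long local_idle_cpus, long busiest_idle_cpus)
{
	long imbalance;

	assert(local_idle_cpus >= 0 && busiest_idle_cpus >= 0);

	/* Same effect as lsub_positive(): subtract, but never go below 0. */
	imbalance = local_idle_cpus - busiest_idle_cpus;
	if (imbalance < 0)
		imbalance = 0;

	/* Halve: number of tasks to move to even out the idle CPUs. */
	return imbalance >> 1;
}
```

For the Juno case, idle_cpu_imbalance(1, 0) evaluates to 0, so nothing is
pulled and the 1 + 5 placement persists.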

...

> ---
> kernel/sched/fair.c | 10 ++++------
> 1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index aa14a9982b9f1..9dac3536d9c19 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11235,20 +11235,18 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
> return;
> }
>
> - if (busiest->group_weight == 1 || sds->prefer_sibling) {
> + env->migration_type = migrate_task;
> + if (busiest->group_weight == 1 || sds->prefer_sibling || !busiest->idle_cpus) {

I suppose you also have SD_ASYM_CPUCAPACITY set on your sd which is why
"sds->prefer_sibling" is false here.

Instead of checking for "busiest->idle_cpus", would it make sense to
enter this case for sibling_imbalance() when we have:

capacity_greater(capacity_of(env->dst_cpu), sds->busiest->sgc->min_capacity)

since it could very well be the case that the smaller cluster is
actually idle since task_fits_cpu() returned false for CPUs there?
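
As a rough userspace sketch of that condition (not kernel code): the margin
below is meant to mirror the roughly 5% margin of the capacity_greater()
macro in fair.c, while dst_noticeably_bigger() and the capacity values in
the note after it are hypothetical and purely illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model, not kernel code. The margin is intended to mirror
 * capacity_greater(): cap1 must exceed cap2 by roughly 5%.
 */
static bool capacity_greater_model(unsigned long cap1, unsigned long cap2)
{
	return cap1 * 1024 > cap2 * 1078;
}

/*
 * Hypothetical helper: is the destination CPU noticeably bigger than the
 * smallest CPU of the busiest group? Capacities are on the usual
 * 0..1024 scale.
 */
static bool dst_noticeably_bigger(unsigned long dst_cpu_capacity,
				  unsigned long busiest_min_capacity)
{
	assert(dst_cpu_capacity <= 1024 && busiest_min_capacity <= 1024);
	return capacity_greater_model(dst_cpu_capacity, busiest_min_capacity);
}
```

With assumed capacities of 1024 for the destination CPU and 446 for the
busiest group's minimum, the check is true. With the values swapped, or
with equal capacities, it is false, so a small idle CPU would not take the
sibling_imbalance() path against a bigger cluster.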

I couldn't actually spot any case where we compare the capacities
of the local and busiest groups for group types <= group_fully_busy,
but let me know if I've missed something.

> /*
> - * When prefer sibling, evenly spread running tasks on
> - * groups.
> + * When prefer sibling, or when busiest has no idle CPU,
> + * evenly spread running tasks on groups.
> */
> - env->migration_type = migrate_task;
> env->imbalance = sibling_imbalance(env, sds, busiest, local);

I'm slightly skeptical of spreading the tasks evenly without considering
the capacity difference when we are on SD_ASYM_CPUCAPACITY. I suppose
we'll filter out the target in sched_balance_find_src_rq() and bail out
if we only see lower capacity CPUs in the busiest group.
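
To make the concern concrete, here is a simplified userspace model of
spreading by the nr_running/ncores ratio. It is a sketch based on a reading
of sibling_imbalance() and should be checked against the tree: it covers
only the case where the two groups have different core counts, it skips the
empty-local-group special case, and spread_by_core_ratio() and its
parameters are hypothetical names.

```c
#include <assert.h>

/*
 * Userspace model, not kernel code: balance so that nr_running/ncores is
 * about the same in both groups, with rounding, then halve as the caller
 * would. Note that CPU capacity does not appear anywhere in this math.
 */
static long spread_by_core_ratio(long ncores_local, long nr_local,
				 long ncores_busiest, long nr_busiest)
{
	long imbalance;

	assert(ncores_local > 0 && ncores_busiest > 0);

	/* Cross-multiplied ratio difference, clamped at 0. */
	imbalance = ncores_local * nr_busiest - ncores_busiest * nr_local;
	if (imbalance < 0)
		imbalance = 0;

	/* Normalize by the total number of cores, rounding to nearest. */
	imbalance = (2 * imbalance + ncores_local + ncores_busiest) /
		    (ncores_local + ncores_busiest);

	/* Halve: number of tasks to move to restore balance. */
	return imbalance >> 1;
}
```

For the scenario from the changelog, spread_by_core_ratio(2, 1, 4, 5)
gives 1, which would leave 2 tasks on the 2-CPU cluster and 4 on the 4-CPU
cluster. The same result would come out whichever cluster has the bigger
CPUs, which is why the capacity difference seems worth considering.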

> } else {
> -
> /*
> * If there is no overload, we just want to even the number of
> * idle CPUs.
> */
> - env->migration_type = migrate_task;
> env->imbalance = local->idle_cpus;
> lsub_positive(&env->imbalance, busiest->idle_cpus);
> }

--
Thanks and Regards,
Prateek