Re: [PATCH v5 6/6] sched/topology: Do not clear SD_PREFER_SIBLING in domains with clusters
From: Ricardo Neri
Date: Wed Jun 24 2026 - 01:05:44 EST
On Tue, Jun 23, 2026 at 09:26:57AM +0200, Vincent Guittot wrote:
> On Tue, 23 Jun 2026 at 01:55, Ricardo Neri
> <ricardo.neri-calderon@xxxxxxxxxxxxxxx> wrote:
> >
> > Some topologies have scheduling domains that contain CPUs of asymmetric
> > capacity, grouped into two or more clusters of equal-capacity CPUs
> > sharing an L2 cache. When CONFIG_SCHED_CLUSTER is enabled, load must be
> > balanced across these clusters.
> >
> > Do not clear SD_PREFER_SIBLING in the child domains to indicate to the
> > load balancer that it should spread load among cluster siblings.
> >
> > Checks for capacity in update_sd_pick_busiest(),
> > sched_balance_find_src_group(), and sched_balance_find_src_rq() prevent
> > migrations from high- to low-capacity CPUs if the busiest group is not
> > overloaded.
> >
> > CPUs with spare capacity, big or small, have always helped overloaded
> > groups. Once the overloading condition disappears, misfit load balancing
> > will still move high-utilization tasks to bigger CPUs that have spare
> > capacity.
> >
> > Adding the SD_PREFER_SIBLING flag shifts load balancing in shared-LLC
> > domains from equalizing the number of idle CPUs to equalizing the number
> > of running tasks. This also enables migrations among clusters from newly-
> > idle load balance, where the outgoing task is already dequeued but the CPU
> > has not yet transitioned to idle.
> >
> > Reviewed-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> > Tested-by: Christian Loehle <christian.loehle@xxxxxxx>
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@xxxxxxxxxxxxxxx>
> > ---
> > Changes in v5:
> > * Improved inline comments for accuracy.
> > * Added Tested-by tag from Christian. Thanks!
> >
> > Changes in v4:
> > * Added Reviewed-by tag from Tim. Thanks!
> >
> > Changes in v3:
> > * Updated documentation of SD_PREFER_SIBLING.
> > * Expanded the patch description to explain the behavior when overloaded
> > groups are involved.
> >
> > Changes in v2:
> > * Reworded the patch description for clarity.
> > * Kept parentheses around bitwise operators for clarity.
> > ---
> > include/linux/sched/sd_flags.h | 3 ++-
> > kernel/sched/topology.c | 14 ++++++++++++--
> > 2 files changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
> > index 42839cfa2778..f9a46fb8cacf 100644
> > --- a/include/linux/sched/sd_flags.h
> > +++ b/include/linux/sched/sd_flags.h
> > @@ -147,7 +147,8 @@ SD_FLAG(SD_ASYM_PACKING, SDF_NEEDS_GROUPS)
> > * Prefer to place tasks in a sibling domain
> > *
> > * Set up until domains start spanning NUMA nodes. Close to being a SHARED_CHILD
> > - * flag, but cleared below domains with SD_ASYM_CPUCAPACITY.
> > + * flag, but cleared below domains with SD_ASYM_CPUCAPACITY unless those child
> > + * domains have clusters of CPUs sharing cache.
> > *
> > * NEEDS_GROUPS: Load balancing flag.
> > */
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 622e2e01974c..261b407d0936 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -1995,8 +1995,18 @@ sd_init(struct sched_domain_topology_level *tl,
> > /*
> > * Convert topological properties into behaviour.
> > */
> > - /* Don't attempt to spread across CPUs of different capacities. */
> > - if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child)
> > + /*
> > + * Don't attempt to spread across CPUs of different capacities.
> > + *
> > + * If the child domain has clusters of CPUs sharing L2 cache, keep the
> > + * flag to spread tasks across clusters of identical capacity. Checks in
> > + * the load balancer prevent task migrations from high- to low-capacity
> > + * CPUs unless the source group is overloaded. Migrations to a lower-
> > + * capacity CPU can happen if a higher-capacity group is overloaded and
> > + * a lower-capacity CPU has spare capacity.
> > + */
> > + if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child &&
> > + !(sd->child->flags & SD_CLUSTER))
> > sd->child->flags &= ~SD_PREFER_SIBLING;
>
> Last time I looked at this patch, I was torn between your proposal
> above and simply keeping SD_PREFER_SIBLING for all HMP topologies. As
> stated in the comment:
> " Checks in
> * the load balancer prevent task migrations from high- to low-capacity
> * CPUs unless the source group is overloaded.
> "
> So, why should we bother with (SD_ASYM_CPUCAPACITY && !SD_CLUSTER) topologies?

No reason, AFAICS. I just wanted to restrict the change to the target
topology of this patchset.

But you raise a good point: given the capacity checks already in the load
balancer, it should be OK to keep SD_PREFER_SIBLING in all asymmetric
topologies. I will run a few experiments to confirm.
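
To make the two options concrete, here is a minimal userspace sketch. The
flag bit values, struct toy_sd, and the helpers toy_fixup_prefer_sibling()
and toy_prefers_sibling() are hypothetical stand-ins, not kernel code; only
the condition itself follows the v5 hunk in sd_init().

```c
#include <stdbool.h>

/*
 * Hypothetical bit values for illustration only; the kernel generates the
 * real SD_* values from include/linux/sched/sd_flags.h.
 */
#define SD_ASYM_CPUCAPACITY	(1u << 0)
#define SD_CLUSTER		(1u << 1)
#define SD_PREFER_SIBLING	(1u << 2)

/* Hypothetical, stripped-down stand-in for struct sched_domain. */
struct toy_sd {
	unsigned int flags;
	struct toy_sd *child;
};

/*
 * keep_for_all_asym == false: mirrors the v5 check, which clears
 * SD_PREFER_SIBLING in the child of an asymmetric-capacity domain unless
 * that child is a cluster (SD_CLUSTER) domain.
 *
 * keep_for_all_asym == true: the alternative raised in review, which never
 * clears the flag and relies on the load balancer's capacity checks.
 */
static void toy_fixup_prefer_sibling(struct toy_sd *sd, bool keep_for_all_asym)
{
	if (keep_for_all_asym)
		return;

	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child &&
	    !(sd->child->flags & SD_CLUSTER))
		sd->child->flags &= ~SD_PREFER_SIBLING;
}

/* True if @child still carries SD_PREFER_SIBLING for its parent to consult. */
static bool toy_prefers_sibling(const struct toy_sd *child)
{
	return (child->flags & SD_PREFER_SIBLING) != 0;
}
```

With keep_for_all_asym false, a cluster child keeps the flag while a child
without clusters loses it; with it true, both keep it, which is the case the
experiments would need to validate.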