Re: [PATCH v5 6/6] sched/topology: Do not clear SD_PREFER_SIBLING in domains with clusters

From: Vincent Guittot

Date: Tue Jun 23 2026 - 03:28:52 EST


On Tue, 23 Jun 2026 at 01:55, Ricardo Neri
<ricardo.neri-calderon@xxxxxxxxxxxxxxx> wrote:
>
> Some topologies have scheduling domains that contain CPUs of asymmetric
> capacity, grouped into two or more clusters of equal-capacity CPUs
> sharing an L2 cache. When CONFIG_SCHED_CLUSTER is enabled, load must be
> balanced across these clusters.
>
> Do not clear SD_PREFER_SIBLING in the child domains to indicate to the
> load balancer that it should spread load among cluster siblings.
>
> Checks for capacity in update_sd_pick_busiest(),
> sched_balance_find_src_group(), and sched_balance_find_src_rq() prevent
> migrations from high- to low-capacity CPUs if the busiest group is not
> overloaded.
>
> CPUs with spare capacity, big or small, have always helped overloaded
> groups. Once the overloading condition disappears, misfit load will still
> be used to move high-utilization tasks to bigger CPUs if they have spare
> capacity.
>
> Adding the SD_PREFER_SIBLING flag shifts load balancing in shared-LLC
> domains from equalizing the number of idle CPUs to equalizing the number
> of running tasks. This also enables migrations among clusters from newly-
> idle load balance, where the outgoing task is already dequeued but the CPU
> has not yet transitioned to idle.
>
> Reviewed-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> Tested-by: Christian Loehle <christian.loehle@xxxxxxx>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@xxxxxxxxxxxxxxx>
> ---
> Changes in v5:
> * Improved inline comments for accuracy.
> * Added Tested-by tag from Christian. Thanks!
>
> Changes in v4:
> * Added Reviewed-by tag from Tim. Thanks!
>
> Changes in v3:
> * Updated documentation of SD_PREFER_SIBLING.
> * Expanded the patch description to explain the behavior when overloaded
> groups are involved.
>
> Changes in v2:
> * Reworded the patch description for clarity.
> * Kept parentheses around bitwise operators for clarity.
> ---
> include/linux/sched/sd_flags.h | 3 ++-
> kernel/sched/topology.c | 14 ++++++++++++--
> 2 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
> index 42839cfa2778..f9a46fb8cacf 100644
> --- a/include/linux/sched/sd_flags.h
> +++ b/include/linux/sched/sd_flags.h
> @@ -147,7 +147,8 @@ SD_FLAG(SD_ASYM_PACKING, SDF_NEEDS_GROUPS)
> * Prefer to place tasks in a sibling domain
> *
> * Set up until domains start spanning NUMA nodes. Close to being a SHARED_CHILD
> - * flag, but cleared below domains with SD_ASYM_CPUCAPACITY.
> + * flag, but cleared below domains with SD_ASYM_CPUCAPACITY unless those child
> + * domains have clusters of CPUs sharing cache.
> *
> * NEEDS_GROUPS: Load balancing flag.
> */
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 622e2e01974c..261b407d0936 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1995,8 +1995,18 @@ sd_init(struct sched_domain_topology_level *tl,
> /*
> * Convert topological properties into behaviour.
> */
> - /* Don't attempt to spread across CPUs of different capacities. */
> - if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child)
> + /*
> + * Don't attempt to spread across CPUs of different capacities.
> + *
> + * If the child domain has clusters of CPUs sharing L2 cache, keep the
> + * flag to spread tasks across clusters of identical capacity. Checks in
> + * the load balancer prevent task migrations from high- to low-capacity
> + * CPUs unless the source group is overloaded. Migrations to a lower-
> + * capacity CPU can happen if a higher-capacity group is overloaded and
> + * a lower-capacity CPU has spare capacity.
> + */
> + if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child &&
> + !(sd->child->flags & SD_CLUSTER))
> sd->child->flags &= ~SD_PREFER_SIBLING;

Last time I looked at this patch, I was torn between your proposal
above and simply keeping SD_PREFER_SIBLING for all HMP topologies. As
the new comment itself says:

  "Checks in the load balancer prevent task migrations from high- to
   low-capacity CPUs unless the source group is overloaded."

So why should we bother clearing the flag for the
(SD_ASYM_CPUCAPACITY && !SD_CLUSTER) topology?

>
> if (sd->flags & SD_SHARE_CPUCAPACITY) {
>
> --
> 2.43.0
>