[PATCH v5 6/6] sched/topology: Do not clear SD_PREFER_SIBLING in domains with clusters
From: Ricardo Neri
Date: Mon Jun 22 2026 - 19:56:07 EST
Some topologies have scheduling domains that contain CPUs of asymmetric
capacity, grouped into two or more clusters of equal-capacity CPUs
sharing an L2 cache. When CONFIG_SCHED_CLUSTER is enabled, load must be
balanced across these clusters.
Keep SD_PREFER_SIBLING set in child domains that have clusters, so that
the load balancer spreads load among cluster siblings.
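
As a rough illustration of the rule described above, the following
standalone sketch models the flag fixup on a toy domain hierarchy. The
struct, the function name and the flag values are hypothetical stand-ins
chosen for this example; only the flag names and the shape of the
condition come from this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit values only; the kernel derives the real ones from sd_flags.h. */
#define SD_ASYM_CPUCAPACITY 0x1u
#define SD_CLUSTER          0x2u
#define SD_PREFER_SIBLING   0x4u

/* Toy stand-in for struct sched_domain: just the fields this sketch needs. */
struct toy_domain {
	unsigned int flags;
	struct toy_domain *child;
};

/*
 * Clear SD_PREFER_SIBLING in the child of an asymmetric-capacity domain,
 * unless that child is a cluster domain. A domain without a child, or
 * without SD_ASYM_CPUCAPACITY, leaves its child untouched.
 */
void toy_fixup_prefer_sibling(struct toy_domain *sd)
{
	assert(sd != NULL);

	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child &&
	    !(sd->child->flags & SD_CLUSTER))
		sd->child->flags &= ~SD_PREFER_SIBLING;
}
```

With a child that carries SD_CLUSTER the flag survives the call; with a
child that does not, the flag is cleared as before.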
Checks for capacity in update_sd_pick_busiest(),
sched_balance_find_src_group(), and sched_balance_find_src_rq() prevent
migrations from high- to low-capacity CPUs if the busiest group is not
overloaded.
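
The kind of capacity check referred to above can be sketched as follows.
This is a simplified model, not the code in kernel/sched/fair.c: the
three-value group classification, the function names and the roughly 5%
margin are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified group classification; lower values mean less loaded. */
enum toy_group_type {
	TOY_HAS_SPARE,
	TOY_FULLY_BUSY,
	TOY_OVERLOADED,
};

/* True if cap1 exceeds cap2 by more than about 5% (assumed margin). */
bool toy_capacity_greater(unsigned long cap1, unsigned long cap2)
{
	return cap1 * 1024 > cap2 * 1078;
}

/*
 * Decide whether the destination CPU may pull from a candidate source
 * group. In an asymmetric-capacity domain, a group that is not
 * overloaded is skipped when its smallest CPU is noticeably bigger
 * than the destination CPU.
 */
bool toy_may_pull_from(bool sd_asym_cpucapacity, enum toy_group_type type,
		       unsigned long src_min_capacity,
		       unsigned long dst_capacity)
{
	assert(dst_capacity > 0);

	if (sd_asym_cpucapacity && type <= TOY_FULLY_BUSY &&
	    toy_capacity_greater(src_min_capacity, dst_capacity))
		return false;

	return true;
}
```

In this model a small CPU (capacity 512) cannot pull from a big group
(capacity 1024) that merely has spare capacity or is fully busy, but it
can once that group is classified as overloaded.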
CPUs with spare capacity, big or small, have always helped overloaded
groups. Once the overloading condition disappears, misfit load will still
be used to move high-utilization tasks to bigger CPUs if they have spare
capacity.
Keeping the SD_PREFER_SIBLING flag shifts load balancing in shared-LLC
domains from equalizing the number of idle CPUs to equalizing the number
of running tasks. This also enables migrations among clusters from
newly-idle load balance, where the outgoing task is already dequeued but
the CPU has not yet transitioned to idle.
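
To make the difference between the two balancing targets concrete, here
is a deliberately simplified model of how many tasks would be moved when
the local group has spare capacity. The struct and function names are
hypothetical, and the real calculation in the load balancer handles more
cases (NUMA, groups with different core counts) than this sketch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-group statistics used by this sketch. */
struct toy_group_stats {
	unsigned int idle_cpus;
	unsigned int nr_running;
};

/* Subtract b from a, clamping the result at zero. */
unsigned int toy_sub_positive(unsigned int a, unsigned int b)
{
	return a > b ? a - b : 0;
}

/*
 * Number of tasks to move from the busiest group to the local group.
 * With prefer_sibling the target is an equal number of running tasks;
 * without it the target is an equal number of idle CPUs.
 */
unsigned int toy_tasks_to_move(bool prefer_sibling,
			       const struct toy_group_stats *local,
			       const struct toy_group_stats *busiest)
{
	unsigned int diff;

	assert(local != NULL && busiest != NULL);

	if (prefer_sibling)
		diff = toy_sub_positive(busiest->nr_running, local->nr_running);
	else
		diff = toy_sub_positive(local->idle_cpus, busiest->idle_cpus);

	/* Move half of the difference so that both groups meet in the middle. */
	return diff / 2;
}
```

For example, take a local group with 2 idle CPUs and no running tasks
and a busiest group with 2 idle CPUs and 2 running tasks. Equalizing
idle CPUs sees no imbalance in this model, while equalizing running
tasks moves one task.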
Reviewed-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
Tested-by: Christian Loehle <christian.loehle@xxxxxxx>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@xxxxxxxxxxxxxxx>
---
Changes in v5:
* Improved inline comments for accuracy.
* Added Tested-by tag from Christian. Thanks!
Changes in v4:
* Added Reviewed-by tag from Tim. Thanks!
Changes in v3:
* Updated documentation of SD_PREFER_SIBLING.
* Expanded the patch description to explain the behavior when overloaded
groups are involved.
Changes in v2:
* Reworded the patch description for clarity.
* Kept parentheses around bitwise operators for clarity.
---
include/linux/sched/sd_flags.h | 3 ++-
kernel/sched/topology.c | 14 ++++++++++++--
2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index 42839cfa2778..f9a46fb8cacf 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -147,7 +147,8 @@ SD_FLAG(SD_ASYM_PACKING, SDF_NEEDS_GROUPS)
* Prefer to place tasks in a sibling domain
*
* Set up until domains start spanning NUMA nodes. Close to being a SHARED_CHILD
- * flag, but cleared below domains with SD_ASYM_CPUCAPACITY.
+ * flag, but cleared below domains with SD_ASYM_CPUCAPACITY unless those child
+ * domains have clusters of CPUs sharing cache.
*
* NEEDS_GROUPS: Load balancing flag.
*/
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 622e2e01974c..261b407d0936 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1995,8 +1995,18 @@ sd_init(struct sched_domain_topology_level *tl,
/*
* Convert topological properties into behaviour.
*/
- /* Don't attempt to spread across CPUs of different capacities. */
- if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child)
+ /*
+ * Don't attempt to spread across CPUs of different capacities.
+ *
+ * If the child domain has clusters of CPUs sharing L2 cache, keep the
+ * flag to spread tasks across clusters of identical capacity. Checks in
+ * the load balancer prevent task migrations from high- to low-capacity
+ * CPUs unless the source group is overloaded. Migrations to a lower-
+ * capacity CPU can happen if a higher-capacity group is overloaded and
+ * a lower-capacity CPU has spare capacity.
+ */
+ if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child &&
+ !(sd->child->flags & SD_CLUSTER))
sd->child->flags &= ~SD_PREFER_SIBLING;
if (sd->flags & SD_SHARE_CPUCAPACITY) {
--
2.43.0