[PATCH v3 4/4] sched/topology: Do not clear SD_PREFER_SIBLING in domains with clusters
From: Ricardo Neri
Date: Thu May 14 2026 - 14:25:21 EST
Some topologies have scheduling domains that contain CPUs of asymmetric
capacity, grouped into two or more clusters of equal-capacity CPUs
sharing an L2 cache. When CONFIG_SCHED_CLUSTER is enabled, load must be
balanced across these resource-sharing clusters.

Do not clear SD_PREFER_SIBLING in child domains that have SD_CLUSTER
set, so that the load balancer spreads load among cluster siblings.

Checks for capacity in update_sd_pick_busiest() prevent migrations from
high- to low-capacity CPUs if a candidate group is not overloaded.

An effect of keeping SD_PREFER_SIBLING in domains with asymmetric
capacity is that low-capacity clusters with spare capacity can now help
overloaded higher-capacity groups. This was already the case for
single-CPU groups (see calculate_imbalance() for domains with
SD_SHARE_LLC).

Once the overloading condition disappears, misfit load will still be used
to move high-utilization tasks to bigger CPUs that have spare capacity.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@xxxxxxxxxxxxxxx>
---
Changes in v3:
* Updated documentation of SD_PREFER_SIBLING.
* Expanded the patch description to explain the behavior when overloaded
groups are involved.
Changes in v2:
* Reworded the patch description for clarity.
* Kept parentheses around bitwise operators for clarity.
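
For reviewers, a minimal user-space sketch of the condition this patch
changes in sd_init(). It is an illustration under stated assumptions, not
kernel code: struct toy_domain, the toy_*() helpers, and the flag bit
values are hypothetical stand-ins (the kernel generates the real flag
values from sd_flags.h). Only the shape of the if-condition is taken from
the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit values only; NOT the kernel's real flag values. */
#define SD_ASYM_CPUCAPACITY	0x1
#define SD_CLUSTER		0x2
#define SD_PREFER_SIBLING	0x4

/* Hypothetical stand-in for struct sched_domain (only the fields used). */
struct toy_domain {
	int flags;
	struct toy_domain *child;
};

/*
 * Same condition as the patched sd_init(): clear SD_PREFER_SIBLING in the
 * child of an asymmetric-capacity domain, unless that child is a cluster
 * domain.
 */
static void toy_adjust_prefer_sibling(struct toy_domain *sd)
{
	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child &&
	    !(sd->child->flags & SD_CLUSTER))
		sd->child->flags &= ~SD_PREFER_SIBLING;
}

/*
 * Build a parent/child pair in which the child starts with
 * SD_PREFER_SIBLING set, apply the condition, and return 1 if the child
 * still has the flag afterwards, 0 otherwise.
 */
static int toy_child_keeps_prefer_sibling(int parent_flags, int child_flags)
{
	struct toy_domain child = {
		.flags = child_flags | SD_PREFER_SIBLING,
		.child = NULL,
	};
	struct toy_domain parent = {
		.flags = parent_flags,
		.child = &child,
	};

	toy_adjust_prefer_sibling(&parent);

	return !!(child.flags & SD_PREFER_SIBLING);
}
```

With these definitions, an asymmetric parent over a cluster child keeps
the flag, while an asymmetric parent over a non-cluster child clears it,
as before this patch.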
---
include/linux/sched/sd_flags.h | 3 ++-
kernel/sched/topology.c | 14 ++++++++++++--
2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index 42839cfa2778..42f74af83b8c 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -147,7 +147,8 @@ SD_FLAG(SD_ASYM_PACKING, SDF_NEEDS_GROUPS)
* Prefer to place tasks in a sibling domain
*
* Set up until domains start spanning NUMA nodes. Close to being a SHARED_CHILD
- * flag, but cleared below domains with SD_ASYM_CPUCAPACITY.
+ * flag, but cleared below domains with SD_ASYM_CPUCAPACITY if the domain does
+ * not have clusters of CPUs sharing cache.
*
* NEEDS_GROUPS: Load balancing flag.
*/
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 5847b83d9d55..a1d048344ea1 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1723,8 +1723,18 @@ sd_init(struct sched_domain_topology_level *tl,
/*
* Convert topological properties into behaviour.
*/
- /* Don't attempt to spread across CPUs of different capacities. */
- if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child)
+ /*
+ * Don't attempt to spread across CPUs of different capacities.
+ *
+ * If the domain has clusters of CPUs sharing L2 cache, keep the flag to
+ * spread tasks across clusters of identical capacity. Checks in
+ * update_sd_pick_busiest() prevent task migrations from high- to low-
+ * capacity CPUs for non-overloaded groups. Migrations to a lower-
+ * capacity CPU can happen if a higher-capacity group is overloaded and
+ * a low-capacity cluster has spare capacity.
+ */
+ if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child &&
+ !(sd->child->flags & SD_CLUSTER))
sd->child->flags &= ~SD_PREFER_SIBLING;
if (sd->flags & SD_SHARE_CPUCAPACITY) {
--
2.43.0