[RFC PATCH 2/2] sched/fair: Do not special-case SCHED_IDLE cpus in select slowpath

From: Abel Wu
Date: Mon Mar 10 2025 - 03:41:57 EST


The SCHED_IDLE cgroups, whose cpu.idle equals 1, only mean something
relative to their siblings due to the cgroup hierarchical behavior. So
a SCHED_IDLE cpu does NOT necessarily imply either of the following
(see the sketch after this list):

- It is a less loaded cpu (since the parent of its topmost idle
ancestor could be a 'giant' entity with large cpu.weight).

- It can be expected to be preempted by a newly woken task soon
enough (which actually depends on the ancestors of the two tasks
that share a common parent).
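
As an illustration, below is a minimal userspace sketch (hypothetical
names and paths, not part of this patch; it assumes cgroup v2 with the
cpu controller available under /sys/fs/cgroup) that builds such a
hierarchy: the cpus running tasks of giant/idle are "SCHED_IDLE cpus",
yet giant's large cpu.weight can keep them heavily loaded relative to
their peers.

	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/stat.h>
	#include <unistd.h>

	static void write_str(const char *path, const char *val)
	{
		int fd = open(path, O_WRONLY);

		if (fd < 0 || write(fd, val, strlen(val)) < 0)
			perror(path);
		if (fd >= 0)
			close(fd);
	}

	int main(void)
	{
		/* enable the cpu controller for the root's children */
		write_str("/sys/fs/cgroup/cgroup.subtree_control", "+cpu");

		/* 'giant' parent: maximum cpu.weight among its siblings */
		mkdir("/sys/fs/cgroup/giant", 0755);
		write_str("/sys/fs/cgroup/giant/cpu.weight", "10000");
		write_str("/sys/fs/cgroup/giant/cgroup.subtree_control", "+cpu");

		/* 'idle' child: SCHED_IDLE only relative to its siblings */
		mkdir("/sys/fs/cgroup/giant/idle", 0755);
		write_str("/sys/fs/cgroup/giant/idle/cpu.idle", "1");

		return 0;
	}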

Since a less loaded cpu probably has a better chance of serving the
newly woken task, and the same holds among SCHED_IDLE cpus (a less
loaded SCHED_IDLE cpu can be expected to be preempted more easily and
quickly), let's not special-case SCHED_IDLE cpus, at least in the
slowpath of cpu selection.
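
To make the resulting behavior concrete, here is a minimal userspace
simulation (hypothetical load values, not kernel code) of the new
busy-cpu fallback: a SCHED_IDLE cpu no longer short-circuits the scan
and wins only by actually having the minimum load.

	#include <stdio.h>

	int main(void)
	{
		/* cpu_load() samples; cpu 2 is SCHED_IDLE yet heavily loaded */
		unsigned long load[] = { 700, 300, 950, 520 };
		unsigned long min_load = (unsigned long)-1;
		int least_loaded_cpu = 0, i;

		for (i = 0; i < 4; i++) {
			if (load[i] < min_load) {
				min_load = load[i];
				least_loaded_cpu = i;
			}
		}

		/* prints 1: the plain least loaded cpu, not SCHED_IDLE cpu 2 */
		printf("least loaded cpu: %d\n", least_loaded_cpu);
		return 0;
	}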

Signed-off-by: Abel Wu <wuyun.abel@xxxxxxxxxxxxx>
---
kernel/sched/fair.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 379764bd2795..769505cf519b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7446,7 +7446,7 @@ sched_balance_find_dst_group_cpu(struct sched_group *group, struct task_struct *
 	unsigned int min_exit_latency = UINT_MAX;
 	u64 latest_idle_timestamp = 0;
 	int least_loaded_cpu = this_cpu;
-	int shallowest_idle_cpu = -1, si_cpu = -1;
+	int shallowest_idle_cpu = -1;
 	int i;
 
 	/* Check if we have any choice: */
@@ -7481,12 +7481,13 @@ sched_balance_find_dst_group_cpu(struct sched_group *group, struct task_struct *
 				latest_idle_timestamp = rq->idle_stamp;
 				shallowest_idle_cpu = i;
 			}
-		} else if (shallowest_idle_cpu == -1 && si_cpu == -1) {
-			if (sched_idle_cpu(i)) {
-				si_cpu = i;
-				continue;
-			}
-
+		} else if (shallowest_idle_cpu == -1) {
+			/*
+			 * A SCHED_IDLE cpu does not necessarily mean anything
+			 * to @p due to the cgroup hierarchical behavior. But
+			 * it is almost certain that the wakee will be better
+			 * served if the cpu is less loaded.
+			 */
 			load = cpu_load(cpu_rq(i));
 			if (load < min_load) {
 				min_load = load;
@@ -7495,11 +7496,7 @@ sched_balance_find_dst_group_cpu(struct sched_group *group, struct task_struct *
 		}
 	}
 
-	if (shallowest_idle_cpu != -1)
-		return shallowest_idle_cpu;
-	if (si_cpu != -1)
-		return si_cpu;
-	return least_loaded_cpu;
+	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
 }
 
 static inline int sched_balance_find_dst_cpu(struct sched_domain *sd, struct task_struct *p,
--
2.37.3