sched_ext/for-6.11: cpu validity check in ops_cpu_valid

From: Vishal Chourasia
Date: Sat Jul 13 2024 - 15:14:47 EST


Currently, the BPF scheduler can return a CPU that is marked as possible
in the system configuration, but this doesn't guarantee that the CPU is
actually present or online at that time. As a result, the scheduler may
try to assign tasks to CPUs that are not available, which activates the
fallback mechanism and can lead to an uneven load distribution across
the system.

By default, when a "not possible" CPU is returned, sched_ext gracefully
exits the BPF scheduler.

static bool ops_cpu_valid(s32 cpu, const char *where)
{
	if (likely(cpu >= 0 && cpu < nr_cpu_ids && cpu_possible(cpu))) {
		return true;
	} else {
		scx_ops_error("invalid CPU %d%s%s", cpu,
			      where ? " " : "", where ?: "");
		return false;
	}
}

On POWER, the cpu_present and cpu_possible masks of a system can differ.
CPUs that are possible but not present can be added later; once added,
they are also set in the cpu_present mask.

Looks like cpu_present() is a better check.

# tail -n +1 /sys/devices/system/cpu/{possible,present,online,offline}
==> /sys/devices/system/cpu/possible <==
0-63

==> /sys/devices/system/cpu/present <==
0-31

==> /sys/devices/system/cpu/online <==
0-31

==> /sys/devices/system/cpu/offline <==
32-63


diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 03da2cecb547..ca36596176c5 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1333,7 +1333,7 @@ static void wait_ops_state(struct task_struct *p, unsigned long opss)
  */
 static bool ops_cpu_valid(s32 cpu, const char *where)
 {
-	if (likely(cpu >= 0 && cpu < nr_cpu_ids && cpu_possible(cpu))) {
+	if (likely(cpu >= 0 && cpu < nr_cpu_ids && cpu_present(cpu))) {
 		return true;
 	} else {
 		scx_ops_error("invalid CPU %d%s%s", cpu,

Note: With this change, when the BPF scheduler erroneously assigns a task
to an offline CPU, the BPF scheduler is not stopped. Instead, the core
scheduler compensates by allocating a fallback CPU from the same node as
the task's previous CPU. This can sometimes overload some CPUs.

Would a cpu_online(cpu) check be a better alternative?