Re: [PATCH v2 05/17] sched/core: allow only preferred CPUs in is_cpu_allowed
From: Shrikanth Hegde
Date: Wed Apr 08 2026 - 08:58:50 EST
Hi Yury.
On 4/8/26 6:35 AM, Yury Norov wrote:
On Wed, Apr 08, 2026 at 12:49:38AM +0530, Shrikanth Hegde wrote:
When possible, pick a preferred CPU.
The push task mechanism uses the stopper thread, which is going to call
select_fallback_rq() and use this mechanism to pick only a preferred CPU.
When a task is affined only to non-preferred CPUs, it should continue to
run there. Detect that by checking whether cpus_ptr and cpu_preferred_mask
intersect.
Signed-off-by: Shrikanth Hegde <sshegde@xxxxxxxxxxxxx>
---
kernel/sched/core.c | 17 ++++++++++++++---
kernel/sched/sched.h | 12 ++++++++++++
2 files changed, 26 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7ea05a7a717b..336e7c694eb7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2463,9 +2463,16 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
if (is_migration_disabled(p))
return cpu_online(cpu);
- /* Non kernel threads are not allowed during either online or offline. */
- if (!(p->flags & PF_KTHREAD))
- return cpu_active(cpu);
+ /*
+ * Non kernel threads are not allowed during either online or offline.
+ * Ensure it is a preferred CPU to avoid further contention
+ */
+ if (!(p->flags & PF_KTHREAD)) {
+ if (!cpu_active(cpu))
+ return false;
+ if (!cpu_preferred(cpu) && task_can_run_on_preferred_cpu(p))
+ return false;
+ }
/* KTHREAD_IS_PER_CPU is always allowed. */
if (kthread_is_per_cpu(p))
@@ -2475,6 +2482,10 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
if (cpu_dying(cpu))
return false;
+ /* Try on preferred CPU first */
+ if (!cpu_preferred(cpu) && task_can_run_on_preferred_cpu(p))
+ return false;
The first one was for regular tasks; this one is for unbound kernel threads.
Both will be needed, no?
You repeat this for the 2nd time. The cpu_preferred() call should go
inside task_can_run_on_preferred_cpu().
I want to keep this cpu_preferred() check first, the reason being that
it is inexpensive since it is a bit check. Only if it fails should one
bother about task_can_run_on_preferred_cpu(), which is O(N) as you said.
I am also using task_can_run_on_preferred_cpu() in the push task mechanism,
in PATCH 10/17. I get there only on a non-preferred CPU.
So can I keep it as is?
And can you please pick some shorter name?
task_has_preferred_cpus?
+
/* But are allowed during online. */
return cpu_online(cpu);
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 88e0c93b9e21..7271af2ca64f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -4130,4 +4130,16 @@ DEFINE_CLASS_IS_UNCONDITIONAL(sched_change)
#include "ext.h"
+#ifdef CONFIG_PARAVIRT
+static inline bool task_can_run_on_preferred_cpu(struct task_struct *p)
+{
+ return cpumask_intersects(p->cpus_ptr, cpu_preferred_mask);
This makes is_cpu_allowed() O(N). Even if CONFIG_PARAVIRT is enabled,
I think some people would prefer to avoid this. Also, select_fallback_rq()
calls it in a loop, and this makes it O(N^2).
/* Any allowed, online CPU? */
for_each_cpu(dest_cpu, p->cpus_ptr) {
if (!is_cpu_allowed(p, dest_cpu))
continue;
goto out;
}
You can keep it O(N):
for_each_cpu_and(dest_cpu, p->cpus_ptr, cpu_preferred_mask) {
...
}
This would leave tasks which have affinity only on non-preferred CPUs without a CPU.
That breaks the case below:
600 CPUs, high steal time, and hence the preferred set is 0-399.
In that state, the user does "taskset -c 500 <stress-ng>"; that task ends up
going to preferred CPUs since its affinity gets reset in the switch
block later in select_fallback_rq().
Not sure how critical that path is, but this looks suspicious.
Fair point. But we come here only if cpu_preferred(cpu) == false.
When that happens, we expect the task to get preempted to a preferred
CPU. Once it has migrated to a preferred CPU, it won't suffer the
additional overhead of task_can_run_on_preferred_cpu() any more.
The case where it would be O(N^2) is for tasks which have explicit
affinity on non-preferred CPUs. I see that as a rare case.
For the majority of cases it should still be O(N), since cpu_preferred(cpu)
will be true.
Here is the benchmark data on a system with zero steal time.
It is a dedicated LPAR (VM).
| Test | Baseline | NO STEAL | %diff to base| STEAL | %diff to base |
| | | MONITOR | | MONITOR | |
|---------------------------------|----------|----------|--------------|-----------|---------------|
| HackBench Process 20 groups | 2.70 | 2.67 | +1.11% | 2.66 | +1.48% |
| HackBench Process 40 groups | 5.31 | 5.26 | +0.94% | 5.30 | +0.19% |
| HackBench Process 60 groups | 7.95 | 7.82 | +1.64% | 7.90 | +0.63% |
| HackBench thread 10 Time | 1.61 | 1.59 | +1.24% | 1.56 | +3.11% |
| HackBench thread 20 Time | 2.82 | 2.83 | -0.35% | 2.80 | +0.71% |
| HackBench Process(Pipe) 20 Time | 1.51 | 1.50 | +0.66% | 1.47 | +2.65% |
| HackBench Process(Pipe) 40 Time | 2.78 | 2.69 | +3.24% | 2.69 | +3.24% |
| HackBench Process(Pipe) 60 Time | 3.73 | 3.76 | -0.80% | 3.64 | +2.41% |
| HackBench thread(Pipe) 10 Time | 0.94 | 0.91 | +3.19% | 0.91 | +3.19% |
| HackBench thread(Pipe) 20 Time | 1.58 | 1.58 | -0.00% | 1.59 | -0.63% |
+}
+#else
+static inline bool task_can_run_on_preferred_cpu(struct task_struct *p)
+{
+ return true;
+}
+#endif
Same comment as in patch 3. I believe it's worth declaring cpu_preferred_mask
unrelated to CONFIG_PARAVIRT, so that you won't have to spread this
ifdefery around.
Yes.
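For reference, a sketch of keeping the #ifdef in one place (userspace stand-ins: mask_t and the plain & stand in for struct cpumask and cpumask_intersects(); the helper uses the shorter name suggested above). Only the helper body is conditional; callers never see the ifdefery:

```c
#include <stdbool.h>

/* Userspace stand-in; in the kernel this would be a struct cpumask
 * declared unconditionally in sched.h. */
typedef unsigned long mask_t;

static mask_t cpu_preferred_mask;	/* declared regardless of config */

/* Flip this to model a !CONFIG_PARAVIRT build. */
#define CONFIG_PARAVIRT 1

static inline bool task_has_preferred_cpus(mask_t cpus_ptr)
{
#ifdef CONFIG_PARAVIRT
	/* kernel: cpumask_intersects(p->cpus_ptr, cpu_preferred_mask) */
	return cpus_ptr & cpu_preferred_mask;
#else
	return true;	/* no preference tracking: everything qualifies */
#endif
}
```

This is only a shape sketch of confining the #ifdef to the helper body, not the actual sched.h change.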
+
#endif /* _KERNEL_SCHED_SCHED_H */
--
2.47.3