Thanks!
So I'm thinking we could first make that into:

  if ((wake_flags & WF_ON_CPU) && !cpu_rq(cpu)->nr_running)

Then, building on this, we can generalize use of the wakelist to any remote
idle CPU (which on paper isn't as clear a win as just WF_ON_CPU,
depending on how deeply idle the CPU is...).
We need the cpu != this_cpu check, as that's currently served by the
WF_ON_CPU check (AFAIU we can only observe p->on_cpu in there for remote
tasks).
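
To make the ordering of the checks concrete, here's a rough userspace model of
the three conditions touched by the hunk below. It is only a sketch: the
model_* names, the fixed CPU count and the LLC-id array are made up for
illustration and stand in for cpu_rq(), smp_processor_id() and
cpus_share_cache(); any checks that come earlier in the real function are not
modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4	/* hypothetical, illustration only */

/* Stand-in for the per-CPU runqueue; only nr_running matters here. */
struct model_rq {
	unsigned int nr_running;
};

static struct model_rq model_rqs[MODEL_NR_CPUS];
/* CPUs with the same id are treated as sharing a cache (assumption). */
static int model_llc_id[MODEL_NR_CPUS];

/*
 * Returns true when the wakeup should go through the wakelist.
 * this_cpu plays the role of smp_processor_id(), cpu is the target.
 */
static bool model_ttwu_queue_cond(int this_cpu, int cpu)
{
	assert(this_cpu >= 0 && this_cpu < MODEL_NR_CPUS);
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

	/* No shared cache: always queue remotely. */
	if (model_llc_id[this_cpu] != model_llc_id[cpu])
		return true;

	/* Local wakeup: previously filtered out implicitly by WF_ON_CPU. */
	if (cpu == this_cpu)
		return false;

	/* Remote CPU with nothing running: let it do the activation. */
	if (!model_rqs[cpu].nr_running)
		return true;

	return false;
}
```

With this model, a wakeup targeting the local CPU never uses the wakelist even
when its nr_running is 0, which is the case the explicit cpu check is there to
cover.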
---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 66c4e5922fe1..60038743f2f1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3830,13 +3830,20 @@ static inline bool ttwu_queue_cond(int cpu, int wake_flags)
 	if (!cpus_share_cache(smp_processor_id(), cpu))
 		return true;
+	if (cpu == smp_processor_id())
+		return false;
+
 	/*
 	 * If the task is descheduling and the only running task on the
 	 * CPU then use the wakelist to offload the task activation to
 	 * the soon-to-be-idle CPU as the current CPU is likely busy.
 	 * nr_running is checked to avoid unnecessary task stacking.
+	 *
+	 * Note that we can only get here with (wakee) p->on_rq=0,
+	 * p->on_cpu can be whatever, we've done the dequeue, so
+	 * the wakee has been accounted out of ->nr_running
 	 */
-	if ((wake_flags & WF_ON_CPU) && cpu_rq(cpu)->nr_running <= 1)
+	if (!cpu_rq(cpu)->nr_running)
 		return true;
 	return false;