Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

From: Valentin Schneider
Date: Fri Nov 26 2021 - 08:34:29 EST


On 26/11/21 09:23, Vincent Guittot wrote:
> On Thu, 25 Nov 2021 at 16:30, Valentin Schneider
> <Valentin.Schneider@xxxxxxx> wrote:
>> On 25/11/21 14:23, Vincent Guittot wrote:
>> > If we want to filter wakeup
>> > generated by interrupt context while a per cpu kthread is running, it
>> > would be better to fix all cases and test the running context like
>> > this
>> >
>>
>> I think that could make sense - though can the idle task issue wakeups in
>> process context? If so that won't be sufficient. A quick audit tells me:
>>
>> o rcu_nocb_flush_deferred_wakeup() happens before calling into cpuidle
>> o I didn't see any wakeup issued from the cpu_pm_notifier call chain
>> o I'm not entirely sure about flush_smp_call_function_from_idle(). I found
>> this thing in RCU:
>>
>> smp_call_function_single(cpu, rcu_exp_handler)
>>
>> rcu_exp_handler()
>> rcu_report_exp_rdp()
>> rcu_report_exp_cpu_mult()
>> __rcu_report_exp_rnp()
>> swake_up_one()
>>
>> IIUC if set_nr_if_polling() then the smp_call won't send an IPI and should be
>> handled in that flush_foo_from_idle() call.
>
> Aren't all these planned to wakeup on local cpu ? so i don't see any
> real problem there
>

Hm so other than boot time oddities I think that does end up with threads
of an !UNBOUND (so pcpu) workqueue...

>>
>> I'd be tempted to stick your VincentD's conditions together, just to be
>> safe...
>
> More than safe I would prefer that we fix the correct root cause
> instead of hiding it
>

I did play around a bit to see if this could be true when evaluating that
is_per_cpu_kthread() condition:

is_idle_task(current) && in_task() && p->nr_cpus_allowed > 1

but no luck so far. An in_task() check would appear sufficient, but how's
this?

---

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 884f29d07963..f45806b7f47a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6390,14 +6390,18 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
return prev;

/*
- * Allow a per-cpu kthread to stack with the wakee if the
- * kworker thread and the tasks previous CPUs are the same.
- * The assumption is that the wakee queued work for the
- * per-cpu kthread that is now complete and the wakeup is
- * essentially a sync wakeup. An obvious example of this
+ * Allow a per-cpu kthread to stack with the wakee if the kworker thread
+ * and the tasks previous CPUs are the same. The assumption is that the
+ * wakee queued work for the per-cpu kthread that is now complete and
+ * the wakeup is essentially a sync wakeup. An obvious example of this
* pattern is IO completions.
+ *
+ * Ensure the wakeup is issued by the kthread itself, and don't match
+ * against the idle task because that could override the
+ * available_idle_cpu(target) check done higher up.
*/
- if (is_per_cpu_kthread(current) &&
+ if (is_per_cpu_kthread(current) && !is_idle_task(current) &&
+ in_task() &&
prev == smp_processor_id() &&
this_rq()->nr_running <= 1) {
return prev;