Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

From: Vincent Donnefort
Date: Fri Nov 26 2021 - 12:20:28 EST


On Fri, Nov 26, 2021 at 04:49:12PM +0000, Valentin Schneider wrote:
> On 26/11/21 15:40, Vincent Guittot wrote:
> > On Fri, 26 Nov 2021 at 14:32, Valentin Schneider
> > <Valentin.Schneider@xxxxxxx> wrote:
> >> /*
> >> - * Allow a per-cpu kthread to stack with the wakee if the
> >> - * kworker thread and the tasks previous CPUs are the same.
> >> - * The assumption is that the wakee queued work for the
> >> - * per-cpu kthread that is now complete and the wakeup is
> >> - * essentially a sync wakeup. An obvious example of this
> >> + * Allow a per-cpu kthread to stack with the wakee if the kworker thread
> >> + * and the tasks previous CPUs are the same. The assumption is that the
> >> + * wakee queued work for the per-cpu kthread that is now complete and
> >> + * the wakeup is essentially a sync wakeup. An obvious example of this
> >> * pattern is IO completions.
> >> + *
> >> + * Ensure the wakeup is issued by the kthread itself, and don't match
> >> + * against the idle task because that could override the
> >> + * available_idle_cpu(target) check done higher up.
> >> */
> >> - if (is_per_cpu_kthread(current) &&
> >> + if (is_per_cpu_kthread(current) && !is_idle_task(current) &&
> >
> > still i don't see the need of !is_idle_task(current)
> >
>
> Admittedly, belts and braces. The existing condition checks rq->nr_running <= 1
> which can lead to coscheduling when the wakeup is issued by the idle task
> (or even if rq->nr_running == 0, you can have rq->ttwu_pending without
> having sent an IPI due to polling). Essentially this overrides the first
> check in sis() that uses idle_cpu(target) (prev == smp_processor_id() ==
> target).
>
> I couldn't prove such wakeups can happen right now, but if/when they do
> (AIUI it would just take someone to add a wake_up_process() down some
> smp_call_function() callback) then we'll need the above. If you're still
> not convinced by now, I won't push it further.

From a quick experiment, even with the asym_fits_capacity() check, I can trigger
the following:

[ 0.118855] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
[ 0.128214] select_idle_sibling: wakee=rcu_gp:3 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
[ 0.137327] select_idle_sibling: wakee=rcu_par_gp:4 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
[ 0.147221] select_idle_sibling: wakee=kworker/u16:0:7 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
[ 0.156994] select_idle_sibling: wakee=mm_percpu_wq:8 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
[ 0.171943] select_idle_sibling: wakee=rcu_sched:10 nr_cpus_allowed=8 current=swapper/0:1 in_task=1

So the in_task() condition alone doesn't appear to be enough to filter out
these wakeups while swapper/0 (pid 1 in the trace) is current.

>
> >
> >> + in_task() &&
> >> prev == smp_processor_id() &&
> >> this_rq()->nr_running <= 1) {
> >> return prev;
> >>