Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

From: Vincent Donnefort
Date: Mon Nov 29 2021 - 11:56:38 EST


[...]

> > > >
> > > > Still, I don't see the need for !is_idle_task(current)
> > > >
> > >
> > > Admittedly, belts and braces. The existing condition checks rq->nr_running <= 1
> > > which can lead to coscheduling when the wakeup is issued by the idle task
> > > (or even if rq->nr_running == 0, you can have rq->ttwu_pending without
> > > having sent an IPI due to polling). Essentially this overrides the first
> > > check in sis() that uses idle_cpu(target) (prev == smp_processor_id() ==
> > > target).
> > >
> > > I couldn't prove such wakeups can happen right now, but if/when they do
> > > (AIUI it would just take someone to add a wake_up_process() down some
> > > smp_call_function() callback) then we'll need the above. If you're still
> > > not convinced by now, I won't push it further.
> >
> > From a quick experiment, even with the asym_fits_capacity(), I can trigger
> > the following:
> >
> > [ 0.118855] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > [ 0.128214] select_idle_sibling: wakee=rcu_gp:3 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > [ 0.137327] select_idle_sibling: wakee=rcu_par_gp:4 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > [ 0.147221] select_idle_sibling: wakee=kworker/u16:0:7 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > [ 0.156994] select_idle_sibling: wakee=mm_percpu_wq:8 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
>
> The timestamps show it's the boot phase, and the thread names above
> suggest per-CPU threads. Could it happen only while creating per-CPU
> threads at boot and, as a result, not be relevant?

I have more of those logs a bit later in the boot:

[ 0.484791] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
[ 0.516495] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
[ 0.525758] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
[ 0.535078] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
[ 0.547486] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
[ 0.579192] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1

The nr_cpus_allowed=8 suggests that none of the threads from the logs I
shared are per-CPU. Sorry if the format is confusing, I used:

wakee=<comm>:<pid> current=<comm>:<pid>.
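
Since <comm> can itself contain ':' (e.g. "kworker/u16:0:7" is comm
"kworker/u16:0" with pid 7), here is a minimal userspace sketch of how
these lines can be split. The struct and function names are hypothetical
and only for illustration; it assumes comm contains no spaces.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for one "<comm>:<pid>" field. */
struct task_id {
	char comm[64];
	int pid;
};

/* Hypothetical holder for one parsed log line. */
struct wake_log {
	struct task_id wakee;
	struct task_id cur;	/* the "current=" field */
	int nr_cpus_allowed;
	int in_task;
};

/* Split "<comm>:<pid>" at the LAST ':' because comm may contain ':'. */
static int split_task(const char *field, struct task_id *out)
{
	const char *colon = strrchr(field, ':');
	size_t len;

	if (!colon || colon == field || colon[1] == '\0')
		return -1;
	len = (size_t)(colon - field);
	if (len >= sizeof(out->comm))
		return -1;
	memcpy(out->comm, field, len);
	out->comm[len] = '\0';
	out->pid = atoi(colon + 1);
	return 0;
}

/* Returns 0 on success, -1 if the line does not match the format. */
static int parse_wake_log(const char *line, struct wake_log *out)
{
	char wakee[64], cur[64];
	const char *p = strstr(line, "wakee=");	/* skip timestamp/prefix */

	if (!p)
		return -1;
	if (sscanf(p, "wakee=%63s nr_cpus_allowed=%d current=%63s in_task=%d",
		   wakee, &out->nr_cpus_allowed, cur, &out->in_task) != 4)
		return -1;
	if (split_task(wakee, &out->wakee) || split_task(cur, &out->cur))
		return -1;
	return 0;
}
```

Splitting at the last ':' rather than the first is the only non-obvious
choice: it keeps "swapper/0:1" as comm "swapper/0", pid 1.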

>
> Can you see similar things later, after booting?

I tried a few scenarios other than boot time, but none of them produced
"current=swapper/X:1 in_task=1".

>
> I have tried to trigger the situation but failed to get a wrong
> sequence. All of them come from interrupts while idle.
> After adding the in_task() condition, I haven't been able to trigger the
> warn() that I added to catch the wrong situations on SMP, heterogeneous
> or NUMA systems. Could you share more details on your setup?
>

This is just my Hikey960 with the asym_fits_capacity() fix [1] to make sure I
don't simply hit the other issue with asym platforms.

Then I just added my log in the per-CPU kthread wakee stacking exit path:

	printk("%s: wakee=%s:%d nr_cpus_allowed=%d current=%s:%d in_task=%d\n",
	       __func__, p->comm, p->pid, p->nr_cpus_allowed,
	       current->comm, current->pid, in_task());


[1] https://lore.kernel.org/all/20211125101239.3248857-1-vincent.donnefort@xxxxxxx/


From the same logs I also see:

wakee=xfsaild/mmcblk0:4855 nr_cpus_allowed=8 current=kworker/1:1:1070 in_task=0

Doesn't that look like a genuine wakeup that would escape the per-CPU kthread
stacking exit path because of the in_task test?
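
To make the question concrete, here is a minimal userspace model of the
exit-path condition as I understand it from this thread: the waker is a
per-CPU kthread, in_task() is true, prev is the local CPU and the local
rq has at most one runnable task. This is a sketch, not the kernel code;
the struct and function names are hypothetical, and any field value not
printed in the logs above (prev CPU, nr_running, the waker's affinity)
is an assumption.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the state the condition reads. */
struct wakeup_ctx {
	bool waker_is_kthread;		/* models current->flags & PF_KTHREAD */
	int waker_nr_cpus_allowed;	/* models current->nr_cpus_allowed */
	bool in_task;			/* models in_task() */
	int prev_cpu;			/* CPU the wakee last ran on */
	int this_cpu;			/* models smp_processor_id() */
	unsigned int rq_nr_running;	/* models this_rq()->nr_running */
};

/* A kthread pinned to exactly one CPU. */
static bool waker_is_per_cpu_kthread(const struct wakeup_ctx *c)
{
	return c->waker_is_kthread && c->waker_nr_cpus_allowed == 1;
}

/* True when the wakee would be stacked on prev via the exit path. */
static bool takes_stacking_exit_path(const struct wakeup_ctx *c)
{
	return waker_is_per_cpu_kthread(c) &&
	       c->in_task &&
	       c->prev_cpu == c->this_cpu &&
	       c->rq_nr_running <= 1;
}
```

With every other input held equal, flipping in_task from 1 to 0 is what
turns the result from true to false, which is the case the
"current=kworker/1:1:1070 in_task=0" line raises.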