Re: [PATCH 1/2] sched/wait: Break up long wake list walk

From: Linus Torvalds
Date: Fri Aug 18 2017 - 16:30:01 EST


On Fri, Aug 18, 2017 at 1:05 PM, Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:
>
> I think what's happening is that it allows more parallelism during wakeup:
>
> Normally it's like
>
> CPU 1                           CPU 2                 CPU 3 .....
>
> LOCK
> wake up tasks on other CPUs     woken up              woken up
> UNLOCK                          SPIN on waitq lock    SPIN on waitq lock

Hmm. The processes that are woken up shouldn't need to touch the waitq
lock after wakeup. The default "autoremove_wake_function()" does the
wait list removal, so if you just use the normal wait/wakeup, you're
all done and don't need to do anything more.

That's very much by design.

In fact, it's why "finish_wait()" uses that "list_empty_careful()"
thing on the entry - exactly so that it only needs to take the wait
queue lock if it is still on the wait list (i.e. it was woken up by
something else, such as a signal or a timeout).
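To make the lock accounting concrete, here is a minimal single-threaded
userspace model in C. It is a sketch, not the kernel implementation:
struct waitq, struct waiter, wake_all(), finish() and the list helpers
are hypothetical stand-ins for the wait queue head, a wait entry, the
wake-up walk, finish_wait() and the list_head helpers, and a counter
replaces the real spinlock. A waiter flagged "autoremove" is unlinked
by the waker while it holds the lock, so the woken side's lockless
check finds the entry already gone.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model, not kernel code: helper names are hypothetical stand-ins
 * for list_head, list_del_init() and list_empty_careful(). Single-threaded;
 * lock_taken counts acquisitions in place of a real spinlock. */
struct list_head { struct list_head *next, *prev; };

static void init_list(struct list_head *h) { h->next = h->prev = h; }

static void add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Unlink and point the entry back at itself, so it reads as "empty". */
static void del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list(e);
}

/* Like list_empty_careful(): both links must point back at the entry. */
static bool empty_careful(const struct list_head *e)
{
	return e->next == e && e->prev == e;
}

struct waitq {
	struct list_head head;
	int lock_taken;		/* how many times the queue lock was taken */
};

struct waiter {
	struct list_head link;	/* first member, so a cast recovers the waiter */
	bool woken;
	bool autoremove;	/* true: models an autoremove-style wake function */
};

/* Waker: take the lock once and wake every waiter. Autoremove-style
 * waiters are unlinked right here, while the waker still holds the lock. */
static void wake_all(struct waitq *q)
{
	struct list_head *pos = q->head.next;

	q->lock_taken++;
	while (pos != &q->head) {
		struct list_head *next = pos->next;	/* pos may be unlinked */
		struct waiter *w = (struct waiter *)pos;

		w->woken = true;
		if (w->autoremove)
			del_init(&w->link);
		pos = next;
	}
}

/* Sleeper after waking (cf. finish_wait()): lockless check first, and
 * only take the queue lock if the entry is still on the list. */
static void finish(struct waitq *q, struct waiter *w)
{
	assert(w->woken);
	if (!empty_careful(&w->link)) {
		q->lock_taken++;
		del_init(&w->link);
	}
}
```

With three autoremove-style waiters, one wake_all() plus three finish()
calls take the lock once in total. With autoremove = false, each
finish() has to take it again, four times in total, which is roughly
the cost of a wake function that leaves the entry queued.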

Now, it *is* racy, in the sense that the autoremove_wake_function()
will remove the entry *after* having successfully woken up the
process, so with bad luck and a quick wakeup, the woken process may
get to finish_wait() before the entry is removed, miss the
list_empty_careful() fast path, and take the lock anyway.

So we really *should* do the remove earlier, inside the pi_lock region
in ttwu() (try_to_wake_up()). We don't have that kind of interface,
though. If you actually do see tasks getting stuck on the waitqueue
lock after being woken up, it might be worth looking at.
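The ordering issue can be shown with an even smaller deterministic
model, again a hypothetical sketch and not kernel code. A bool stands
in for the list linkage, simulate() and struct sim are invented for
illustration, and remove_first models the earlier removal inside
ttwu(), an interface that, as said, does not exist today.

```c
#include <assert.h>
#include <stdbool.h>

/* Deterministic model of the race: "queued" stands in for the wait-list
 * linkage, and lock_taken counts how often the woken task had to fall back
 * to the waitqueue lock. remove_first models the hypothetical "unlink
 * before the wakeup" ordering; no such kernel interface exists. */
struct sim {
	bool queued;		/* still on the wait list? */
	bool runnable;		/* has the wakeup happened? */
	int lock_taken;
};

/* Woken task, cf. finish_wait(): lockless check, lock only if still queued.
 * In the kernel it would spin until the waker drops the lock; the unlink is
 * idempotent, so the model just records the acquisition. */
static void sleeper_finish(struct sim *s)
{
	assert(s->runnable);
	if (s->queued) {
		s->lock_taken++;
		s->queued = false;
	}
}

/* Returns how many times the woken task took the waitqueue lock. */
static int simulate(bool remove_first, bool sleeper_runs_early)
{
	struct sim s = { .queued = true, .runnable = false, .lock_taken = 0 };

	if (remove_first)
		s.queued = false;	/* hypothetical: unlink before waking */
	s.runnable = true;		/* the wakeup itself */
	if (sleeper_runs_early)
		sleeper_finish(&s);	/* task checks before the waker unlinks */
	s.queued = false;		/* autoremove unlink (no-op if already done) */
	if (!sleeper_runs_early)
		sleeper_finish(&s);
	return s.lock_taken;
}
```

In this model only simulate(false, true), where the woken task runs
before the waker's unlink, falls back to the lock; unlinking before the
wakeup closes that window regardless of timing.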

The other possibility is that you were looking at cases that didn't
use "autoremove_wake_function()" at all, of course. Maybe they are
worth fixing. The autoremoval really does make a difference, exactly
because of the issue you point to.

Linus