Re: wait_on_page_bit_common(TASK_KILLABLE, EXCLUSIVE) can miss wakeup?
From: Linus Torvalds
Date: Mon Jun 29 2020 - 15:07:43 EST
On Mon, Jun 29, 2020 at 6:16 AM Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>
> No, ignore this part (which you explained well it was just a thinko,
> and your patch of course would not have worked if this was the case):
> the exclusive wake up doesn't get lost if schedule() was called because
> state goes back to running regardless of what woke it.
Right.
The normal pattern for a wait-loop is
- add yourself to the wait-queue and set yourself "sleeping" with a
memory barrier.
- test for the condition, exit if ok
- go to sleep
and that pattern doesn't have the race.
The other common pattern is to check for the "do I need to sleep at
all" at the *top* of the function, long before you bother with any
wait-queues at all. This code does that odd "let's check in the middle
if we need to sleep" instead, which then caused the bug.
So we had an odd setup because of three different wait conditions that
had different rules for what they could/should do before sleeping, and
then not sleeping reliably at all.
We could also fix it by just splitting out the three cases into their
own wait routines that match the normal pattern. The bug really
happened because that wait-loop is doing things such an odd way due to
all the different cases..
I actually think it would be a lot more readable if it was three
different cases instead of trying to be one "common" routine.
The *common* parts is the special PG_locked logic at the top, and the
thrashing/delayacct code at the bottom.
And *that* could be a true common helper, but the wait loop (which
isn't even a loop for the DROP case) are fundamentally different and
probably should be separate functions.
So I think my "one-liner" fixes the problem, but I'd certainly be open
to somebody cleaning this up properly.
Anybody?
Linus