Re: wait_on_page_bit_common(TASK_KILLABLE, EXCLUSIVE) can miss wakeup?

From: Oleg Nesterov
Date: Wed Jun 24 2020 - 12:43:30 EST


On 06/24, Linus Torvalds wrote:
>
> That said, I'm not entirely happy with your patch.

Neither am I,

> The real problem, I feel, is that
>
>     if (likely(bit_is_set))
>             io_schedule();
>
> anti-pattern. Without that, we wouldn't have the bug.
>
> Normally, we'd be TASK_RUNNING in this sequence, but because we might
> skip io_schedule(), we can still be in a "sleeping" state here and be
> "woken up" between that bit setting and the signal check.

Ah.

And now it _seems_ to me that even if io_schedule() is called,
try_to_wake_up() can "falsely" succeed if signal_pending_state() is true,
even if __schedule() won't block in this case.
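
To make the suspected sequence concrete, here is a minimal user-space model
of it. This is only a sketch, not kernel code: every name in it (model_task,
model_try_to_wake_up, model_schedule, model_race) is hypothetical. It assumes
that try_to_wake_up() reports success for any task whose state is not
running, and that __schedule() leaves the task runnable rather than blocking
it when signal_pending_state() is true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for task states; not the kernel's values. */
enum { MODEL_RUNNING = 0, MODEL_KILLABLE = 1 };

struct model_task {
	int state;          /* MODEL_RUNNING or MODEL_KILLABLE */
	bool fatal_signal;  /* models signal_pending_state() returning true */
	bool blocked;       /* set only if the model scheduler really blocks the task */
};

/* Models try_to_wake_up(): succeeds whenever the task is not marked running. */
static bool model_try_to_wake_up(struct model_task *t)
{
	if (t->state == MODEL_RUNNING)
		return false;
	t->state = MODEL_RUNNING;
	t->blocked = false;
	return true;
}

/*
 * Models the non-preempt path of __schedule(): a task with a pending
 * signal is put back to running instead of being blocked.
 */
static void model_schedule(struct model_task *t)
{
	if (t->state == MODEL_RUNNING)
		return;
	if (t->fatal_signal)
		t->state = MODEL_RUNNING;
	else
		t->blocked = true;
}

/*
 * The suspected window: the waker runs after the state is set to killable
 * but before the scheduler looks at it. Returns what the waker was told.
 */
static bool model_race(struct model_task *t)
{
	bool woken;

	t->state = MODEL_KILLABLE;        /* like set_current_state(TASK_KILLABLE) */
	woken = model_try_to_wake_up(t);  /* waker sees a "sleeping" task */
	model_schedule(t);                /* already running, so it does not block */
	return woken;
}
```

In this model, with fatal_signal set, model_race() returns true although
model_schedule() alone would never have blocked the task. The waker is told
the wakeup succeeded for a task that was not going to sleep anyway.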

But I am sure I missed something else. I spent too much time reading
random code paths today; I'll return to this tomorrow.

Oleg.