Re: [RFC PATCH] sched/wait: Make interruptible exclusive waitqueue wakeups reliable

From: Ingo Molnar
Date: Tue Dec 10 2019 - 02:29:31 EST



* Oleg Nesterov <oleg@xxxxxxxxxx> wrote:

> > long prepare_to_wait_event(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry, int state)
> > {
> > 	unsigned long flags;
> > 	long ret = 0;
> >
> > 	spin_lock_irqsave(&wq_head->lock, flags);
> > 	if (signal_pending_state(state, current)) {
> > 		/*
> > 		 * Exclusive waiter must not fail if it was selected by wakeup,
> > 		 * it should "consume" the condition we were waiting for.
> > 		 *
> > 		 * The caller will recheck the condition and return success if
> > 		 * we were already woken up, we can not miss the event because
> > 		 * wakeup locks/unlocks the same wq_head->lock.
> > 		 *
> > 		 * But we need to ensure that set-condition + wakeup after that
> > 		 * can't see us, it should wake up another exclusive waiter if
> > 		 * we fail.
> > 		 */
> > 		list_del_init(&wq_entry->entry);
> > 		ret = -ERESTARTSYS;
>
> ...
>
> > I think we can indeed lose an exclusive event here, despite the comment
> > that argues that we shouldn't: if we were already removed from the list
>
> If we were already removed from the list and condition is true, we can't
> miss it, ret = -ERESTARTSYS won't be used. This is what this part of the
> comment above
>
> * The caller will recheck the condition and return success if
> * we were already woken up, we can not miss the event because
> * wakeup locks/unlocks the same wq_head->lock.
>
> tries to explain.

Yeah, indeed - it assumes that the condition stays stable from the wakeup
until the wakee runs - which, as Linus said, it must be, because otherwise
exclusive waiters couldn't reliably exit the wait loop.
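
For reference, here is a rough userspace sketch of why the -ERESTARTSYS
ends up being ignored in that case. It models a single pass of the
___wait_event() loop: the condition is rechecked after
prepare_to_wait_event() returns, and the error code is only looked at if
the condition is still false. All sketch_* names and SKETCH_* constants
are made up for illustration, none of them are kernel symbols, and the
locking and list handling are left out entirely:

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_ERESTARTSYS	512	/* same value the kernel uses */
#define SKETCH_WOULD_SLEEP	1	/* hypothetical: "call schedule() and loop" */

/*
 * Hypothetical stand-in for prepare_to_wait_event(): reports
 * -ERESTARTSYS when a signal is pending, 0 otherwise.
 */
long sketch_prepare_to_wait_event(bool signal_pending)
{
	return signal_pending ? -SKETCH_ERESTARTSYS : 0;
}

/*
 * One pass of the wait loop. The condition is checked first, so a
 * waiter that was already woken (condition true and stable) returns
 * success even if prepare_to_wait_event() reported a pending signal.
 */
long sketch_wait_pass(bool condition, bool signal_pending)
{
	long intr = sketch_prepare_to_wait_event(signal_pending);

	if (condition)
		return 0;		/* woken: the error code is dropped */
	if (intr)
		return intr;		/* not woken, signal pending: fail */
	return SKETCH_WOULD_SLEEP;	/* not woken, no signal: sleep */
}

/* Usage: the four combinations of condition and pending signal. */
void sketch_demo(void)
{
	assert(sketch_wait_pass(true, true) == 0);
	assert(sketch_wait_pass(true, false) == 0);
	assert(sketch_wait_pass(false, true) == -SKETCH_ERESTARTSYS);
	assert(sketch_wait_pass(false, false) == SKETCH_WOULD_SLEEP);
}
```

The first assert is the case discussed above: the waiter was selected by
the wakeup and has a signal pending, yet it still returns success.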

So there's no bug. How about the clarifying comment below?

Thanks,

Ingo

 kernel/sched/wait.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index ba059fbfc53a..6783bac00b5c 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -290,6 +290,11 @@ long prepare_to_wait_event(struct wait_queue_head *wq_head, struct wait_queue_en
 		 * But we need to ensure that set-condition + wakeup after that
 		 * can't see us, it should wake up another exclusive waiter if
 		 * we fail.
+		 *
+		 * In other words, if an exclusive waiter got here, then the
+		 * waitqueue condition is and stays true and we are guaranteed
+		 * to exit the waitqueue loop and will ignore the -ERESTARTSYS
+		 * and return success.
 		 */
 		list_del_init(&wq_entry->entry);
 		ret = -ERESTARTSYS;