Re: [PATCH v2] wait: add comment before waitqueue_active noting memory barrier is required

From: Peter Zijlstra
Date: Fri Oct 23 2015 - 08:40:20 EST


On Thu, Oct 22, 2015 at 11:18:33PM +0000, Kosuke Tatsukawa wrote:
> Peter Zijlstra wrote:
> > On Thu, Oct 22, 2015 at 08:01:37AM +0000, Kosuke Tatsukawa wrote:
> >
> > Its somewhat unfortunate you chose the whole wait_woken() thing, its
> > 'rare'.
>
> Yes. I first noticed this lack of memory barrier before
> waitqueue_active() issue in drivers/tty/n_tty.c which was using
> wait_woken(). However, other places were mostly using prepare_to_wait()
> or wait_event*(), so wait_woken() is 'rare'.

Which I no doubt introduced there (the wait_woken thing), and it would
have been nice if I'd been Cc to that discussion.

In any case, I found the patch in next and dropping the
waitqueue_active() think is in deed the sane solution. It will serialize
everything on the queue lock.

> >> Second, on the waiting thread side, the CPU can reorder the load of
> >> CONDITION to occur during add_wait_queue active, before the entry is
> >> added to the wait queue.
> >> wake_up thread waiting thread
> >> (reordered)
> >> ------------------------------------------------------------------------
> >> spin_lock_irqsave(...) <add_wait_queue>
> >> if (CONDITION)
> >> CONDITION = 1;
> >> if (waitqueue_active(wq))
> > wake_up();
> >> __add_wait_queue(...) <add_wait_queue>
> >> spin_unlock_irqrestore(...) <add_wait_queue>
> >> wait_woken(&wait, ...);
> >> ------------------------------------------------------------------------
> >
> > This isn't actually a problem IIRC, because wait_woken() will test
> > WQ_FLAG_WOKEN and not actually sleep.
>
> In the above figure, waitqueue_active(wq) will return 0 (queue is
> inactive) and skip the whole wake_up() call, because __add_wait_queue()
> hasn't been called yet. This actually does occur using a reproducer.

Duh, indeed.

> > Does that work for you?
>
> Yes. Considering that the use of wait_woken is pretty rare, I think the
> explanation is more focused and easier to understand this way.

OK, thanks, I'll queue the below.

---
Subject: sched, wait: Document waitqueue_active
From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Date: Fri Oct 23 14:32:34 CEST 2015

Kosuku reports that there were a fair number of buggy
waitqueue_active() users and this function deserves a big comment in
order to avoid growing more.

Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Reported-by: Kosuke Tatsukawa <tatsu@xxxxxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
---
include/linux/wait.h | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)

--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -102,6 +102,36 @@ init_waitqueue_func_entry(wait_queue_t *
q->func = func;
}

+/**
+ * waitqueue_active -- locklessly test for waiters on the queue
+ * @q: the waitqueue to test for waiters
+ *
+ * returns true if the wait list is not empty
+ *
+ * NOTE: this function is lockless and requires care, incorrect usage _will_
+ * lead to sporadic and non-obvious failure.
+ *
+ * Use either while holding wait_queue_head_t::lock or when used for wakeups
+ * with an extra smp_mb() like:
+ *
+ * CPU0 - waker CPU1 - waiter
+ *
+ * for (;;) {
+ * @cond = true; prepare_to_wait(&wq, &wait, state);
+ * smp_mb(); // smp_mb() from set_current_state()
+ * if (waitqueue_active(wq)) if (@cond)
+ * wake_up(wq); break;
+ * schedule();
+ * }
+ * finish_wait(&wq, &wait);
+ *
+ * Because without the explicit smp_mb() it's possible for the
+ * waitqueue_active() load to get hoisted over the @cond store such that we'll
+ * observe an empty wait list while the waiter might not observe @cond.
+ *
+ * Also note that this 'optimization' trades a spin_lock() for an smp_mb(),
+ * which (when the lock is uncontended) are of roughly equal cost.
+ */
static inline int waitqueue_active(wait_queue_head_t *q)
{
return !list_empty(&q->task_list);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/