Re: [PATCH v2] wait: add comment before waitqueue_active notingmemory barrier is required

From: Kosuke Tatsukawa
Date: Thu Oct 22 2015 - 19:20:40 EST


Peter Zijlstra wrote:
> On Thu, Oct 22, 2015 at 08:01:37AM +0000, Kosuke Tatsukawa wrote:
>
> Its somewhat unfortunate you chose the whole wait_woken() thing, its
> 'rare'.

Yes. I first noticed this lack of memory barrier before
waitqueue_active() issue in drivers/tty/n_tty.c which was using
wait_woken(). However, other places were mostly using prepare_to_wait()
or wait_event*(), so wait_woken() is 'rare'.


>> Second, on the waiting thread side, the CPU can reorder the load of
>> CONDITION to occur during add_wait_queue active, before the entry is
>> added to the wait queue.
>> wake_up thread waiting thread
>> (reordered)
>> ------------------------------------------------------------------------
>> spin_lock_irqsave(...) <add_wait_queue>
>> if (CONDITION)
>> CONDITION = 1;
>> if (waitqueue_active(wq))
> wake_up();
>> __add_wait_queue(...) <add_wait_queue>
>> spin_unlock_irqrestore(...) <add_wait_queue>
>> wait_woken(&wait, ...);
>> ------------------------------------------------------------------------
>
> This isn't actually a problem IIRC, because wait_woken() will test
> WQ_FLAG_WOKEN and not actually sleep.

In the above figure, waitqueue_active(wq) will return 0 (queue is
inactive) and skip the whole wake_up() call, because __add_wait_queue()
hasn't been called yet. This actually does occur using a reproducer.


>> However, if that is too expensive, the reordering could be prevented by
>> adding memory barriers in the following places.
>> wake_up thread waiting thread
>> ------------------------------------------------------------------------
>> CONDITION = 1; add_wait_queue(wq, &wait);
>> smp_mb(); smp_mb();
>> if (waitqueue_active(wq)) for (;;) {
>> wake_up(wq); if (CONDITION)
>> break;
>> wait_woken(&wait, ...);
>> }
>
> So for wait_woken, WQ_FLAG_WOKEN should 'fix' that, and for pretty much
> anything else you must have a set_current_state() before testing
> CONDITION and you're good (as you state elsewhere).

wait_woken() calls set_current_state(), but that is after the CONDITION
test.


>> +++ b/include/linux/wait.h
>> @@ -102,6 +102,19 @@ init_waitqueue_func_entry(wait_queue_t *q, wait_queue_func_t func)
>> q->func = func;
>> }
>>
>> +/*
>> + * Note: When adding waitqueue_active before calling wake_up for
>> + * optimization, some sort of memory barrier is required on SMP so
>> + * that the waiting thread does not miss the wake up.
>> + *
>> + * A memory barrier is required before waitqueue_active to prevent
>> + * waitqueue_active from being reordered by the CPU before any writes
>> + * done prior to it.
>> + *
>> + * The waiting side also needs a memory barrier which pairs with the
>> + * wake_up side. If prepare_to_wait() or wait_event*() is used, they
>> + * contain the memory barrier in set_current_state().
>> + */
>> static inline int waitqueue_active(wait_queue_head_t *q)
>> {
>> return !list_empty(&q->task_list);
>
> How about something like:
>
> /**
> * waitqueue_active -- locklessly test for waiters on the queue
> * @q: the waitqueue to test for waiters
> *
> * returns true if the wait list is not empty
> *
> * NOTE: this function is lockless and requires care, incorrect usage
> * _will_ lead to sporadic and non-obvious failure.
> *
> * Use either while holding wait_queue_head_t::lock or when used for
> * wakeups with an extra smp_mb() like:
> *
> * CPU0 - waker CPU1 - waiter
> *
> * for (;;) {
> * @cond = true; prepare_to_wait(&wq, &wait, state);
> * smp_mb(); /* smp_mb() from set_current_state() */
> * if (waitqueue_active(wq)) if (@cond)
> * wake_up(wq); break;
> * schedule();
> * }
> *
> * Because without the explicit smp_mb() its possible for the
> * waitqueue_active() load to get hoisted over the @cond store such that
> * we'll observe an empty wait list while the waiter might not observe
> * @cond.
> *
> * Also note that this 'optimization' trades a spin_lock() for an
> * smp_mb(), which (when the lock is uncontended) are of roughly equal
> * cost.
> */
>
> Does that work for you?

Yes. Considering that the use of wait_woken is pretty rare, I think the
explanation is more focused and easier to understand this way.

Best regards.
---
Kosuke TATSUKAWA | 3rd IT Platform Department
| IT Platform Division, NEC Corporation
| tatsu@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/