Re: [PATCH v2] futex: replace bare barrier() with a READ_ONCE()

From: Christian Borntraeger
Date: Wed Mar 02 2016 - 08:36:05 EST


On 03/02/2016 02:08 PM, Jianyu Zhan wrote:
> Commit e91467ecd1ef ("bug in futex unqueue_me") introduces a barrier()
> in unqueue_me(), to address below problem.
>
> The scenario is like this:
>
> ====================
> original code:
>
> retry:
> lock_ptr = q->lock_ptr;
> if (lock_ptr != 0) {
> spin_lock(lock_ptr)
> if (unlikely(lock_ptr != q->lock_ptr)) {
> spin_unlock(lock_ptr);
> goto retry;
> }
> ...
> }
>
> ====================
> s390x generates code that is equivalent to:
>
> retry:
> if (q->lock_ptr != 0) {
> spin_lock(q->lock_ptr)
> if (unlikely(lock_ptr != q->lock_ptr)) {
> spin_unlock(lock_ptr);
> goto retry;
> }
> ...
> }
>
> since q->lock_ptr might change between test of non-nullness and spin_lock(),
> the double load may lead to problem. So that commit use a barrier() to prevent this.
>
> This patch replaces this bare barrier() with a READ_ONCE().
>
> The reasons are:
>
> 1) READ_ONCE() is a more weak form of barrier() that affect only the specific
> accesses, while barrier() is a more general compiler level memroy barrier.
>
> 2) READ_ONCE() which could be more informative by its name, while a bare barrier()
> without comment leads to quite a bit of perplexity.
>
> 3) READ_ONCE() _might_ prevent more _theoretical_ "optimizations" by the compiler:
>
> The above retry logic is effectively the same as:
>
> while (lock_ptr = READ_ONCE(q->lock_ptr)) {
> spin_lock(lock_ptr)
> if (unlikely(lock_ptr != q->lock_ptr)) {
> spin_unlock(lock_ptr);
> }
>
> ...
> }
>
> If without the READ_ONCE, the compiler _might_ optimize the load of q->lock_ptr out of the
> test satement, which will also lead to the same pointer aliasing problem.
>
> For why do compiler might do this, here is a snippet quoted from Documentation/memory-barriers.txt:
>
> |(*) The compiler is within its rights to reload a variable, for example,
> | in cases where high register pressure prevents the compiler from
> | keeping all data of interest in registers. The compiler might
> | therefore optimize the variable 'tmp' out of our previous example:
> |
> | while (tmp = a)
> | do_something_with(tmp);
> |
> | This could result in the following code, which is perfectly safe in
> | single-threaded code, but can be fatal in concurrent code:
> |
> | while (a)
> | do_something_with(a);
> |
> | For example, the optimized version of this code could result in
> | passing a zero to do_something_with() in the case where the variable
> | a was modified by some other CPU between the "while" statement and
> | the call to do_something_with().
> |
> | Again, use READ_ONCE() to prevent the compiler from doing this:
> |
> | while (tmp = READ_ONCE(a))
> | do_something_with(tmp);


Not sure if you need that huge patch description, but it does not hurt either.


>
>
> Suggested-by: Darren Hart <dvhart@xxxxxxxxxxxxx>
> Signed-off-by: Jianyu Zhan <nasa4836@xxxxxxxxx>
> ---
> v1->v2:
> According to suggestion by Darren Hart, revise the commit log to justify the replacement.
>
> I also cc'ed the s390 maintainers, if they could help provide assembly code after this patch
> to prove the correctness, that would be better.
>
> kernel/futex.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 5d6ce64..6796084 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1927,8 +1927,13 @@ static int unqueue_me(struct futex_q *q)
>
> /* In the common case we don't take the spinlock, which is nice. */
> retry:
> - lock_ptr = q->lock_ptr;
> - barrier();
> + /*
> + * On s390x, it was observed that compiler generates such code that spin_lock() will operate on
> + * another load of q->lock_ptr, instead of on @lock_ptr, and since q->lock_ptr might change between
> + * the test of non-nullness and the spin_lock(), which leads to problem. So use READ_ONCE() here to
> + * prevent this compiler "optimization".
> + */

That change makes sense. I did the original barrier back in 2006 (could you cc me next time?)
ACCESS_ONCE or READ_ONCE was not available at that time and its now the better way.

This is not an s390 specific problem, it was just triggered there as the gcc cost model
considered the memory read as cheap as a register read.

Maybe simplify the comment to something like

/* Prevent the compiler to read lock_ptr twice (if and spin_lock) */


> + lock_ptr = READ_ONCE(q->lock_ptr);
> if (lock_ptr != NULL) {
> spin_lock(lock_ptr);
> /*
>