Re: [PATCH] futex: replace bare barrier() with a READ_ONCE()

From: Jianyu Zhan
Date: Mon Feb 29 2016 - 19:46:33 EST


On Tue, Mar 1, 2016 at 6:22 AM, Darren Hart <dvhart@xxxxxxxxxxxxx> wrote:
> You've mention it causes problems a few times, but you do not specify what
> problem it causes or how it manifests.
>
> Is this a theoretical bug, or have you experienced a failure case. If so, how
> did this manifest? Do you have a reproducer we could add to the futex testsuite
> in the kernel selftests?


1. For the first problem I memtioned, it is a bug that described in
commit e91467ecd1ef.

The scenario is like:

lock_ptr = q->lock_ptr
if (lock_ptr != 0) spin_lock(lock_ptr)

and the compiler generates code that is equivalent to :

if (q->lock_ptr != 0) spin_lock(q->lock_ptr)

The problem is q->lock_ptr can change between the test of nullness of
q->lock_ptr and the lock on q->lock_ptr
So a barrier() is inserted into the load of q->lock_ptr and the test
of nullness, to avoid the pointer aliasing.

Apparently, a READ_ONCE() fits the goal here.


2. For the second problem I memtioned, yes, it is theoretical, and
it is also due to q->lock_ptr can change between
the test of nullness of q->lock_ptr and the lock on q->lock_ptr.

the code is

retry:
lock_ptr = q->lock_ptr;
if (lock_ptr != 0) {
spin_lock(lock_ptr)
if (unlikely(lock_ptr != q->lock_ptr)) {
spin_unlock(lock_ptr);
goto retry;
}
...
}

which is effectively the same as :

while (lock_ptr = q->lock_ptr) {
spin_lock(lock_ptr)
if (unlikely(lock_ptr != q->lock_ptr)) {
spin_unlock(lock_ptr);
goto retry;
}
...
}

This might cause the compiler load the q->lock_ptr once and use it
many times, quoted from
memory-barriers.txt:


(*) The compiler is within its rights to reload a variable, for example,
in cases where high register pressure prevents the compiler from
keeping all data of interest in registers. The compiler might
therefore optimize the variable 'tmp' out of our previous example:

while (tmp = a)
do_something_with(tmp);

This could result in the following code, which is perfectly safe in
single-threaded code, but can be fatal in concurrent code:

while (a)
do_something_with(a);

For example, the optimized version of this code could result in
passing a zero to do_something_with() in the case where the variable
a was modified by some other CPU between the "while" statement and
the call to do_something_with().

Again, use READ_ONCE() to prevent the compiler from doing this:

while (tmp = READ_ONCE(a))
do_something_with(tmp);




So, due to these two reason, I propose this patch.


Thanks,
Jianyu Zhan