The current futex_wait() code (I'm looking at tip/core/futexes)
conflicts with a warning in the comments about checking *uaddr==val
before the futex_q is queued on the hb list. While userspace is able to
alter *uaddr at will and should expect to hang in the kernel forever
should it do so haphazardly, there are legitimate scenarios where the
futex value might change between the call to futex_wait() and when the
futex_q gets on the hb list.

For example, glibc protects access to the value of cond.__data.__futex
via the cond.__data.__lock. However, before it can issue the syscall it
has to drop the cond.__data.__lock, leaving a small race window where
userspace might issue a signal or broadcast, which will modify the value
of cond.__data.__futex. As I understand it, this will result in the
waker having changed the value of the futex before the waiter enters the
kernel, with the waiter not enqueuing itself on the hb list until after
the waker issues the broadcast that was intended to wake it up.
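
To make the window concrete, here is a toy single-threaded model of the
two interleavings I have in mind. It is only a sketch: model_futex,
model_signal(), model_wait() and the hook are names I made up for
illustration, not kernel or glibc code, and the model assumes the value
check and the enqueue are two separate, unlocked steps. Whether the real
futex_wait() permits the second interleaving is exactly what I'm unsure
about.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model only; none of these names exist in the kernel or glibc. */
struct model_futex {
    unsigned int val;   /* stands in for *uaddr */
    int queued;         /* waiters currently on the modelled hb list */
};

/* Waker side: change the value, then wake up to nr queued waiters.
 * Returns the number of waiters actually woken. */
static int model_signal(struct model_futex *f, int nr)
{
    assert(nr >= 0);
    int woken = f->queued < nr ? f->queued : nr;
    f->val++;
    f->queued -= woken;
    return woken;
}

/* Hook used to run the waker at a chosen point inside model_wait(). */
typedef void (*model_hook)(struct model_futex *f, int *woken);

/* Waiter side, with the check and the enqueue as separate steps.
 * Returns -EWOULDBLOCK if the value changed, 0 once the waiter is queued. */
static int model_wait(struct model_futex *f, unsigned int expected,
                      model_hook between_check_and_queue, int *woken)
{
    if (f->val != expected)
        return -EWOULDBLOCK;
    if (between_check_and_queue)
        between_check_and_queue(f, woken);
    f->queued++;
    return 0;
}

static void run_waker(struct model_futex *f, int *woken)
{
    *woken = model_signal(f, 1);
}

/* Case A: the waker runs after userspace sampled the value (lock already
 * dropped) but before the value check. */
static int model_race_before_check(struct model_futex *f, int *woken)
{
    unsigned int seen = f->val;
    run_waker(f, woken);
    return model_wait(f, seen, NULL, woken);
}

/* Case B: the waker runs after the value check but before the enqueue. */
static int model_race_after_check(struct model_futex *f, int *woken)
{
    unsigned int seen = f->val;
    return model_wait(f, seen, run_waker, woken);
}
```

In case A the model's wait returns -EWOULDBLOCK and nothing is lost. In
case B the waker finds an empty list and wakes nobody, and the waiter
then queues itself behind a value that has already changed.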
I was working up a patch to move the test to after the call to
queue_me(), but in order to do the test we also have to perform the
get_user() after the queue_me(), which might sleep if we still hold the
hb->lock. If we let queue_me() drop the hb->lock before we call
get_user() then we may see a legitimate change in *uaddr that occurred
after the queue_me() and before the get_user().

I'm at a loss for how to resolve the race without causing the false
positive inside the kernel. It might be resolvable in glibc by looking
at the return code from futex_requeue and checking if the number
woken_or_requeued agrees with the number it expected to be sleeping;
this likely leaves other gaps for other waking calls, like FUTEX_WAKE.
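
A rough sketch of the userspace check I'm picturing follows. The helper
name and its return convention are hypothetical, not anything glibc
does today; rc stands for whatever the requeue call returned (the count
woken or requeued, or a negative error), and expected_sleepers is
userspace's own bookkeeping of how many waiters it believes are blocked.

```c
#include <assert.h>

/* Hypothetical helper, not a glibc function.
 * rc: return value of the requeue call (woken + requeued, negative on error).
 * expected_sleepers: how many waiters userspace believes are blocked.
 * Returns the number of waiters unaccounted for, 0 if all were seen,
 * or -1 if the call itself failed. */
static int model_missing_waiters(long rc, int expected_sleepers)
{
    assert(expected_sleepers >= 0);
    if (rc < 0)
        return -1;
    if (rc >= expected_sleepers)
        return 0;
    return expected_sleepers - (int)rc;
}
```

A non-zero result would mean some waiter had not reached the hb list
when the broadcast ran, so the waker would have to retry or otherwise
compensate. A plain FUTEX_WAKE of one waiter gives much less to compare
against, which is the gap mentioned above.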
Any thoughts? Am I missing something that guards against this race?