Re: Tasks stuck in futex code (in 3.14-rc6)

From: Linus Torvalds
Date: Thu Mar 20 2014 - 13:42:13 EST


On Thu, Mar 20, 2014 at 10:18 AM, Davidlohr Bueso <davidlohr@xxxxxx> wrote:
>> It strikes me that the "spin_is_locked()" test has no barriers wrt the
>> writing of the new futex value on the wake path. And the read barrier
>> obviously does nothing wrt the write either. Or am I missing
>> something? So the write that actually released the futex might be
>> almost arbitrarily delayed on the waking side. So the waiting side may
>> not see the new value, even though the waker assumes it does due to
>> the ordering of it doing the write first.
>
> Aha, that would certainly violate the ordering guarantees. I feared
> _something_ like that when we originally discussed your suggestion as
> opposed to the atomics one, but didn't have any case for it either.

Actually, looking closer, we have the memory barrier in
get_futex_key_refs() (called by "get_futex_key()") so that's not it.
In fact, your "atomic_read(&hb->waiters)" doesn't have any more
serialization than the spin_is_locked() test had.
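
For readers following along, the ordering both sides depend on can be sketched in userspace C11 as below. This is only an illustration, not the kernel code: "struct bucket", "waiter_should_sleep()" and "waker_must_wake()" are hypothetical names, the seq_cst fences stand in for the kernel's full barriers, and the store to the futex word is folded into the waker helper even though in practice userspace does that store before the wake call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a hash bucket; not the kernel's structure. */
struct bucket {
    atomic_int waiters;   /* tasks that have announced they may sleep */
};

/*
 * Waiter side: announce ourselves FIRST, then read the futex word.
 * Returns true if the word still holds 'expected' (caller would sleep),
 * false if it changed (the announcement is undone).
 */
static bool waiter_should_sleep(struct bucket *hb, atomic_int *uaddr, int expected)
{
    atomic_fetch_add_explicit(&hb->waiters, 1, memory_order_relaxed);
    /* Full fence: the increment above is ordered before the load below. */
    atomic_thread_fence(memory_order_seq_cst);

    if (atomic_load_explicit(uaddr, memory_order_relaxed) != expected) {
        int prev = atomic_fetch_sub_explicit(&hb->waiters, 1,
                                             memory_order_relaxed);
        assert(prev > 0);   /* we incremented it ourselves just above */
        return false;
    }
    return true;
}

/*
 * Waker side: publish the new futex value FIRST, then look for waiters.
 * Returns true if a wakeup has to be issued.
 */
static bool waker_must_wake(struct bucket *hb, atomic_int *uaddr, int newval)
{
    atomic_store_explicit(uaddr, newval, memory_order_relaxed);
    /* Full fence: the store above is ordered before the load below. */
    atomic_thread_fence(memory_order_seq_cst);

    return atomic_load_explicit(&hb->waiters, memory_order_relaxed) > 0;
}
```

With a full fence on both sides, at least one of the two loads has to observe the other side's store: either the waiter sees the new value and does not sleep, or the waker sees a waiter and issues the wakeup. Drop either fence and both loads can see stale data, which is the lost-wakeup case being discussed.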

But the spin_is_locked() and queue-empty tests are two separate memory
reads, and maybe there is some ordering wrt those two that we missed,
so the "waiters" patch is worth trying anyway.

I do still dislike how the "waiters" thing adds an atomic update, but whatever..

Linus