Re: Futex queue_me/get_user ordering
From: Jamie Lokier
Date:  Tue Nov 16 2004 - 10:07:27 EST
Hidetoshi Seto wrote:
> I have to deeply apologize to all for my mistake.
> If my understanding is correct, this bug is "2.4 futex"(RHEL3) *SPECIFIC*!!
> I had swallow the story that 2.6 futex has the same problem...
Wrong, 2.6 has the same behaviour!
> So I realize that 2.6 futex never behave:
> >>      "returns 0 if the futex was not equal to the expected value, but
> >>       the process was woken by a FUTEX_WAKE call."
> 
> Update of manpage is now unnecessary, I think.
It is necessary.
> First of all, I would appreciate if you could read my old post:
> "Kernel bug in futex_wait, cause application hang with NPTL"
> http://www.ussg.iu.edu/hypermail/linux/kernel/0409.0/2044.html
> If my understanding is correct, 2.6 futex does not get any spinlocks,
> but a semaphore:
>
>  286 static int futex_wake(unsigned long uaddr, int nr_wake)
>   :
>  294         down_read(¤t->mm->mmap_sem);
>
>  477 static int futex_wait(unsigned long uaddr, int val, unsigned long time)
>   :
>  483         down_read(¤t->mm->mmap_sem);
> This semaphore prevents a waiter which temporarily queued to check the val
> from being target of wakeup.
No, because it's a read-write semaphore, and we do "down_read" on it
which is a shared lock.  It does not prevent concurrent wake and wait
operations!
The only reason we use this semaphore is to block against vma-changing
operations (like mmap) while we look up the futex key and memory word.
> (If it is not possible that there are threads which go around with same
> futex/condvar but each have different mmap_sem,)
Actually it is possible, with process-shared condvars, but it's
irrelevant because down_read doesn't prevent concurrent wakes and
waits.
[About 2.4 futex in RHEL3U2 which takes spinlocks instead]:
> However, this spinlocks fail to prevent topical waiters from wakeups.
> Because the spinlocks are released *before* unqueue_me(&q) (line 343 & 373).
> So this failure allows wake_Y to touch the queue while wait_A is in it.
This order is necessary, because it's not safe to call get_user()
while holding any spinlocks.  It is not a bug in RHEL.
> At least 2.4 futex in RHEL3U2 is buggy.
I don't think it is, because I think the behaviour you'll see with
RHEL3U2 is no different than 2.6, just slower ;)
-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/