Re: Futex queue_me/get_user ordering

From: Jamie Lokier
Date: Tue Nov 16 2004 - 10:07:27 EST


Hidetoshi Seto wrote:
> I have to deeply apologize to all for my mistake.
> If my understanding is correct, this bug is "2.4 futex"(RHEL3) *SPECIFIC*!!
> I had swallowed the story that 2.6 futex has the same problem...

Wrong, 2.6 has the same behaviour!

> So I realize that 2.6 futex never behaves like this:
> >> "returns 0 if the futex was not equal to the expected value, but
> >> the process was woken by a FUTEX_WAKE call."
>
> Update of manpage is now unnecessary, I think.

It is necessary.

> First of all, I would appreciate if you could read my old post:
> "Kernel bug in futex_wait, cause application hang with NPTL"
> http://www.ussg.iu.edu/hypermail/linux/kernel/0409.0/2044.html

> If my understanding is correct, 2.6 futex does not get any spinlocks,
> but a semaphore:
>
> 286 static int futex_wake(unsigned long uaddr, int nr_wake)
> :
> 294 down_read(&current->mm->mmap_sem);
>
> 477 static int futex_wait(unsigned long uaddr, int val, unsigned long time)
> :
> 483 down_read(&current->mm->mmap_sem);

> This semaphore prevents a waiter that is temporarily queued to check the
> val from being the target of a wakeup.

No, because it's a read-write semaphore, and we do "down_read" on it
which is a shared lock. It does not prevent concurrent wake and wait
operations!

The only reason we use this semaphore is to block against vma-changing
operations (like mmap) while we look up the futex key and memory word.

> (Assuming it is not possible for threads to use the same futex/condvar
> while each has a different mmap_sem.)

Actually it is possible, with process-shared condvars, but it's
irrelevant because down_read doesn't prevent concurrent wakes and
waits.
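
For anyone wondering how the different-mmap_sem case arises, a minimal
sketch, assuming Linux/glibc: put the condvar in MAP_SHARED memory and mark
it process-shared. After a fork(), parent and child each have their own mm
(and so their own mmap_sem) but use the same condvar. make_shared_cond is a
hypothetical helper name, not anything from NPTL:

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on glibc */
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: allocate a condvar in anonymous shared memory and
 * initialise it as process-shared. Returns NULL on failure.
 */
static pthread_cond_t *make_shared_cond(void)
{
    pthread_condattr_t attr;
    pthread_cond_t *cv;
    int rc;

    cv = mmap(NULL, sizeof(*cv), PROT_READ | PROT_WRITE,
              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (cv == MAP_FAILED)
        return NULL;

    rc = pthread_condattr_init(&attr);
    if (rc == 0) {
        rc = pthread_condattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        if (rc == 0)
            rc = pthread_cond_init(cv, &attr);
        pthread_condattr_destroy(&attr);
    }

    if (rc != 0) {
        munmap(cv, sizeof(*cv));
        return NULL;
    }
    return cv;
}
```

A process would call make_shared_cond() before fork(), so that both sides
inherit the same mapping.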

[About 2.4 futex in RHEL3U2 which takes spinlocks instead]:
> However, these spinlocks fail to protect temporarily queued waiters from wakeups.
> Because the spinlocks are released *before* unqueue_me(&q) (line 343 & 373).
> So this failure allows wake_Y to touch the queue while wait_A is in it.

This order is necessary, because get_user() can take a page fault and
sleep, so it's not safe to call it while holding any spinlocks. It is
not a bug in RHEL.
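
A single-threaded model of that ordering, as I understand it, may help. This
is a sketch and not the kernel source: struct waiter, wake() and
futex_wait_sim() are invented for illustration, SIM_WOULD_SLEEP is a made-up
sentinel, and the wake_races flag simply injects a wake in the window
between queueing and unqueueing. It shows how a waiter can return 0 even
though the value did not match:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical one-waiter model; not the kernel's data structures. */
struct waiter {
    bool queued;                /* still on the wait queue */
};

static void queue_me(struct waiter *w)
{
    w->queued = true;
}

/* Waker side: removes the waiter if it is queued. Returns number woken. */
static int wake(struct waiter *w)
{
    if (!w->queued)
        return 0;
    w->queued = false;
    return 1;
}

/* Returns true if we removed ourselves, false if a waker got there first. */
static bool unqueue_me(struct waiter *w)
{
    if (!w->queued)
        return false;
    w->queued = false;
    return true;
}

#define SIM_WOULD_SLEEP 1       /* made-up sentinel: values matched */

/*
 * Models only the value-check part of a wait. If wake_races is set, a
 * wake lands after queue_me() and before unqueue_me().
 */
static int futex_wait_sim(const int *uaddr, int expected, bool wake_races,
                          int *wakes_consumed)
{
    struct waiter w;
    int cur;

    queue_me(&w);               /* 1. queue, then drop the queue lock */
    cur = *uaddr;               /* 2. stands in for get_user(), unlocked */

    if (wake_races)
        *wakes_consumed += wake(&w);    /* a concurrent wake picks us */

    if (cur == expected)
        return SIM_WOULD_SLEEP; /* sleeping path not modelled here */

    /* 3. Mismatch: back out, unless a waker already removed us. */
    if (!unqueue_me(&w))
        return 0;               /* the wake was consumed, so report it */
    return -EWOULDBLOCK;
}
```

Without the racing wake this returns -EWOULDBLOCK. With it, the return is 0
and one wake has been used up, so the caller must re-check the futex word
itself rather than assume the value matched.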

> At least 2.4 futex in RHEL3U2 is buggy.

I don't think it is, because I think the behaviour you'll see with
RHEL3U2 is no different than 2.6, just slower ;)

-- Jamie