Re: [RFC] NUMA futex hashing

From: Nick Piggin
Date: Tue Aug 08 2006 - 12:19:59 EST


Ulrich Drepper wrote:
> On 8/8/06, Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:

>> Let me get this straight: to insert a contended futex into your rbtree,
>> you need to hold mmap_sem to ensure that the address remains valid,
>> then you need to take a lock which protects your rbtree.
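
To spell that sequence out, a rough user-space model (all names here
are hypothetical, with glibc's tsearch() red-black tree standing in
for the kernel rbtree):

    #include <pthread.h>
    #include <search.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Model: one lock for the address space, one for the futex tree. */
    static pthread_rwlock_t mmap_sem  = PTHREAD_RWLOCK_INITIALIZER;
    static pthread_mutex_t  tree_lock = PTHREAD_MUTEX_INITIALIZER;
    static void *futex_tree;            /* red-black tree via tsearch() */

    static int cmp_addr(const void *a, const void *b)
    {
            /* order queued waiters by user address */
            return ((uintptr_t)a > (uintptr_t)b) -
                   ((uintptr_t)a < (uintptr_t)b);
    }

    /* Queue a contended futex at uaddr: the two-lock sequence above. */
    static void queue_futex(void *uaddr)
    {
            pthread_rwlock_rdlock(&mmap_sem);  /* keep the mapping stable */
            pthread_mutex_lock(&tree_lock);    /* protect the rbtree */
            tsearch(uaddr, &futex_tree, cmp_addr);
            pthread_mutex_unlock(&tree_lock);
            pthread_rwlock_unlock(&mmap_sem);
    }

    int main(void)
    {
            int futex_word = 0;
            queue_futex(&futex_word);
            printf("queued waiter at %p\n", (void *)&futex_word);
            return 0;
    }

That is, two serialising steps on every contended operation.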


> Why does it have to remain valid? As long as the kernel doesn't crash
> on any of the operations associated with the futex syscalls, let the
> address space region explode, implode, whatever. It's a bug in the
> program if the address region is changed while a futex is placed
> there. If the futex syscall hangs forever or returns with a bogus
> state (error or even success), this is perfectly acceptable.

I was thinking of mremap (no, that's already kind of messed up); or
even just getting consistency in failures (e.g. so you don't have the
situation where a futex op can succeed on a previously unmapped
region).

If you're not worried about the latter, then it might work...
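
The latter is easy to demonstrate with FUTEX_PRIVATE_FLAG as it exists
in later kernels, which implements essentially this token scheme
(illustrative only; error handling omitted):

    #include <linux/futex.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>

    static long fwake(void *uaddr, int op)
    {
            /* futex(uaddr, op, nr_wake, timeout, uaddr2, val3) */
            return syscall(SYS_futex, uaddr, op, 1, NULL, NULL, 0);
    }

    int main(void)
    {
            int *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            munmap(p, 4096);        /* the futex word is now gone */

            /* hashed wake must resolve the page: fails with EFAULT */
            printf("hashed wake: %ld (errno %d)\n",
                   fwake(p, FUTEX_WAKE), errno);

            /* token wake never touches memory: reports success */
            printf("token wake:  %ld\n", fwake(p, FUTEX_WAKE_PRIVATE));
            return 0;
    }

The hashed op has to pin the page, so it sees the unmap; the token op
never looks, so it "succeeds" on a region that no longer exists.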

I didn't initially click that the private futex API operates
purely on tokens rather than virtual memory... the comments in
futex.c talk about futexes being hashed to a particular
physical page (which is the case for shared futexes). That's whacked.
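
From memory, the keying difference is roughly the following; this is
a user-space paraphrase of futex.c's union futex_key, with
approximate field names, not verbatim kernel code:

    #include <stdio.h>

    struct inode;               /* opaque stand-ins for kernel types */
    struct mm_struct;

    /* Shared futexes key on the identity of the backing page, so any
     * mm mapping the same file page reaches the same hash bucket.
     * A private scheme needs only (mm, virtual address): a pure
     * token, never resolved to a physical page. */
    union futex_key {
            struct {
                    unsigned long pgoff;
                    struct inode *inode;
                    int offset;
            } shared;
            struct {
                    unsigned long address;
                    struct mm_struct *mm;
                    int offset;
            } private;
    };

    int main(void)
    {
            printf("key size: %zu bytes\n", sizeof(union futex_key));
            return 0;
    }

So "hashed to a physical page" describes only the shared case, not a
token-keyed private one.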

So actually you would change semantics in some weird corner
cases, like mremapping a shared futex over a private futex's address.
Arguably that's broken, though ;)

> We shouldn't slow down correct uses just to make it possible for
> broken programs to receive a more detailed error description.


No, we shouldn't slow them down. I'd be interested to see whether
locking is significantly sped up with this new data structure,
though.

You might also slow things down, given that you'd have to take the
lock and unconditionally search the private futex tree even for
shared futexes.
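
To make that cost concrete, here is roughly the lookup order I'm
assuming (every name below is made up):

    #include <pthread.h>
    #include <search.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Every op pays for the per-mm tree lock and search first,
     * even when the futex turns out to be a shared one. */
    struct mm {
            pthread_mutex_t tree_lock;
            void *futex_tree;           /* tsearch() red-black tree */
    };

    static int cmp(const void *a, const void *b)
    {
            return ((uintptr_t)a > (uintptr_t)b) -
                   ((uintptr_t)a < (uintptr_t)b);
    }

    static int futex_is_private(struct mm *mm, void *uaddr)
    {
            int found;

            pthread_mutex_lock(&mm->tree_lock);   /* taken for shared, too */
            found = tfind(uaddr, &mm->futex_tree, cmp) != NULL;
            pthread_mutex_unlock(&mm->tree_lock);
            return found;   /* if not found: fall back to the
                               global shared hash */
    }

    int main(void)
    {
            struct mm mm = { PTHREAD_MUTEX_INITIALIZER, NULL };
            int word = 0;

            printf("private? %d\n", futex_is_private(&mm, &word));
            return 0;
    }

The tree lock and search sit on the path of every operation before
the shared hash is even consulted.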

--
SUSE Labs, Novell Inc.