Re: [RFC] NUMA futex hashing

From: Eric Dumazet
Date: Tue Aug 08 2006 - 12:06:17 EST

Next message: Dave Hansen: "Re: [PATCH] sys_getppid oopses on debug kernel (v2)"
Previous message: Kari Hurtta: "Re: [RFC/PATCH] revoke/frevoke system calls V2"
In reply to: Ulrich Drepper: "Re: [RFC] NUMA futex hashing"
Next in thread: Nick Piggin: "Re: [RFC] NUMA futex hashing"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tuesday 08 August 2006 17:11, Nick Piggin wrote:
> Ulrich Drepper wrote:
> > On 8/8/06, Eric Dumazet <dada1@xxxxxxxxxxxxx> wrote:
> >> The validity of the virtual address is still tested by normal get_user()
> >> call.. If the memory was freed by a thread, then a normal EFAULT error
> >> will
> >> be reported... eventually.
> >
> > This is indeed what should be done. Private futexes are the by far
> > more frequent case and I bet you'd see improvements when avoiding the
> > mm mutex even for normal machines since futexes really are everywhere.
> > For shared mutexes you end up doing two lookups and that's fine IMO
> > as long as the first lookup is fast.
>
> The private futex's namespace is its virtual address, so I don't see
> how you can decouple that from the management of virtual addresses.
>
> Let me get this straight: to insert a contended futex into your rbtree,
> you need to hold the mmap sem to ensure that address remains valid,
> then you need to take a lock which protects your rbtree. Then to wake
> up a process and remove the futex, you need to take the rbtree lock. Or
> to unmap any memory you also need to take the rbtree lock and ensure
> there are no futexes there.
>
> So you just add another lock for no reason, or have I got a few screws
> loose myself? I don't see how you can significantly reduce lock
> cacheline bouncing in a futex heavy workload if you're just going to
> add another shared data structure. But if you can, sweet ;)

We certainly can. But if you insist of using mmap sem at all, then we have a
problem.

rbtree would not reduce cacheline bouncing, so :

We could use a hashtable (allocated on demand) of size N, N depending on
NR_CPUS for example. each chain protected by a private spinlock. If N is well
chosen, we might reduce lock cacheline bouncing. (different threads fighting
on different private futexes would have a good chance to get different
cachelines in this hashtable)

As soon a process enters 'private futex' code, the futex code allocates this
hashtable if the process has a NULL hash table (set to NULL at exec() time,
or maybe re-allocated because we want to be sure futex syscall always suceed
(no ENOMEM))

So we really can... but for 'private futexes' which are the vast majority of
futexes needed by typical program (using POSIX pshared thread mutex attribute
PTHREAD_PROCESS_PRIVATE, currently not used by NPTL glibc)

Of course we would need a new syscall, and to change glibc to be able to
actually use this new private_futex syscall.

Probably a lot of work, still, but could help heavy threaded programs not
touching mmap_sem.

We might have a refcounting problem on this 'hashtable' since several threads
share this structure, but only at thread creation/destruction, not in futex
call (ie no cacheline bouncing on the refcount)

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Dave Hansen: "Re: [PATCH] sys_getppid oopses on debug kernel (v2)"
Previous message: Kari Hurtta: "Re: [RFC/PATCH] revoke/frevoke system calls V2"
In reply to: Ulrich Drepper: "Re: [RFC] NUMA futex hashing"
Next in thread: Nick Piggin: "Re: [RFC] NUMA futex hashing"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]