Re: [rfc] [patch 1/2 ] Process private hash tables for private futexes
From: Andrew Morton
Date: Sat Mar 21 2009 - 07:42:42 EST
On Fri, 20 Mar 2009 21:46:37 -0700 Ravikiran G Thirumalai <kiran@xxxxxxxxxxxx> wrote:
> Patch to have a process private hash table for 'PRIVATE' futexes.
>
> On large core count systems, running multiple threaded processes causes
> false sharing on the global futex hash table. The global futex hash
> table is an array of struct futex_hash_bucket which is defined as:
>
> struct futex_hash_bucket {
> 	spinlock_t lock;
> 	struct plist_head chain;
> };
>
> static struct futex_hash_bucket futex_queues[1<<FUTEX_HASHBITS];
>
> Needless to say, this will cause multiple spinlocks to reside on the
> same cacheline, which is very bad when multiple unrelated processes
> hash onto adjacent hash buckets. The probability of unrelated futexes
> ending up on adjacent hash buckets increases with the number of cores in
> the system (more cores available translates to more processes/more
> threads being run on a system). The effects of false sharing are tangible
> on machines with more than 32 cores. We have noticed this with the
> workload of certain multithreaded FEA (Finite Element Analysis) solvers.
> We reported this problem a couple of years ago, which eventually resulted
> in a new api for private futexes to avoid mmap_sem. The false sharing on
> the global futex hash was put off pending glibc changes to accommodate
> the futex private apis. Now that the glibc changes are in, and
> multicore is more prevalent, maybe it is time to fix this problem.
>
> The root cause of the problem is a global futex hash table even for process
> private futexes. Process private futexes can be hashed on process private
> hash tables, avoiding the global hash and a longer hash table walk when
> there are a lot more futexes in the workload. However, this results in an
> addition of one extra pointer to the mm_struct. Hence, this implementation
> of a process private hash table is based off a config option, which can be
> turned off for smaller core count systems. Furthermore, a subsequent patch
> will introduce a sysctl to dynamically turn on private futex hash tables.
>
> We found this patch to improve the runtime of a certain FEA solver by about
> 15% on a 32 core vSMP system.
>
> Signed-off-by: Ravikiran Thirumalai <kiran@xxxxxxxxxxxx>
> Signed-off-by: Shai Fultheim <shai@xxxxxxxxxxxx>
>
> Index: linux-2.6.28.6/include/linux/mm_types.h
> ===================================================================
> --- linux-2.6.28.6.orig/include/linux/mm_types.h 2009-03-11 16:52:06.000000000 -0800
> +++ linux-2.6.28.6/include/linux/mm_types.h 2009-03-11 16:52:23.000000000 -0800
> @@ -256,6 +256,10 @@ struct mm_struct {
> #ifdef CONFIG_MMU_NOTIFIER
> struct mmu_notifier_mm *mmu_notifier_mm;
> #endif
> +#ifdef CONFIG_PROCESS_PRIVATE_FUTEX
> + /* Process private futex hash table */
> + struct futex_hash_bucket *htb;
> +#endif
So we're effectively improving the hashing operation by splitting the
single hash table into multiple ones.
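
To make the split concrete, here is a rough user-space sketch of the table selection as I read it from the changelog. It is not taken from the patch: pick_bucket() and its is_private flag are hypothetical, the struct definitions are simplified stand-ins for the kernel types, and FUTEX_HASHBITS is assumed to be 8 purely for illustration. Only the htb pointer in mm_struct and the futex_queues array come from the quoted text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FUTEX_HASHBITS 8                      /* assumed value, illustration only */
#define FUTEX_HASHSIZE (1u << FUTEX_HASHBITS)

/* Simplified stand-ins for the kernel types, not the real definitions. */
struct futex_hash_bucket {
	int lock;                             /* stand-in for spinlock_t */
	void *chain;                          /* stand-in for struct plist_head */
};

struct mm_struct {
	struct futex_hash_bucket *htb;        /* per-process table, NULL if none */
};

/* The single global table that all futexes hash onto today. */
static struct futex_hash_bucket futex_queues[FUTEX_HASHSIZE];

/*
 * Hypothetical helper: a private futex uses the per-mm table when one
 * exists; everything else falls back to the global table. The hash is
 * masked so that the index always stays inside the chosen table.
 */
static struct futex_hash_bucket *
pick_bucket(struct mm_struct *mm, int is_private, uint32_t hash)
{
	struct futex_hash_bucket *table = futex_queues;

	assert(mm != NULL);
	if (is_private && mm->htb != NULL)
		table = mm->htb;
	return &table[hash & (FUTEX_HASHSIZE - 1)];
}
```

Threads of one process still meet on the same per-mm buckets in this model, so the split removes cross-process interference only.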
But was that the best way of speeding up the hashing operation? I'd have
thought that for some workloads, there will still be tremendous amounts of
contention for the per-mm hashtable? In which case it is but a partial fix
for certain workloads.
Whereas a more general hashing optimisation (if we can come up with it)
would benefit both types of workload?
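
For reference, the false sharing described in the quoted changelog can be modelled with a little arithmetic. The sketch below is a user-space illustration only: locks_share_line() is a hypothetical helper, and it assumes the bucket array starts on a cache line boundary with the lock as the first member of each bucket. The bucket and line sizes are parameters because the real values depend on architecture and config.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true when the locks of buckets i and j start in the same cache
 * line. Assumes the array base is line-aligned and the lock sits at
 * offset 0 of each bucket, so lock k starts at byte k * bucket_size.
 */
static bool locks_share_line(size_t i, size_t j,
			     size_t bucket_size, size_t line_size)
{
	assert(bucket_size > 0 && line_size > 0);
	return (i * bucket_size) / line_size == (j * bucket_size) / line_size;
}
```

With an assumed 24-byte bucket and 64-byte lines, buckets 0, 1 and 2 start at bytes 0, 24 and 48, so all three locks land in the first line, while bucket 3 starts at byte 72 in the next one. If each bucket occupied a full line, no two locks would share one, whichever process they belong to.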