Re: [patch V2 2/7] futex: Hash private futexes per process

From: Darren Hart
Date: Fri May 06 2016 - 17:56:44 EST

On Fri, May 06, 2016 at 11:09:33AM -0700, Darren Hart wrote:
> On Thu, May 05, 2016 at 08:44:04PM -0000, Thomas Gleixner wrote:
> > From: Sebastian Siewior <bigeasy@xxxxxxxxxxxxx>
> >
> > The standard futex mechanism in the Linux kernel uses a global hash to store
> > transient state. Collisions on that hash can lead to performance degradation
> > especially on NUMA systems and on real-time enabled kernels even to priority
> > inversions.
> I think it is worth noting how this causes an unbounded priority inversion
> as it wasn't obvious to me. At least mention that "CPU pinning" can result in an
> unbounded priority inversion.
> >
> > To mitigate that problem we provide per process private hashing. On the first
> > futex operation in a process the kernel allocates a hash table. The hash table
> > is accessible via the process mm_struct. On Numa systems the hash is allocated
> > node local.
> >
> > If the allocation fails then the global hash table is used as fallback, so
> > there is no user space visible impact of this feature.
> >
> It would be good to have a way to detect that the process private hash table was
> successfully created. Perhaps a /proc/pid/ feature? This would allow us to write
> a functional futex test for tools/testing/selftests/futex.

I suppose we could just use FUTEX_PREALLOC_HASH for this purpose, passing in the
default hash size. This will either return the default, the previously set
value, or 0, indicating the global hash is being used. That should be sufficient
for programmatically determining the state of the system.
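
As a rough sketch, a selftest could interpret that return value along these
lines. The enum and helper names are hypothetical and exist only for
illustration. Treating a negative value as a failed call is an assumption not
stated above. The actual FUTEX_PREALLOC_HASH invocation is deliberately left
out, since the op only exists in this patch series.

```c
#include <assert.h>

/* Hypothetical states a test could infer from the FUTEX_PREALLOC_HASH result. */
enum futex_hash_state {
	FUTEX_HASH_GLOBAL,	/* 0: the global hash is being used */
	FUTEX_HASH_DEFAULT,	/* the default size we passed in came back */
	FUTEX_HASH_PRESET,	/* some other, previously set size came back */
	FUTEX_HASH_ERROR	/* assumption: negative means the call failed */
};

/*
 * Classify the value returned by a FUTEX_PREALLOC_HASH call that was
 * issued with default_size as the requested hash size.
 */
static enum futex_hash_state classify_prealloc_result(long ret, long default_size)
{
	assert(default_size > 0);

	if (ret < 0)
		return FUTEX_HASH_ERROR;
	if (ret == 0)
		return FUTEX_HASH_GLOBAL;
	if (ret == default_size)
		return FUTEX_HASH_DEFAULT;
	return FUTEX_HASH_PRESET;
}
```

Note that a previously set size which happens to equal the default cannot be
told apart from a fresh default allocation this way. For a functional test that
is probably fine, since both cases mean the private hash is in use.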

The /proc/pid/futex_hash_size option may still be convenient for other purposes.
Perhaps with a -1 indicating it hasn't been set yet.

Darren Hart
Intel Open Source Technology Center