Re: [patch V2 2/7] futex: Hash private futexes per process

From: Peter Zijlstra
Date: Mon May 30 2016 - 08:06:44 EST

On Mon, May 30, 2016 at 01:08:53PM +0200, Sebastian Andrzej Siewior wrote:
> > So I think there's a number of cases:
> >
> > - PREALLOC_HASH finds a taken bucket; in this case we can simply return
> > an error.
> > - PREALLOC_HASH succeeds, but an on demand hash later hits the same
> > bucket. This is harder; we could maybe mark all buckets taken by
> > PREALLOC_HASH and allow for a signal when this collision hits. Dunno.
> PREALLOC_HASH happens once before any (contended) lock operation.

I know ;-)

> We
> never rehash the hash. To rehash the hash on runtime we would need an
> empty futex hash and some locking in the fast path.

Nah; you can do lockless rehashing. Little tricky, but entirely doable.
At worst we'd need an rcu_read_lock(), but with a little extra work we
can extend an existing preempt_disable section to cover the hash lookup
if it isn't already covered.

> And rehash seems
> not to be required since we tried to come up with a sane default value
> and the user/RT task can set it to the current max value.

I'm not sure this statement is true; given that the hash function
reduces the entire address space down to a few bits you're guaranteed to
have collisions at some point. A good hash will not have more than given
by the birthday paradox, but you cannot have less.

So while the sizing can certainly reduce the chance on collisions,
nothing guarantees you will not have them, quite the opposite.

> So back to when does the collision happen. Since glibc visits the
> kernel only on contention we might learn about the collision when it is
> too late. We could have a lock operation by thread1 followed by lock
> operation by thread2 on different uaddr resulting in the same bucket.
> In this case we learn about this once the spin_lock() operation blocks.

Yep, so if this is two ondemand futexes hitting the same bucket, we
don't care. But if this is an ondemand futex hitting a prealloc one, we
do care.

And yes, this is late, but its the only possible time. And notification
is better than silent weird behaviour.

> Also marking a bucket as taken (on contention) might give false
> positive results since we know nothing about lock's lifetime (i.e. the
> lock might have been free()ed).
> But if I may bring some ideas from v1. In v1 we had "tickets / IDs" for
> the futex per thread. In v2 we don't have them anymore. We still have
> the "private" futex hash buckets but per process this time.
> We could introduce the "tickets / IDs" back and make them process wide.
> We could hide them in pthread_mutex_init() and pthread_mutex_destroy()
> since their IDs are no longer thread unique. I think I had something in
> glibc's pthread variable where we could store 16bit if I split another
> 32bit variable.

Not sure you need a ticket (and I never got around to reading v1), a
simple twin to PREALLOCATE_HASH to call on mutex_destroy would be
sufficient I think. It could free the hash bucket.

> That would be guaranteed collision free and hidden in glibc.

I'm still not seeing how you can guarantee anything with it; ondemand
local hashing can still collide, right?

Or are you thinking two local hashtables, one for ondemand things and
one for prealloc/free type thingies?

> My point was during development / testing to figure out which initial
> value is sane / reasonable. Start your APP with 32 slot. Collisions.
> Again with 128 slots. Oh better.

Ah, right, but given the above and ASLR, you cannot exhaustively test