Re: [PATCH v4 05/11] futex: Track the futex hash bucket.

From: Sebastian Andrzej Siewior
Date: Thu Dec 12 2024 - 12:08:32 EST


On 2024-12-10 19:45:56 [+0100], Thomas Gleixner wrote:
> > +void futex_hash_put(struct futex_hash_bucket *hb)
> > +{
> > + struct futex_hash_bucket_private *hb_p;
> > +
> > + if (hb->hb_slot == 0)
> > + return;
> > + hb_p = container_of(hb, struct futex_hash_bucket_private,
> > + queues[hb->hb_slot - 1]);
>
> Duh. This off by one abuse of hb_slot is really counter intuitive. It
> took me a while to wrap my head around it.

The really cute part is that this -1 gets optimized away and this
container_of becomes just hb_p = hb - "hb_slot << 6" :)

> The structure has a 4 byte hole, so adding a private flag or such is
> feasible without going over a cache line, unless lockdep or rt is
> enabled, but in that case it expands into a second cache line anyway.

I have a whole cacheline to stash things because of futex_hash_bucket
alignment:
| struct futex_hash_bucket_private {
| rcuref_t users; /* 0 4 */
| unsigned int hash_mask; /* 4 4 */
| struct callback_head rcu __attribute__((__aligned__(8))); /* 8 16 */
| bool slots_invariant; /* 24 1 */
|
| /* XXX 39 bytes hole, try to pack */
|
| /* --- cacheline 1 boundary (64 bytes) --- */
| struct futex_hash_bucket queues[] __attribute__((__aligned__(64))); /* 64 0 */
|
| /* size: 64, cachelines: 1, members: 5 */
| /* sum members: 25, holes: 1, sum holes: 39 */
| /* forced alignments: 2, forced holes: 1, sum forced holes: 39 */
| } __attribute__((__aligned__(64)));

The hash bucket itself on RT is:
| struct futex_hash_bucket {
| atomic_t waiters; /* 0 4 */
| unsigned int hb_slot; /* 4 4 */
| spinlock_t lock; /* 8 32 */
| struct plist_head chain; /* 40 16 */
|
| /* size: 64, cachelines: 1, members: 4 */
| /* padding: 8 */
| } __attribute__((__aligned__(64)));

so it still fits. However with lockdep enabled it acquires two
cache lines and these additional 4 bytes don't make a change.
On RT it doesn't make a difference because the spinlock_t is so huge.
But for !RT struct futex_hash_bucket becomes 32 bytes so we could fit 2
buckets into one cache line _if_ memory becomes a concern and cache
bouncing is not an issue because it is only process wide.

> > + futex_hash_priv_put(hb_p);
> > +}

Sebastian