Re: [RFC PATCH v2 7/7] futex: Use runtime constants for __futex_hash() hot path

From: K Prateek Nayak

Date: Tue Mar 17 2026 - 01:11:33 EST


Hello Samuel,

On 3/17/2026 8:36 AM, Samuel Holland wrote:
>> @@ -1913,7 +1909,7 @@ int futex_hash_allocate_default(void)
>> * 16 <= threads * 4 <= global hash size
>> */
>> buckets = roundup_pow_of_two(4 * threads);
>> - buckets = clamp(buckets, 16, futex_hashmask + 1);
>> + buckets = clamp(buckets, 16, __futex_mask + 1);
>>
>> if (current_buckets >= buckets)
>> return 0;
>> @@ -1983,10 +1979,19 @@ static int __init futex_init(void)
>> hashsize = max(4, hashsize);
>> hashsize = roundup_pow_of_two(hashsize);
>> #endif
>> - futex_hashshift = ilog2(hashsize);
>> + __futex_mask = hashsize - 1;
>> + __futex_shift = ilog2(hashsize);
>
> __futex_mask is always a power of two minus 1, in other words all low bits set.
> Would it be worth using an n-bit zero extension operation instead of an
> arbitrary 32-bit mask? This would use fewer instructions on some architectures:
> for example a single ubfx on arm64 and slli+srli on riscv.

Sure, that works for __futex_mask, but runtime_const_mask_32() should be
generic enough to handle any mask, no?

Currently, __futex_hash() with futex_hashmask compiles to the following
on riscv:


# ./include/linux/jhash.h:139: __jhash_final(a, b, c);
xor a4,a4,a3 # tmp350, tmp353, tmp334
...
# kernel/futex/core.c:449: return &futex_queues[node][hash & futex_hashmask];
lla a3,.LANCHOR0 # tmp361,
# kernel/futex/core.c:449: return &futex_queues[node][hash & futex_hashmask];
ld a5,0(a3) # __futex_data.hashmask, __futex_data.hashmask
...
# kernel/futex/core.c:449: return &futex_queues[node][hash & futex_hashmask];
and a5,a5,a4 # tmp358, tmp367, __futex_data.hashmask


which isn't too far from what runtime_const_mask_32() implements,
where the "lla + ld" sequence gets replaced by a "lui + addi"
sequence that loads the mask as an immediate.

Sure, it can be better here since we know the bitmask is of the form
GENMASK(N,0), but IMO runtime_const_mask_32() should generally work
for all masks.

For the n-bit zero-extension variant, something like
runtime_const_mask_lower_32(val, nbits) may be a better suited
API name.
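
To illustrate the difference between the two forms, here is a rough
userspace sketch. The helper names mask_and_32() and mask_lower_32()
are made up for illustration and are not the kernel's runtime-const
API; the real helpers would patch the immediate or shift amounts into
the instruction stream at boot instead of taking them as arguments.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: generic form, works for any 32-bit mask.
 * This is what a plain "and" with a loaded immediate computes.
 */
static inline uint32_t mask_and_32(uint32_t val, uint32_t mask)
{
	return val & mask;
}

/*
 * Hypothetical helper: n-bit zero extension, valid only when the mask
 * has all of its low nbits set and nothing else. A shift-left /
 * shift-right pair like this is what slli+srli would do on riscv;
 * arm64 can do the same in a single ubfx.
 */
static inline uint32_t mask_lower_32(uint32_t val, unsigned int nbits)
{
	unsigned int shift;

	/* Shifting a 32-bit value by 32 or more is undefined in C. */
	assert(nbits >= 1 && nbits <= 32);

	shift = 32 - nbits;
	return (uint32_t)(val << shift) >> shift;
}
```

For a power-of-two hashsize, mask_and_32(hash, hashsize - 1) and
mask_lower_32(hash, ilog2(hashsize)) give the same result, but only the
first one handles a mask with arbitrary bits set.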

If there is enough interest, I'll go back to the drawing board and
go that route for v2 for arm64 and riscv.

--
Thanks and Regards,
Prateek