Re: [RFC PATCH] futex: Dynamically allocate futex_queues depending on nr_node_ids
From: K Prateek Nayak
Date: Sun Mar 01 2026 - 23:59:41 EST
Hello Peter, Sebastian,
On 2/27/2026 9:48 PM, Peter Zijlstra wrote:
> On Fri, Feb 27, 2026 at 08:29:03PM +0530, K Prateek Nayak wrote:
>
>>> Both will result in at least one extra deref/cacheline for each futex
>>> op, no?
>>
>> Ack, but I was wondering whether that penalty can be offset by the fact
>> that we no longer need to look at "nr_node_ids" in a separate cacheline?
>>
>> I ran the futex benchmarks enough times before posting to conclude that
>> there isn't any noticeable regression - the numbers swung either way
>> and I just took one set for comparison.
>>
>> Sebastian and I have been having a more philosophical discussion on that
>> CONFIG_NODES_SHIFT default but I guess, as far as this patch is concerned,
>> the conclusion is that we want to avoid an extra dereference in the
>> fast path at the cost of a little extra space?
>
> Ooh, I just remembered, I've always wanted to apply Linus' runtime-const
> stuff to the futex thing.
I had no clue this existed! Nifty.
>
> Something like the below. But I'm not sure if it actually makes a
> difference these days :/
>
> But that can surely fix up the extra deref.
>
[..snip..]
> @@ -1983,10 +1986,17 @@ static int __init futex_init(void)
> hashsize = max(4, hashsize);
> hashsize = roundup_pow_of_two(hashsize);
> #endif
> - futex_hashshift = ilog2(hashsize);
> + __futex_mask = hashsize - 1;
> + __futex_shift = ilog2(hashsize);
> size = sizeof(struct futex_hash_bucket) * hashsize;
> order = get_order(size);
>
> + void *__futex_queues = &__futex_data.queues;
For __futex_queues, can we instead do:
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 8b58d9035e3a..6cefa0629849 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -48,15 +48,6 @@
#include "futex.h"
#include "../locking/rtmutex_common.h"
-/*
- * The base of the bucket array and its size are always used together
- * (after initialization only in futex_hash()), so ensure that they
- * reside in the same cacheline.
- */
-static struct {
- struct futex_hash_bucket *queues[MAX_NUMNODES];
-} __futex_data __read_mostly __aligned(2*sizeof(long));
-
static u32 __futex_mask;
static u32 __futex_shift;
static struct futex_hash_bucket **__futex_queues;
@@ -1991,12 +1982,14 @@ static int __init futex_init(void)
size = sizeof(struct futex_hash_bucket) * hashsize;
order = get_order(size);
- void *__futex_queues = &__futex_data.queues;
+ __futex_queues = kcalloc(nr_node_ids, sizeof(*__futex_queues), GFP_KERNEL);
runtime_const_init(shift, __futex_shift);
runtime_const_init(mask, __futex_mask);
runtime_const_init(ptr, __futex_queues);
+ BUG_ON(!futex_queues());
+
for_each_node(n) {
struct futex_hash_bucket *table;
---
My machine didn't crash right away when running perf bench futex so I'm
assuming this works?
Sebastian, I haven't found any evidence of static data being interleaved
across NUMA nodes (at least on x86). Since kernel data is identity mapped,
is it even possible for a static array to end up interleaved during boot
unless the memory policy in the BIOS is set to interleaved?
If the static allocation is the same as a kcalloc(GFP_KERNEL) from a NUMA
standpoint, is the above feasible?
> +
> + runtime_const_init(shift, __futex_shift);
> + runtime_const_init(mask, __futex_mask);
> + runtime_const_init(ptr, __futex_queues);
> +
> for_each_node(n) {
> struct futex_hash_bucket *table;
>
--
Thanks and Regards,
Prateek