Re: [PATCH v10 00/21] futex: Add support task local hash maps, FUTEX2_NUMA and FUTEX2_MPOL

From: Shrikanth Hegde
Date: Tue Mar 18 2025 - 09:27:42 EST




On 3/12/25 20:46, Sebastian Andrzej Siewior wrote:
Hi,

this is a follow up on
https://lore.kernel.org/ZwVOMgBMxrw7BU9A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

and adds support for task local futex_hash_bucket.

This is the local hash map series based on v9 extended with PeterZ
FUTEX2_NUMA and FUTEX2_MPOL plus a few fixes on top.

The complete tree is at
https://git.kernel.org/pub/scm/linux/kernel/git/bigeasy/staging.git/log/?h=futex_local_v10
https://git.kernel.org/pub/scm/linux/kernel/git/bigeasy/staging.git futex_local_v10


Hi Sebastian. Thanks for working on this (along with bringing back FUTEX2 NUMA) which
might help large systems with many futexes.

I tried this in one of our systems(Single NUMA, 80 CPUs), I see significant reduction in futex/hash.
Maybe i am missing some config or doing something stupid w.r.t to benchmarking.
I am trying to understand this stuff.

I ran "perf bench futex all" as is. No change has been made to perf.
=========================================
Without patch: at 6575d1b4a6ef3336608127c704b612bc5e7b0fdc
# Running futex/hash benchmark...
Run summary [PID 45758]: 80 threads, each operating on 1024 [private] futexes for 10 secs.
Averaged 1556023 operations/sec (+- 0.08%), total secs = 10 <<--- 1.5M

=========================================
With the Series: I had to make PR_FUTEX_HASH=78 since 77 is used for TIMERs.

# Running futex/hash benchmark...
Run summary [PID 8644]: 80 threads, each operating on 1024 [private] futexes for 10 secs.
Averaged 150382 operations/sec (+- 0.42%), total secs = 10 <<-- 0.15M, close to 10x down.

=========================================

Did try a git bisect based on the futex/hash numbers. It narrowed it to this one.
first bad commit: [5dc017a816766be47ffabe97b7e5f75919756e5c] futex: Allow automatic allocation of process wide futex hash.

Is this expected given the complexity of hash function change?


Also, is there a benchmark that could be run to evaluate FUTEX2_NUMA, I would like to
try it on multi-NUMA system to see the benefit.