Re: [RFC v2 PATCH 0/4] futex: Add support task local hash maps.

From: Waiman Long
Date: Thu Oct 31 2024 - 16:28:28 EST


On 10/31/24 11:56 AM, Sebastian Andrzej Siewior wrote:
On 2024-10-28 13:13:54 [+0100], To linux-kernel@xxxxxxxxxxxxxxx wrote:
Need to do more testing.
So there is "perf bench futex hash". On a 256 CPU NUMA box, running
perf bench futex hash -t 240 -m -s -b $hb
with hb from 2 to 131072 (moved the allocation to kvmalloc), I get the following
(averaged over three runs):

buckets         op/sec  change vs. previous
      2        9158.33
      4       21665.66  + ~136%
      8       44686.66  + ~106%
     16       84144.33  + ~ 88%
     32      139998.33  + ~ 66%
     64      279957.0   + ~ 99%
    128      509533.0   + ~100%
    256     1019846.0   + ~100%
    512     1634940.0   + ~ 60%
   1024     1834859.33  + ~ 12%
            1868129.33  (global hash, 65536 buckets)
   2048     1912071.33  + ~  4%
   4096     1918686.66  + ~  0%
   8192     1922285.66  + ~  0%
  16384     1923017.0   + ~  0%
  32768     1923319.0   + ~  0%
  65536     1932906.0   + ~  0%
 131072     2042571.33  + ~  5%

Doubling the hash size almost doubles the ops/sec up to 256 slots.
After 2048 slots the increase is almost noise (except for the last
entry).
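
For reference, the step-to-step change can be recomputed from the op/sec column as below. This is only a small helper sketch; the name gain_pct is made up for illustration and is not part of perf or the patch series.

```c
/*
 * Relative throughput change, in percent, between two consecutive
 * bucket counts. prev and cur are ops/sec values from the table.
 * Hypothetical helper, for illustration only.
 */
double gain_pct(double prev, double cur)
{
	return (cur / prev - 1.0) * 100.0;
}
```

For example, gain_pct(9158.33, 21665.66) comes out at roughly 136.6, which matches the first step (2 -> 4 buckets).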

Looking at the performance data, we should probably use the global hash map to maximize throughput if latency isn't important.

AFAICT, the reason why patch 4 creates a local hash map when the first thread is created is to avoid a race of the same futex being hashed on both the local and the global hash maps. Correct me if my understanding is incorrect. So all the multithreaded processes will have to use local hash maps for their private futexes even if they don't care about latency.

Maybe we should limit the automatic local hash map creation to RT processes, where latency is important. To avoid the race, we could add a flag that records whether a private futex hashing operation has ever been done, and refuse to create a local hash map after that.
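
Something along these lines, as a rough userspace model of the flag idea using C11 atomics. It is untested against the series and all names (futex_proc_state, futex_use_local_hash, futex_try_enable_local_hash) are hypothetical; real kernel code would use the kernel's own atomic helpers and would also have to allocate the local map before publishing the state. I folded the flag and the "local map exists" state into one word so that both decisions are made by a single compare-and-exchange.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-process state; not the layout used by the patches. */
enum futex_hash_mode {
	FUTEX_HASH_UNDECIDED = 0,	/* no private futex hashed yet */
	FUTEX_HASH_GLOBAL,		/* a private futex was hashed globally */
	FUTEX_HASH_LOCAL,		/* a local hash map is in use */
};

struct futex_proc_state {
	_Atomic int mode;
};

/*
 * Private futex hashing path. The first hashing operation pins the
 * process to the global hash unless a local map was enabled before.
 * Returns true if the local hash map must be used.
 */
bool futex_use_local_hash(struct futex_proc_state *st)
{
	int expected = FUTEX_HASH_UNDECIDED;

	if (atomic_compare_exchange_strong(&st->mode, &expected,
					   FUTEX_HASH_GLOBAL))
		return false;
	/* CAS failed: expected now holds the mode that was already set. */
	return expected == FUTEX_HASH_LOCAL;
}

/*
 * Request a local hash map (e.g. for an RT process). Succeeds only if
 * no private futex was hashed globally before. Returns true if the
 * process ends up using a local hash map.
 */
bool futex_try_enable_local_hash(struct futex_proc_state *st)
{
	int expected = FUTEX_HASH_UNDECIDED;

	if (atomic_compare_exchange_strong(&st->mode, &expected,
					   FUTEX_HASH_LOCAL))
		return true;
	return expected == FUTEX_HASH_LOCAL;	/* already enabled earlier */
}
```

With this, a process that has already hashed a private futex on the global map can never switch to a local map afterwards, so the same futex cannot end up in both.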

My 2 cents.

Cheers,
Longman