Re: [PATCH RFC] futex: avoid false sharing between hb->chain and the bucket lock
From: Breno Leitao
Date: Wed Jun 10 2026 - 10:03:53 EST
On Wed, Jun 10, 2026 at 01:25:46PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 10, 2026 at 01:22:34PM +0200, Thomas Gleixner wrote:
> > On Tue, Jun 09 2026 at 22:18, Peter Zijlstra wrote:
> > > On Tue, Jun 09, 2026 at 10:11:17PM +0200, Peter Zijlstra wrote:
> > >> Anyway, how does something like the below work for you? It's a total
> > >> hack job, but it (sorta) builds and runs.
> > >>
> > >
> > > Please use this one, I spotted a silly bug.
> >
> > So I ran this on two machines.
> >
> > SKL dual socket 112 threads:
> >
> >                      Baseline    Patched
> >
> >   shared (16k)        1571857    1641435    + 4.4%
> >   autosize (512)       646390     903371    +39.7%
> >   -b 256               464395     587014    +26.4%
> >   -b 512               715687     995943    +39.2%
> >   -b 1024              995085    1396328    +40.3%
> >   -b 2048             1293114    1668395    +29.0%
> >   -b 4096             2124438    2240228    + 5.5%
> >
> > Zen3 dual socket 256 threads:
> >
> >                      Baseline    Patched
> >
> >   shared (16k)        1275840    1381279    + 8.2%
> >   autosize (512)      1252745    1482179    +18.3%
> >   -b 256               856274     955455    +11.5%
> >   -b 512              1267490    1544010    +21.8%
> >   -b 1024             1424013    1625424    +14.1%
> >   -b 2048             1505181    1669342    +10.9%
> >   -b 4096             1465993    1688932    +15.2%
>
> I suppose that means I'd better go make it prettier and survive
> randconfig :-)
I've tested it here on the same machine I used earlier (176-thread AMD EPYC
host), 10s perf bench futex hash per run, baseline = parent commit
(acb7500801e98):
                          Baseline      Patched     Delta

  shared (16 buckets)    1,230,599    1,368,655    +11.2%
  autosize (1024)        1,285,440    1,556,946    +21.1%
  -b 256                 1,341,471    1,520,303    +13.3%
  -b 512                 1,438,330    1,599,319    +11.2%
  -b 1024                1,443,772    1,622,493    +12.4%
  -b 2048                1,472,108    1,643,975    +11.7%
  -b 4096                1,333,098    1,570,897    +17.8%
Stderr was 0.06%-0.22% across the board, so the deltas are well
outside noise.
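
For anyone who wants to recheck the arithmetic, here is a rough sketch that
recomputes the Delta column from the Baseline/Patched figures above and
compares each delta with the worst stderr I saw (0.22%). The names
(RESULTS, delta_pct, MAX_STDERR_PCT) are made up for this mail and are not
part of perf. The ratio is only an eyeball check, not a proper
significance test.

```python
# Throughput figures copied from the table in this mail: (baseline, patched).
RESULTS = {
    "shared (16 buckets)": (1_230_599, 1_368_655),
    "autosize (1024)":     (1_285_440, 1_556_946),
    "-b 256":              (1_341_471, 1_520_303),
    "-b 512":              (1_438_330, 1_599_319),
    "-b 1024":             (1_443_772, 1_622_493),
    "-b 2048":             (1_472_108, 1_643_975),
    "-b 4096":             (1_333_098, 1_570_897),
}

# Worst-case relative stderr reported above, in percent.
MAX_STDERR_PCT = 0.22


def delta_pct(baseline: int, patched: int) -> float:
    """Relative change of patched over baseline, in percent."""
    return (patched - baseline) / baseline * 100.0


if __name__ == "__main__":
    for name, (base, patched) in RESULTS.items():
        d = delta_pct(base, patched)
        # Ratio of the delta to the worst-case stderr: a crude noise check.
        print(f"{name:<22} {d:+6.1f}%  (~{d / MAX_STDERR_PCT:.0f}x max stderr)")
```

Every row comes out at more than ten times the worst-case stderr, which is
what I mean by "well outside noise".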
The trade Peter sketched holds up here: no extra futex memory
cost, and we still recover most of what padding the bucket would
have bought.
Really good, thanks for this patch,
--breno