Re: [PATCH] futex: fix NUMA node publication race causing missed wakeups
From: Sebastian Andrzej Siewior
Date: Thu Mar 12 2026 - 05:37:20 EST
On 2026-03-03 03:01:00 [+0000], Chengfeng Ye wrote:
> get_futex_key() publishes the FUTEX2_NUMA node side word in userspace.
> The publication path used a non-atomic read/compute/write sequence, so
> concurrent callers could overwrite each other during initialization.
>
> This race can make concurrent operations on the same futex derive
> different node values while the NUMA hint is being initialized,
> resulting in inconsistent futex keying between wait and wake sides.
> In practice this can lead to missed wakeups; at user level, missed
> wakeups can manifest as threads waiting indefinitely
> (application-level deadlock/hang).
>
> PoC description (see Link below):
> - two threads repeatedly exercising FUTEX2_NUMA wait/wake on the
>   same futex,
> - waiter and waker pinned to CPUs from different NUMA nodes,
> - waker continuously issuing wake calls while waiter performs
>   10-second timed waits.
>
> PoC output on unpatched kernel (wake signal missed and waiter timed out):
> - observed on Linux v7.0-rc2 running in qemu-system-x86_64 with
>   4 vCPUs
> Using CPU 0 (waiter) and CPU 2 (waker) from different NUMA nodes
> [TRIGGER EVENT #1] iter=38 timed out (futex.node=1)
> [TRIGGER EVENT #2] iter=85 timed out (futex.node=1)
> [TRIGGER EVENT #3] iter=95 timed out (futex.node=1)
>
> Fix by making node-hint publication publish-once via atomic cmpxchg on
> naddr (FUTEX_NO_NODE -> computed node), retrying transient -EAGAIN,
> and adopting/validating the winner value on contention.
>
> Fixes: c042c505210d ("futex: Implement FUTEX2_MPOL")
> Link: https://gist.github.com/Ychame/d4a5e95401a471f4211a751734b5d164
> Signed-off-by: Chengfeng Ye <dg573847474@xxxxxxxxx>
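
For readers following along, the publish-once scheme described in the
quoted changelog can be sketched in userspace with C11 atomics. This is
only an illustration under assumptions, not the kernel code from the
patch: NO_NODE, publish_node_once() and nr_nodes are hypothetical names,
and the -EAGAIN retry mentioned above is left out.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the "no node set yet" marker. */
#define NO_NODE (-1)

/*
 * Try to publish `computed` into `slot`, but only if the slot is still
 * unset. Returns the node value that is in effect afterwards (ours if we
 * won, the winner's if we lost), or -1 if the winner's value is out of
 * range. All names here are illustrative assumptions.
 */
static int32_t publish_node_once(_Atomic int32_t *slot, int32_t computed,
                                 int32_t nr_nodes)
{
    int32_t seen = NO_NODE;

    /* Succeeds only if *slot still equals NO_NODE. */
    if (atomic_compare_exchange_strong(slot, &seen, computed))
        return computed;

    /* Lost the race: `seen` now holds the winner's value. Validate it. */
    if (seen < 0 || seen >= nr_nodes)
        return -1;

    return seen;
}
```

With this shape, two callers that compute different nodes for the same
slot both end up using whichever value was stored first, so the wait and
wake sides would derive the same key.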
I did point out this scenario, and it was said that this should not be
done this way: initialize once and be done with it. Plus, with mpol the
value should be consistent.
I intended to document this and started with the new futex syscalls, but
didn't get very far. The whole PR_FUTEX_HASH thingy is in, though \o/.
Sebastian