[PATCH] futex: fix NUMA node publication race causing missed wakeups
From: Chengfeng Ye
Date: Mon Mar 02 2026 - 22:02:13 EST
For FUTEX2_NUMA futexes, get_futex_key() publishes the NUMA node hint
into the node word at naddr, the second half of the futex in userspace.
The publication path used a non-atomic read/compute/write sequence, so
concurrent callers could overwrite each other while the hint was being
initialized. For example, if a waiter and a waker both read
FUTEX_NO_NODE before either of them writes the node word, each
substitutes its own numa_node_id(), and the two sides derive different
node values for the same futex. Futex keying is then inconsistent
between the wait and wake sides, which in practice can lead to missed
wakeups. At user level, missed wakeups manifest as threads waiting
indefinitely (an application-level deadlock or hang).
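The lost update can be replayed deterministically in a small userspace model. This is a sketch only: model_node_word, MODEL_NO_NODE, racy_read_hint(), racy_finish_key() and replay_racy_interleaving() are hypothetical names standing in for the node word at naddr and the old read/compute/write path; they are not kernel code. The waiter is assumed to run on node 0 and the waker on node 1, matching the cross-node pinning in the PoC.

```c
#include <assert.h>
#include <stdint.h>

/* Model only: UINT32_MAX stands in for FUTEX_NO_NODE in the node word. */
#define MODEL_NO_NODE UINT32_MAX

/* Hypothetical stand-in for the userspace node word at naddr. */
uint32_t model_node_word = MODEL_NO_NODE;

/* Old path, step 1: read the currently published node hint. */
uint32_t racy_read_hint(void)
{
	return model_node_word;
}

/*
 * Old path, steps 2-3: if no node was published when we read the word,
 * use the caller's local node and store it with a plain write. The store
 * does not check whether someone else published a node after our read,
 * so the last writer wins.
 */
uint32_t racy_finish_key(uint32_t seen, uint32_t local_node)
{
	assert(local_node != MODEL_NO_NODE);
	if (seen != MODEL_NO_NODE)
		return seen;
	model_node_word = local_node;
	return local_node;
}

/*
 * Replay the bad interleaving: the waiter (node 0) and the waker (node 1)
 * both read the node word before either of them writes it.
 */
void replay_racy_interleaving(uint32_t *waiter_node, uint32_t *waker_node)
{
	model_node_word = MODEL_NO_NODE;

	uint32_t waiter_seen = racy_read_hint();
	uint32_t waker_seen = racy_read_hint();

	*waiter_node = racy_finish_key(waiter_seen, 0);
	*waker_node = racy_finish_key(waker_seen, 1);
}
```

After replay_racy_interleaving(), the waiter is keyed on node 0 while the waker is keyed on node 1, and the node word ends up holding 1 because the last plain store wins.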
PoC description (see Link below):
- two threads repeatedly exercising FUTEX2_NUMA wait/wake on the
  same futex,
- waiter and waker pinned to CPUs from different NUMA nodes,
- waker continuously issuing wake calls while the waiter performs
  10-second timed waits.
PoC output on an unpatched kernel (the wakeup is missed and the waiter
times out), observed on Linux v7.0-rc2 running in qemu-system-x86_64
with 4 vCPUs:

  Using CPU 0 (waiter) and CPU 2 (waker) from different NUMA nodes
  [TRIGGER EVENT #1] iter=38 timed out (futex.node=1)
  [TRIGGER EVENT #2] iter=85 timed out (futex.node=1)
  [TRIGGER EVENT #3] iter=95 timed out (futex.node=1)
Fix this by publishing the node hint at most once: use an atomic cmpxchg
on naddr (FUTEX_NO_NODE -> computed node), retry on transient -EAGAIN,
and, if another caller won the race, validate and adopt the node value
it published.
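The publish-once rule can be modeled in userspace as a sketch under stated assumptions: C11 atomic_compare_exchange_strong() stands in for futex_cmpxchg_value_locked(), and MODEL_NO_NODE, MODEL_NR_NODES and publish_node_once() are hypothetical names. The range check against MODEL_NR_NODES only approximates the MAX_NUMNODES/node_possible() validation in the patch. Because a strong compare-exchange does not fail spuriously, the model has no counterpart to the -EAGAIN retry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Model only: UINT32_MAX stands in for FUTEX_NO_NODE. */
#define MODEL_NO_NODE UINT32_MAX
/* Hypothetical node count standing in for MAX_NUMNODES/node_possible(). */
#define MODEL_NR_NODES 4u

/*
 * Publish local_node into *word only if it still holds MODEL_NO_NODE.
 * If the compare-exchange succeeds, the caller's own node is used.
 * If it fails, 'seen' has been loaded with the value another caller
 * published; validate it and adopt it so all callers agree on one node.
 * Returns 0 on success, or -1 if the published value is not a valid
 * node (the patch returns -EINVAL in that case).
 */
int publish_node_once(_Atomic uint32_t *word, uint32_t local_node,
		      uint32_t *node)
{
	uint32_t seen = MODEL_NO_NODE;

	/* The locally computed node is always a valid one. */
	assert(local_node < MODEL_NR_NODES);

	if (atomic_compare_exchange_strong(word, &seen, local_node)) {
		*node = local_node;
		return 0;
	}

	/* Lost the race: never trust the published value blindly. */
	if (seen >= MODEL_NR_NODES)
		return -1;
	*node = seen;
	return 0;
}
```

Whichever caller publishes first wins; a later caller adopts that node instead of overwriting it, so both sides derive the same key. An out-of-range published value is rejected, mirroring the patch's -EINVAL path.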
Fixes: c042c505210d ("futex: Implement FUTEX2_MPOL")
Link: https://gist.github.com/Ychame/d4a5e95401a471f4211a751734b5d164
Signed-off-by: Chengfeng Ye <dg573847474@xxxxxxxxx>
---
kernel/futex/core.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index cf7e610eac42..d45612b36e30 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -596,13 +596,29 @@ int get_futex_key(u32 __user *uaddr, unsigned int flags, union futex_key *key,
 
 	if (flags & FLAGS_NUMA) {
 		u32 __user *naddr = (void *)uaddr + size / 2;
+		u32 old_node;
 
 		if (node == FUTEX_NO_NODE) {
 			node = numa_node_id();
 			node_updated = true;
 		}
-		if (node_updated && put_user_inline(node, naddr))
-			return -EFAULT;
+		if (node_updated) {
+retry_numa_node:
+			err = futex_cmpxchg_value_locked(&old_node, naddr,
+							 FUTEX_NO_NODE, (u32)node);
+			if (err == -EAGAIN) {
+				cond_resched();
+				goto retry_numa_node;
+			}
+			if (err)
+				return err;
+			if (old_node != FUTEX_NO_NODE) {
+				node = old_node;
+				if ((unsigned int)node >= MAX_NUMNODES ||
+				    !node_possible(node))
+					return -EINVAL;
+			}
+		}
 	}
 
 	key->both.node = node;
--
2.25.1