[patch V4 00/14] futex: Address the robust futex unlock race for real
From: Thomas Gleixner
Date: Thu Apr 02 2026 - 11:24:28 EST
This is a follow up to v3 which can be found here:
https://lore.kernel.org/20260330114212.927686587@xxxxxxxxxx
The v1 cover letter contains a detailed analysis of the underlying
problem:
https://lore.kernel.org/20260316162316.356674433@xxxxxxxxxx
TLDR:
The robust futex unlock mechanism is racy with respect to the clearing of the
robust_list_head::list_op_pending pointer because unlock and clearing the
pointer are not atomic. The race window is between the unlock and clearing
the pending op pointer. If the task is forced to exit in this window, exit
will access a potentially invalid pending op pointer when cleaning up the
robust list. That happens if another task manages to unmap the object
containing the lock before the cleanup, which results in a UAF. In the
worst case this UAF can lead to memory corruption when unrelated content
has been mapped to the same address by the time the access happens.
User space can't solve this problem without help from the kernel. This
series provides the kernel side infrastructure to help it along:
1) Combined unlock, pointer clearing, wake-up for the contended case
2) VDSO based unlock and pointer clearing helpers with a fix-up function
in the kernel when user space was interrupted within the critical
section.
Both ensure that the pointer clearing happens _before_ a task exits and the
kernel cleans up the robust list during the exit procedure.
Changes since v3:
- s/TOS/TSO/ :)
- Added a barrier() into unsafe_atomic_store_release_user() for the
TSO case. The barrier is not required for the futex unlock
usecase, but it's harmless and ensures that the function can be
safely used in other contexts.
- Fixed up FUTEX op defines
- Prevented a build fail when neither FUTEX_PRIVATE_HASH nor
FUTEX_ROBUST_UNLOCK are enabled
- Fixed a few typos in the documentation
- Picked up the latest version of André's selftests and replaced
the vdso function lookup.
The delta patch against the previous version is below.
The series applies on v7.0-rc3 and is also available via git:
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git locking-futex-v4
Opens: ptrace-based validation test. Sebastian has a working variant which
needs to be integrated properly into the test suite.
Thanks,
tglx
---
diff --git a/Documentation/locking/robust-futex-ABI.rst b/Documentation/locking/robust-futex-ABI.rst
index 0faec175fc26..5e6a0665b8ba 100644
--- a/Documentation/locking/robust-futex-ABI.rst
+++ b/Documentation/locking/robust-futex-ABI.rst
@@ -190,7 +190,7 @@ Robust release is racy
----------------------
The removal of a robust futex from the list is racy when doing it solely in
-userspace. Quoting Thomas Gleixer for the explanation:
+userspace. Quoting Thomas Gleixner for the explanation:
The robust futex unlock mechanism is racy in respect to the clearing of the
robust_list_head::list_op_pending pointer because unlock and clearing the
@@ -202,11 +202,11 @@ userspace. Quoting Thomas Gleixer for the explanation:
worst case this UAF can lead to memory corruption when unrelated content
has been mapped to the same address by the time the access happens.
-A full in dept analysis can be read at
+A full in-depth analysis can be read at
https://lore.kernel.org/lkml/20260316162316.356674433@xxxxxxxxxx/
To overcome that, the kernel needs to participate in the lock release operation.
-This ensures that the release happens "atomically" in the regard of releasing
+This ensures that the release happens "atomically" with regard to releasing
the lock and removing the address from ``list_op_pending``. If the release is
interrupted by a signal, the kernel will also verify if it interrupted the
release operation.
diff --git a/arch/Kconfig b/arch/Kconfig
index c3579449571c..8940fe236394 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -403,8 +403,8 @@ config ARCH_32BIT_OFF_T
config ARCH_32BIT_USTAT_F_TINODE
bool
-# Selected by architectures with Total Store Order (TOS)
-config ARCH_MEMORY_ORDER_TOS
+# Selected by architectures with Total Store Order (TSO)
+config ARCH_MEMORY_ORDER_TSO
bool
config HAVE_ASM_MODVERSIONS
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c9b1075a0694..7016aba407e9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -114,7 +114,7 @@ config X86
select ARCH_HAS_ZONE_DMA_SET if EXPERT
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_HAVE_EXTRA_ELF_NOTES
- select ARCH_MEMORY_ORDER_TOS
+ select ARCH_MEMORY_ORDER_TSO
select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
select ARCH_MIGHT_HAVE_PC_PARPORT
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 7522782d8164..ec38e3f342bc 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -20,9 +20,9 @@
#include <linux/seqlock.h>
#include <linux/percpu_counter.h>
#include <linux/types.h>
+#include <linux/futex_types.h>
#include <linux/rseq_types.h>
#include <linux/bitmap.h>
-#include <linux/futex_types.h>
#include <asm/mmu.h>
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index ac1d9ce1f1ec..1764d13c41c1 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -647,8 +647,10 @@ static inline void user_access_restore(unsigned long flags) { }
#ifndef unsafe_atomic_store_release_user
# define unsafe_atomic_store_release_user(val, uptr, elbl) \
do { \
- if (!IS_ENABLED(CONFIG_ARCH_MEMORY_ORDER_TOS)) \
+ if (!IS_ENABLED(CONFIG_ARCH_MEMORY_ORDER_TSO)) \
smp_mb(); \
+ else \
+ barrier(); \
unsafe_put_user(val, uptr, elbl); \
} while (0)
#endif
diff --git a/include/uapi/linux/futex.h b/include/uapi/linux/futex.h
index 9a0f564f1737..aaf86a6b75cc 100644
--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -56,11 +56,11 @@
#define FUTEX_UNLOCK_PI_LIST32_PRIVATE (FUTEX_UNLOCK_PI_LIST32 | FUTEX_PRIVATE_FLAG)
#define FUTEX_UNLOCK_WAKE_LIST64 (FUTEX_WAKE | FUTEX_UNLOCK_ROBUST)
-#define FUTEX_UNLOCK_WAKE_LIST64_PRIVATE (FUTEX_UNLOCK_LIST64 | FUTEX_PRIVATE_FLAG)
+#define FUTEX_UNLOCK_WAKE_LIST64_PRIVATE (FUTEX_UNLOCK_WAKE_LIST64 | FUTEX_PRIVATE_FLAG)
#define FUTEX_UNLOCK_WAKE_LIST32 (FUTEX_WAKE | FUTEX_UNLOCK_ROBUST | \
FUTEX_ROBUST_LIST32)
-#define FUTEX_UNLOCK_WAKE_LIST32_PRIVATE (FUTEX_UNLOCK_LIST32 | FUTEX_PRIVATE_FLAG)
+#define FUTEX_UNLOCK_WAKE_LIST32_PRIVATE (FUTEX_UNLOCK_WAKE_LIST32 | FUTEX_PRIVATE_FLAG)
#define FUTEX_UNLOCK_BITSET_LIST64 (FUTEX_WAKE_BITSET | FUTEX_UNLOCK_ROBUST)
#define FUTEX_UNLOCK_BITSET_LIST64_PRIVATE (FUTEX_UNLOCK_BITSET_LIST64 | FUTEX_PRIVATE_FLAG)
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index ce47d02f1ea2..0d5af8f738f3 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1931,7 +1931,7 @@ static int futex_hash_allocate(unsigned int hash_slots, unsigned int flags)
if (new) {
/*
- * Will set mm->futex.phash_new on failure;
+ * Will set mm->futex.phash.new_hash on failure;
* futex_private_hash_get() will try again.
*/
if (!__futex_pivot_hash(mm, new) && custom)
@@ -2019,11 +2019,13 @@ static void futex_robust_unlock_init_mm(struct futex_mm_data *fd)
static inline void futex_robust_unlock_init_mm(struct futex_mm_data *fd) { }
#endif /* !CONFIG_FUTEX_ROBUST_UNLOCK */
+#if defined(CONFIG_FUTEX_PRIVATE_HASH) || defined(CONFIG_FUTEX_ROBUST_UNLOCK)
void futex_mm_init(struct mm_struct *mm)
{
futex_hash_init_mm(&mm->futex);
futex_robust_unlock_init_mm(&mm->futex);
}
+#endif
int futex_hash_prctl(unsigned long arg2, unsigned long arg3, unsigned long arg4)
{
diff --git a/tools/testing/selftests/futex/functional/robust_list.c b/tools/testing/selftests/futex/functional/robust_list.c
index 62f21f8d89a6..43059f6dbc40 100644
--- a/tools/testing/selftests/futex/functional/robust_list.c
+++ b/tools/testing/selftests/futex/functional/robust_list.c
@@ -31,6 +31,7 @@
#include <errno.h>
#include <pthread.h>
#include <signal.h>
+#include <stdint.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
@@ -44,6 +45,10 @@
#define SLEEP_US 100
+#if UINTPTR_MAX == 0xffffffffffffffff
+# define BUILD_64
+#endif
+
static pthread_barrier_t barrier, barrier2;
static int set_robust_list(struct robust_list_head *head, size_t len)
@@ -564,28 +569,20 @@ TEST(test_circular_list)
*/
/*
- * Auxiliary code for loading the vDSO functions
+ * Auxiliary code for binding the vDSO functions
*/
-#define VDSO_SIZE 0x4000
-
-void *get_vdso_func_addr(const char *str)
+static void *get_vdso_func_addr(const char *function)
{
- void *vdso_base = (void *) getauxval(AT_SYSINFO_EHDR), *addr;
- Dl_info info;
+ const char *vdso_names[] = {
+ "linux-vdso.so.1", "linux-gate.so.1", "linux-vdso32.so.1", "linux-vdso64.so.1",
+ };
- if (!vdso_base) {
- perror("Error to get AT_SYSINFO_EHDR");
- return NULL;
- }
+ for (int i = 0; i < ARRAY_SIZE(vdso_names); i++) {
+ void *vdso = dlopen(vdso_names[i], RTLD_LAZY | RTLD_LOCAL | RTLD_NOLOAD);
- for (addr = vdso_base; addr < vdso_base + VDSO_SIZE; addr += sizeof(addr)) {
- if (dladdr(addr, &info) == 0 || !info.dli_sname)
- continue;
-
- if (!strcmp(info.dli_sname, str))
- return info.dli_saddr;
+ if (vdso)
+ return dlsym(vdso, function);
}
-
return NULL;
}
@@ -611,9 +608,6 @@ FIXTURE_VARIANT(vdso_unlock)
FIXTURE_SETUP(vdso_unlock)
{
self->vdso = get_vdso_func_addr(variant->func_name);
-
- if (!self->vdso)
- ksft_test_result_skip("%s not found\n", variant->func_name);
}
FIXTURE_TEARDOWN(vdso_unlock) {}
@@ -640,10 +634,15 @@ TEST_F(vdso_unlock, test_robust_try_unlock_uncontended)
struct lock_struct lock = { .futex = 0 };
_Atomic(unsigned int) *futex = &lock.futex;
struct robust_list_head head;
- uint64_t exp = (uint64_t) NULL;
+ uintptr_t exp = (uintptr_t) NULL;
pid_t tid = gettid();
int ret;
+ if (!self->vdso) {
+ ksft_test_result_skip("%s not found\n", variant->func_name);
+ return;
+ }
+
*futex = tid;
ret = set_list(&head);
@@ -659,11 +658,11 @@ TEST_F(vdso_unlock, test_robust_try_unlock_uncontended)
/* Check only the lower 32 bits for the 32-bit entry point */
if (variant->is_32) {
- exp = (uint64_t)(unsigned long)&lock.list;
+ exp = (uintptr_t)(unsigned long)&lock.list;
exp &= ~0xFFFFFFFFULL;
}
- ASSERT_EQ((uint64_t)(unsigned long)head.list_op_pending, exp);
+ ASSERT_EQ((uintptr_t)(unsigned long)head.list_op_pending, exp);
}
/*
@@ -679,6 +678,11 @@ TEST_F(vdso_unlock, test_robust_try_unlock_contended)
pid_t tid = gettid();
int ret;
+ if (!self->vdso) {
+ ksft_test_result_skip("%s not found\n", variant->func_name);
+ return;
+ }
+
*futex = tid | FUTEX_WAITERS;
ret = set_list(&head);
@@ -724,6 +728,24 @@ FIXTURE_VARIANT_ADD(futex_op, unlock_pi)
.val3 = 0,
};
+FIXTURE_VARIANT_ADD(futex_op, wake32)
+{
+ .op = FUTEX_WAKE | FUTEX_ROBUST_LIST32,
+ .val3 = 0,
+};
+
+FIXTURE_VARIANT_ADD(futex_op, wake_bitset32)
+{
+ .op = FUTEX_WAKE_BITSET | FUTEX_ROBUST_LIST32,
+ .val3 = FUTEX_BITSET_MATCH_ANY,
+};
+
+FIXTURE_VARIANT_ADD(futex_op, unlock_pi32)
+{
+ .op = FUTEX_UNLOCK_PI | FUTEX_ROBUST_LIST32,
+ .val3 = 0,
+};
+
/*
* The syscall should return the number of tasks waken (for this test, 0), clear the futex word and
* clear list_op_pending
@@ -732,10 +754,18 @@ TEST_F(futex_op, test_futex_robust_unlock)
{
struct lock_struct lock = { .futex = 0 };
_Atomic(unsigned int) *futex = &lock.futex;
+ uintptr_t exp = (uintptr_t) NULL;
struct robust_list_head head;
pid_t tid = gettid();
int ret;
+#ifndef BUILD_64
+ if (!(variant->op & FUTEX_ROBUST_LIST32)) {
+ ksft_test_result_skip("Not supported for 32 bit build\n");
+ return;
+ }
+#endif
+
*futex = tid | FUTEX_WAITERS;
ret = set_list(&head);
@@ -749,7 +779,13 @@ TEST_F(futex_op, test_futex_robust_unlock)
ASSERT_EQ(ret, 0);
ASSERT_EQ(*futex, 0);
- ASSERT_EQ(head.list_op_pending, NULL);
+
+ if (variant->op & FUTEX_ROBUST_LIST32) {
+ exp = (uint64_t)(unsigned long)&lock.list;
+ exp &= ~0xFFFFFFFFULL;
+ }
+
+ ASSERT_EQ((uintptr_t)(unsigned long)head.list_op_pending, exp);
}
TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/futex/include/futextest.h b/tools/testing/selftests/futex/include/futextest.h
index f4d880b8e795..df33f31d6994 100644
--- a/tools/testing/selftests/futex/include/futextest.h
+++ b/tools/testing/selftests/futex/include/futextest.h
@@ -41,6 +41,9 @@ typedef volatile u_int32_t futex_t;
#ifndef FUTEX_ROBUST_UNLOCK
#define FUTEX_ROBUST_UNLOCK 512
#endif
+#ifndef FUTEX_ROBUST_LIST32
+#define FUTEX_ROBUST_LIST32 1024
+#endif
#ifndef FUTEX_WAIT_REQUEUE_PI_PRIVATE
#define FUTEX_WAIT_REQUEUE_PI_PRIVATE (FUTEX_WAIT_REQUEUE_PI | \
FUTEX_PRIVATE_FLAG)