[patch V3 00/14] futex: Address the robust futex unlock race for real

From: Thomas Gleixner

Date: Mon Mar 30 2026 - 08:07:33 EST


This is a follow up to v2 which can be found here:

https://lore.kernel.org/20260319225224.853416463@xxxxxxxxxx

The v1 cover letter contains a detailed analysis of the underlying
problem:

https://lore.kernel.org/20260316162316.356674433@xxxxxxxxxx

TLDR:

The robust futex unlock mechanism is racy with respect to the clearing of
the robust_list_head::list_op_pending pointer because the unlock and the
clearing of the pointer are not atomic. The race window is between the
unlock and the clearing of the pending op pointer. If the task is forced to
exit in this window, exit will access a potentially invalid pending op
pointer when cleaning up the robust list. That happens if another task
manages to unmap the object containing the lock before the cleanup, which
results in a UAF. In the worst case this UAF can lead to memory corruption
when unrelated content has been mapped to the same address by the time the
access happens.

User space can't solve this problem without help from the kernel. This
series provides the kernel side infrastructure to help it along:

1) Combined unlock, pointer clearing, wake-up for the contended case

2) VDSO based unlock and pointer clearing helpers, with a fix-up function
in the kernel for the case where user space was interrupted within the
critical section.

Both ensure that the pointer clearing happens _before_ a task exits and the
kernel cleans up the robust list during the exit procedure.

Changes since v2:

- Retain the critical section ranges on fork() - Sebastian

- Simplify the region update and provide generic helpers for that to
avoid copy and pasta in the architecture VDSO/VMA code.

- Consolidate the naming: __vdso_futex_robust_list64_try_unlock() and
__vdso_futex_robust_list32_try_unlock() as there is no need to make it
different for the 32-bit VDSO, which only supports the list32 variant.

- Use 'r' constraint in the ASM template - Uros

- Rename ARCH_STORE_IMPLIES_RELEASE to ARCH_MEMORY_ORDER_TOS - Peter

- Save space in the ranges array by using start_ip + len instead of
start_ip/end_ip - Sebastian

- Reduce number of ranges to 1 if COMPAT is disabled - Peter

- Separate the private hash and unlock data into their own structs to
make fork/exec handling simpler

- Make futex_mm_init() void as it cannot fail

- Invalidate critical section ranges by setting start_ip to ~0UL so that
the quick check in the signal path bails out on the first compare,
whereas with 0 it always has to evaluate both conditions.

- Picked up the documentation and selftest patches from André

- Addressed various review comments

- Picked up tags as appropriate

Thanks to everyone for feedback and discussion!

The delta patch against the previous version is below.

The series applies on top of v7.0-rc3 and is also available via git:

git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git locking-futex-v3

Opens: ptrace based validation test. Sebastian has a working prototype.

Thanks,

tglx
---

diff --git a/Documentation/locking/robust-futex-ABI.rst b/Documentation/locking/robust-futex-ABI.rst
index f24904f1c16f..0faec175fc26 100644
--- a/Documentation/locking/robust-futex-ABI.rst
+++ b/Documentation/locking/robust-futex-ABI.rst
@@ -153,6 +153,9 @@ On removal:
3) release the futex lock, and
4) clear the 'lock_op_pending' word.

+Please note that removing a robust futex purely in user space is racy.
+Refer to the next section for details and how to avoid this.
+
On exit, the kernel will consider the address stored in
'list_op_pending' and the address of each 'lock word' found by walking
the list starting at 'head'. For each such address, if the bottom 30
@@ -182,3 +185,44 @@ any point:
When the kernel sees a list entry whose 'lock word' doesn't have the
current threads TID in the lower 30 bits, it does nothing with that
entry, and goes on to the next entry.
+
+Robust release is racy
+----------------------
+
+The removal of a robust futex from the list is racy when done solely in
+userspace. Quoting Thomas Gleixner for the explanation:
+
+ The robust futex unlock mechanism is racy with respect to the clearing of
+ the robust_list_head::list_op_pending pointer because the unlock and the
+ clearing of the pointer are not atomic. The race window is between the
+ unlock and the clearing of the pending op pointer. If the task is forced
+ to exit in this window, exit will access a potentially invalid pending op
+ pointer when cleaning up the robust list. That happens if another task
+ manages to unmap the object containing the lock before the cleanup, which
+ results in a UAF. In the worst case this UAF can lead to memory corruption
+ when unrelated content has been mapped to the same address at access time.
+
+A full in-depth analysis can be found at
+https://lore.kernel.org/lkml/20260316162316.356674433@xxxxxxxxxx/
+
+To overcome this, the kernel needs to participate in the lock release
+operation. This ensures that releasing the lock and removing the address
+from ``list_op_pending`` happen "atomically" with respect to each other. If
+the release is interrupted by a signal, the kernel also verifies whether it
+interrupted the release operation and fixes it up.
+
+For the contended unlock case, where other threads are waiting for the lock
+release, there is the ``FUTEX_ROBUST_UNLOCK`` operation flag for the
+``futex()`` system call, which must be combined with one of the following
+operations: ``FUTEX_WAKE``, ``FUTEX_WAKE_BITSET`` or ``FUTEX_UNLOCK_PI``.
+The kernel releases the lock (sets the futex word to zero) and clears the
+``list_op_pending`` field, then proceeds with the normal wake path.
+
+For the non-contended path, there is still a race between checking the futex
+word and clearing the ``list_op_pending`` field. To solve this without
+requiring a full system call, userspace should call the vDSO function
+``__vdso_futex_robust_listXX_try_unlock()`` (where XX is either 32 or 64,
+depending on the size of the pointer). If the vDSO call succeeds, it has
+released the lock and cleared ``list_op_pending``. If it fails, there are
+waiters for this lock and a call to the ``futex()`` syscall with
+``FUTEX_ROBUST_UNLOCK`` is needed.
diff --git a/arch/Kconfig b/arch/Kconfig
index 0c1e6cc101ff..c3579449571c 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -403,8 +403,8 @@ config ARCH_32BIT_OFF_T
config ARCH_32BIT_USTAT_F_TINODE
bool

-# Selected by architectures when plain stores have release semantics
-config ARCH_STORE_IMPLIES_RELEASE
+# Selected by architectures with Total Store Order (TOS)
+config ARCH_MEMORY_ORDER_TOS
bool

config HAVE_ASM_MODVERSIONS
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e9437efae787..c9b1075a0694 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -114,12 +114,12 @@ config X86
select ARCH_HAS_ZONE_DMA_SET if EXPERT
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_HAVE_EXTRA_ELF_NOTES
+ select ARCH_MEMORY_ORDER_TOS
select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_MIGHT_HAVE_PC_SERIO
select ARCH_STACKWALK
- select ARCH_STORE_IMPLIES_RELEASE
select ARCH_SUPPORTS_ACPI
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_DEBUG_PAGEALLOC
diff --git a/arch/x86/entry/vdso/common/vfutex.c b/arch/x86/entry/vdso/common/vfutex.c
index 8df8fd6c759d..dba54745b355 100644
--- a/arch/x86/entry/vdso/common/vfutex.c
+++ b/arch/x86/entry/vdso/common/vfutex.c
@@ -25,32 +25,32 @@
#define __stringify_1(x...) #x
#define __stringify(x...) __stringify_1(x)

-#define LABEL(name, which) __stringify(name##_futex_try_unlock_cs_##which:)
+#define LABEL(prefix, which) __stringify(prefix##_try_unlock_cs_##which:)

-#define JNZ_END(name) "jnz " __stringify(name) "_futex_try_unlock_cs_end\n"
+#define JNZ_END(prefix) "jnz " __stringify(prefix) "_try_unlock_cs_end\n"

#define CLEAR_POPQ "movq %[zero], %a[pop]\n"
#define CLEAR_POPL "movl %k[zero], %a[pop]\n"

-#define futex_robust_try_unlock(name, clear_pop, __lock, __tid, __pop) \
+#define futex_robust_try_unlock(prefix, clear_pop, __lock, __tid, __pop) \
({ \
asm volatile ( \
" \n" \
" lock cmpxchgl %k[zero], %a[lock] \n" \
" \n" \
- LABEL(name, start) \
+ LABEL(prefix, start) \
" \n" \
- JNZ_END(name) \
+ JNZ_END(prefix) \
" \n" \
- LABEL(name, success) \
+ LABEL(prefix, success) \
" \n" \
clear_pop \
" \n" \
- LABEL(name, end) \
+ LABEL(prefix, end) \
: [tid] "+&a" (__tid) \
: [lock] "D" (__lock), \
[pop] "d" (__pop), \
- [zero] "S" (0UL) \
+ [zero] "r" (0UL) \
: "memory" \
); \
__tid; \
@@ -59,18 +59,13 @@
#ifdef __x86_64__
__u32 __vdso_futex_robust_list64_try_unlock(__u32 *lock, __u32 tid, __u64 *pop)
{
- return futex_robust_try_unlock(x86_64, CLEAR_POPQ, lock, tid, pop);
+ return futex_robust_try_unlock(__futex_list64, CLEAR_POPQ, lock, tid, pop);
}
+#endif /* __x86_64__ */

-#ifdef CONFIG_COMPAT
+#if defined(CONFIG_X86_32) || defined(CONFIG_COMPAT)
__u32 __vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop)
{
- return futex_robust_try_unlock(x86_64_compat, CLEAR_POPL, lock, tid, pop);
+ return futex_robust_try_unlock(__futex_list32, CLEAR_POPL, lock, tid, pop);
}
-#endif /* CONFIG_COMPAT */
-#else /* __x86_64__ */
-__u32 __vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop)
-{
- return futex_robust_try_unlock(x86_32, CLEAR_POPL, lock, tid, pop);
-}
-#endif /* !__x86_64__ */
+#endif /* CONFIG_X86_32 || CONFIG_COMPAT */
diff --git a/arch/x86/entry/vdso/vdso64/vdso64.lds.S b/arch/x86/entry/vdso/vdso64/vdso64.lds.S
index 11dae35358a2..4a72122da81b 100644
--- a/arch/x86/entry/vdso/vdso64/vdso64.lds.S
+++ b/arch/x86/entry/vdso/vdso64/vdso64.lds.S
@@ -32,9 +32,12 @@ VERSION {
#endif
getrandom;
__vdso_getrandom;
+
#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
__vdso_futex_robust_list64_try_unlock;
+#ifdef CONFIG_COMPAT
__vdso_futex_robust_list32_try_unlock;
+#endif
#endif
local: *;
};
diff --git a/arch/x86/entry/vdso/vdso64/vdsox32.lds.S b/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
index 0e844af63304..b917dc69f62f 100644
--- a/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
+++ b/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
@@ -22,9 +22,12 @@ VERSION {
__vdso_getcpu;
__vdso_time;
__vdso_clock_getres;
+
#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
__vdso_futex_robust_list64_try_unlock;
+#ifdef CONFIG_COMPAT
__vdso_futex_robust_list32_try_unlock;
+#endif
#endif
local: *;
};
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index ad87818d42a0..357e18db0c7a 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -6,6 +6,7 @@
*/
#include <linux/mm.h>
#include <linux/err.h>
+#include <linux/futex.h>
#include <linux/sched.h>
#include <linux/sched/task_stack.h>
#include <linux/slab.h>
@@ -79,26 +80,19 @@ static void vdso_futex_robust_unlock_update_ips(void)
const struct vdso_image *image = current->mm->context.vdso_image;
unsigned long vdso = (unsigned long) current->mm->context.vdso;
struct futex_mm_data *fd = &current->mm->futex;
- struct futex_unlock_cs_range *csr = fd->unlock_cs_ranges;
+ unsigned int idx = 0;
+
+ futex_reset_cs_ranges(fd);

- fd->unlock_cs_num_ranges = 0;
#ifdef CONFIG_X86_64
- if (image->sym_x86_64_futex_try_unlock_cs_start) {
- csr->start_ip = vdso + image->sym_x86_64_futex_try_unlock_cs_start;
- csr->end_ip = vdso + image->sym_x86_64_futex_try_unlock_cs_end;
- csr->pop_size32 = 0;
- csr++;
- fd->unlock_cs_num_ranges++;
- }
+ futex_set_vdso_cs_range(fd, idx, vdso, image->sym___futex_list64_try_unlock_cs_start,
+ image->sym___futex_list64_try_unlock_cs_end, false);
+ idx++;
#endif /* CONFIG_X86_64 */

#if defined(CONFIG_X86_32) || defined(CONFIG_COMPAT)
- if (image->sym_x86_32_futex_try_unlock_cs_start) {
- csr->start_ip = vdso + image->sym_x86_32_futex_try_unlock_cs_start;
- csr->end_ip = vdso + image->sym_x86_32_futex_try_unlock_cs_end;
- csr->pop_size32 = 1;
- fd->unlock_cs_num_ranges++;
- }
+ futex_set_vdso_cs_range(fd, idx, vdso, image->sym___futex_list32_try_unlock_cs_start,
+ image->sym___futex_list32_try_unlock_cs_end, true);
#endif /* CONFIG_X86_32 || CONFIG_COMPAT */
}
#else
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index b96a6f04d677..68cf5cdd84b4 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -25,12 +25,10 @@ struct vdso_image {
long sym_int80_landing_pad;
long sym_vdso32_sigreturn_landing_pad;
long sym_vdso32_rt_sigreturn_landing_pad;
- long sym_x86_64_futex_try_unlock_cs_start;
- long sym_x86_64_futex_try_unlock_cs_end;
- long sym_x86_64_compat_futex_try_unlock_cs_start;
- long sym_x86_64_compat_futex_try_unlock_cs_end;
- long sym_x86_32_futex_try_unlock_cs_start;
- long sym_x86_32_futex_try_unlock_cs_end;
+ long sym___futex_list64_try_unlock_cs_start;
+ long sym___futex_list64_try_unlock_cs_end;
+ long sym___futex_list32_try_unlock_cs_start;
+ long sym___futex_list32_try_unlock_cs_end;
};

extern const struct vdso_image vdso64_image;
diff --git a/arch/x86/tools/vdso2c.c b/arch/x86/tools/vdso2c.c
index 2d01e511ca8a..921576b6a5f5 100644
--- a/arch/x86/tools/vdso2c.c
+++ b/arch/x86/tools/vdso2c.c
@@ -82,12 +82,10 @@ struct vdso_sym required_syms[] = {
{"int80_landing_pad", true},
{"vdso32_rt_sigreturn_landing_pad", true},
{"vdso32_sigreturn_landing_pad", true},
- {"x86_64_futex_try_unlock_cs_start", true},
- {"x86_64_futex_try_unlock_cs_end", true},
- {"x86_64_compat_futex_try_unlock_cs_start", true},
- {"x86_64_compat_futex_try_unlock_cs_end", true},
- {"x86_32_futex_try_unlock_cs_start", true},
- {"x86_32_futex_try_unlock_cs_end", true},
+ {"__futex_list64_try_unlock_cs_start", true},
+ {"__futex_list64_try_unlock_cs_end", true},
+ {"__futex_list32_try_unlock_cs_start", true},
+ {"__futex_list32_try_unlock_cs_end", true},
};

__attribute__((format(printf, 1, 2))) __attribute__((noreturn))
diff --git a/include/linux/futex.h b/include/linux/futex.h
index 8e3d46737b03..33524dfb3fe4 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -81,22 +81,18 @@ int futex_hash_prctl(unsigned long arg2, unsigned long arg3, unsigned long arg4)
#ifdef CONFIG_FUTEX_PRIVATE_HASH
int futex_hash_allocate_default(void);
void futex_hash_free(struct mm_struct *mm);
-int futex_mm_init(struct mm_struct *mm);
-
-#else /* !CONFIG_FUTEX_PRIVATE_HASH */
+#else /* CONFIG_FUTEX_PRIVATE_HASH */
static inline int futex_hash_allocate_default(void) { return 0; }
static inline int futex_hash_free(struct mm_struct *mm) { return 0; }
-static inline int futex_mm_init(struct mm_struct *mm) { return 0; }
-#endif /* CONFIG_FUTEX_PRIVATE_HASH */
+#endif /* !CONFIG_FUTEX_PRIVATE_HASH */

-#else /* !CONFIG_FUTEX */
+#else /* CONFIG_FUTEX */
static inline void futex_init_task(struct task_struct *tsk) { }
static inline void futex_exit_recursive(struct task_struct *tsk) { }
static inline void futex_exit_release(struct task_struct *tsk) { }
static inline void futex_exec_release(struct task_struct *tsk) { }
-static inline long do_futex(u32 __user *uaddr, int op, u32 val,
- ktime_t *timeout, u32 __user *uaddr2,
- u32 val2, u32 val3)
+static inline long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
+ u32 __user *uaddr2, u32 val2, u32 val3)
{
return -EINVAL;
}
@@ -104,17 +100,14 @@ static inline int futex_hash_prctl(unsigned long arg2, unsigned long arg3, unsig
{
return -EINVAL;
}
-static inline int futex_hash_allocate_default(void)
-{
- return 0;
-}
+static inline int futex_hash_allocate_default(void) { return 0; }
static inline int futex_hash_free(struct mm_struct *mm) { return 0; }
-static inline int futex_mm_init(struct mm_struct *mm) { return 0; }
#endif /* !CONFIG_FUTEX */

#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
#include <asm/futex_robust.h>

+void futex_reset_cs_ranges(struct futex_mm_data *fd);
void __futex_fixup_robust_unlock(struct pt_regs *regs, struct futex_unlock_cs_range *csr);

static inline bool futex_within_robust_unlock(struct pt_regs *regs,
@@ -122,7 +115,7 @@ static inline bool futex_within_robust_unlock(struct pt_regs *regs,
{
unsigned long ip = instruction_pointer(regs);

- return ip >= csr->start_ip && ip < csr->end_ip;
+ return ip >= csr->start_ip && ip < csr->start_ip + csr->len;
}

static inline void futex_fixup_robust_unlock(struct pt_regs *regs)
@@ -131,26 +124,40 @@ static inline void futex_fixup_robust_unlock(struct pt_regs *regs)

/*
* Avoid dereferencing current->mm if not returning from interrupt.
- * current->rseq.event is going to be used anyway in the exit to user
- * code, so bringing it in is not a big deal.
+ * current->rseq.event is going to be used subsequently, so bringing the
+ * cache line in is not a big deal.
*/
if (!current->rseq.event.user_irq)
return;

- csr = current->mm->futex.unlock_cs_ranges;
- if (unlikely(futex_within_robust_unlock(regs, csr))) {
- __futex_fixup_robust_unlock(regs, csr);
- return;
- }
+ csr = current->mm->futex.unlock.cs_ranges;

- /* Multi sized robust lists are only supported with CONFIG_COMPAT */
- if (IS_ENABLED(CONFIG_COMPAT) && current->mm->futex.unlock_cs_num_ranges == 2) {
- if (unlikely(futex_within_robust_unlock(regs, ++csr)))
+ /* The loop is optimized out for !COMPAT */
+ for (int r = 0; r < FUTEX_ROBUST_MAX_CS_RANGES; r++, csr++) {
+ if (unlikely(futex_within_robust_unlock(regs, csr))) {
__futex_fixup_robust_unlock(regs, csr);
+ return;
+ }
}
}
+
+static inline void futex_set_vdso_cs_range(struct futex_mm_data *fd, unsigned int idx,
+ unsigned long vdso, unsigned long start,
+ unsigned long end, bool sz32)
+{
+ fd->unlock.cs_ranges[idx].start_ip = vdso + start;
+ fd->unlock.cs_ranges[idx].len = end - start;
+ fd->unlock.cs_ranges[idx].pop_size32 = sz32;
+}
#else /* CONFIG_FUTEX_ROBUST_UNLOCK */
-static inline void futex_fixup_robust_unlock(struct pt_regs *regs) {}
+static inline void futex_fixup_robust_unlock(struct pt_regs *regs) { }
#endif /* !CONFIG_FUTEX_ROBUST_UNLOCK */

+
+#if defined(CONFIG_FUTEX_PRIVATE_HASH) || defined(CONFIG_FUTEX_ROBUST_UNLOCK)
+void futex_mm_init(struct mm_struct *mm);
+#else
+static inline void futex_mm_init(struct mm_struct *mm) { }
#endif
+
+#endif /* _LINUX_FUTEX_H */
diff --git a/include/linux/futex_types.h b/include/linux/futex_types.h
index 90e24a10ed08..288666fb37b6 100644
--- a/include/linux/futex_types.h
+++ b/include/linux/futex_types.h
@@ -30,51 +30,66 @@ struct futex_sched_data {
unsigned int state;
};

+#ifdef CONFIG_FUTEX_PRIVATE_HASH
+/**
+ * struct futex_mm_phash - Futex private hash related per MM data
+ * @lock: Mutex to protect the private hash operations
+ * @hash: RCU managed pointer to the private hash
+ * @hash_new: Pointer to a newly allocated private hash
+ * @batches: Batch state for RCU synchronization
+ * @rcu: RCU head for call_rcu()
+ * @atomic: Aggregate value for @hash_ref
+ * @ref: Per CPU reference counter for a private hash
+ */
+struct futex_mm_phash {
+ struct mutex lock;
+ struct futex_private_hash __rcu *hash;
+ struct futex_private_hash *hash_new;
+ unsigned long batches;
+ struct rcu_head rcu;
+ atomic_long_t atomic;
+ unsigned int __percpu *ref;
+};
+#else /* CONFIG_FUTEX_PRIVATE_HASH */
+struct futex_mm_phash { };
+#endif /* !CONFIG_FUTEX_PRIVATE_HASH */
+
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
/**
* struct futex_unlock_cs_range - Range for the VDSO unlock critical section
* @start_ip: The start IP of the robust futex unlock critical section (inclusive)
- * @end_ip: The end IP of the robust futex unlock critical section (exclusive)
+ * @len: The length of the robust futex unlock critical section
* @pop_size32: Pending OP pointer size indicator. 0 == 64-bit, 1 == 32-bit
*/
struct futex_unlock_cs_range {
unsigned long start_ip;
- unsigned long end_ip;
+ unsigned int len;
unsigned int pop_size32;
};

-#define FUTEX_ROBUST_MAX_CS_RANGES 2
+#define FUTEX_ROBUST_MAX_CS_RANGES (1 + IS_ENABLED(CONFIG_COMPAT))
+
+/**
+ * struct futex_unlock_cs_ranges - Futex unlock VDSO critical sections
+ * @cs_ranges: Array of critical section ranges
+ */
+struct futex_unlock_cs_ranges {
+ struct futex_unlock_cs_range cs_ranges[FUTEX_ROBUST_MAX_CS_RANGES];
+};
+#else /* CONFIG_FUTEX_ROBUST_UNLOCK */
+struct futex_unlock_cs_ranges { };
+#endif /* !CONFIG_FUTEX_ROBUST_UNLOCK */

/**
* struct futex_mm_data - Futex related per MM data
- * @phash_lock: Mutex to protect the private hash operations
- * @phash: RCU managed pointer to the private hash
- * @phash_new: Pointer to a newly allocated private hash
- * @phash_batches: Batch state for RCU synchronization
- * @phash_rcu: RCU head for call_rcu()
- * @phash_atomic: Aggregate value for @phash_ref
- * @phash_ref: Per CPU reference counter for a private hash
- *
- * @unlock_cs_num_ranges: The number of critical section ranges for VDSO assisted unlock
- * of robust futexes.
- * @unlock_cs_ranges: The critical section ranges for VDSO assisted unlock
+ * @phash: Futex private hash related data
+ * @unlock: Futex unlock VDSO critical sections
*/
struct futex_mm_data {
-#ifdef CONFIG_FUTEX_PRIVATE_HASH
- struct mutex phash_lock;
- struct futex_private_hash __rcu *phash;
- struct futex_private_hash *phash_new;
- unsigned long phash_batches;
- struct rcu_head phash_rcu;
- atomic_long_t phash_atomic;
- unsigned int __percpu *phash_ref;
-#endif
-#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
- unsigned int unlock_cs_num_ranges;
- struct futex_unlock_cs_range unlock_cs_ranges[FUTEX_ROBUST_MAX_CS_RANGES];
-#endif
+ struct futex_mm_phash phash;
+ struct futex_unlock_cs_ranges unlock;
};
-
-#else
+#else /* CONFIG_FUTEX */
struct futex_sched_data { };
struct futex_mm_data { };
#endif /* !CONFIG_FUTEX */
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index bc41d619f9a3..ac1d9ce1f1ec 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -645,11 +645,11 @@ static inline void user_access_restore(unsigned long flags) { }
#endif

#ifndef unsafe_atomic_store_release_user
-# define unsafe_atomic_store_release_user(val, uptr, elbl) \
- do { \
- if (!IS_ENABLED(CONFIG_ARCH_STORE_IMPLIES_RELEASE)) \
- smp_mb(); \
- unsafe_put_user(val, uptr, elbl); \
+# define unsafe_atomic_store_release_user(val, uptr, elbl) \
+ do { \
+ if (!IS_ENABLED(CONFIG_ARCH_MEMORY_ORDER_TOS)) \
+ smp_mb(); \
+ unsafe_put_user(val, uptr, elbl); \
} while (0)
#endif

diff --git a/kernel/fork.c b/kernel/fork.c
index 65113a304518..726c9427d811 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1097,6 +1097,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
#endif
mm_init_uprobes_state(mm);
hugetlb_count_init(mm);
+ futex_mm_init(mm);

mm_flags_clear_all(mm);
if (current->mm) {
@@ -1109,11 +1110,8 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
mm->def_flags = 0;
}

- if (futex_mm_init(mm))
- goto fail_mm_init;
-
if (mm_alloc_pgd(mm))
- goto fail_nopgd;
+ goto fail_mm_init;

if (mm_alloc_id(mm))
goto fail_noid;
@@ -1140,8 +1138,6 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
mm_free_id(mm);
fail_noid:
mm_free_pgd(mm);
-fail_nopgd:
- futex_hash_free(mm);
fail_mm_init:
free_mm(mm);
return NULL;
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 6a9c04471c44..ce47d02f1ea2 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -190,7 +190,7 @@ __futex_hash_private(union futex_key *key, struct futex_private_hash *fph)
return NULL;

if (!fph)
- fph = rcu_dereference(key->private.mm->futex.phash);
+ fph = rcu_dereference(key->private.mm->futex.phash.hash);
if (!fph || !fph->hash_mask)
return NULL;

@@ -235,17 +235,17 @@ static void futex_rehash_private(struct futex_private_hash *old,
}
}

-static bool __futex_pivot_hash(struct mm_struct *mm,
- struct futex_private_hash *new)
+static bool __futex_pivot_hash(struct mm_struct *mm, struct futex_private_hash *new)
{
+ struct futex_mm_phash *mmph = &mm->futex.phash;
struct futex_private_hash *fph;

- WARN_ON_ONCE(mm->futex.phash_new);
+ WARN_ON_ONCE(mmph->hash_new);

- fph = rcu_dereference_protected(mm->futex.phash, lockdep_is_held(&mm->futex.phash_lock));
+ fph = rcu_dereference_protected(mmph->hash, lockdep_is_held(&mmph->lock));
if (fph) {
if (!futex_ref_is_dead(fph)) {
- mm->futex.phash_new = new;
+ mmph->hash_new = new;
return false;
}

@@ -253,8 +253,8 @@ static bool __futex_pivot_hash(struct mm_struct *mm,
}
new->state = FR_PERCPU;
scoped_guard(rcu) {
- mm->futex.phash_batches = get_state_synchronize_rcu();
- rcu_assign_pointer(mm->futex.phash, new);
+ mmph->batches = get_state_synchronize_rcu();
+ rcu_assign_pointer(mmph->hash, new);
}
kvfree_rcu(fph, rcu);
return true;
@@ -262,12 +262,12 @@ static bool __futex_pivot_hash(struct mm_struct *mm,

static void futex_pivot_hash(struct mm_struct *mm)
{
- scoped_guard(mutex, &mm->futex.phash_lock) {
+ scoped_guard(mutex, &mm->futex.phash.lock) {
struct futex_private_hash *fph;

- fph = mm->futex.phash_new;
+ fph = mm->futex.phash.hash_new;
if (fph) {
- mm->futex.phash_new = NULL;
+ mm->futex.phash.hash_new = NULL;
__futex_pivot_hash(mm, fph);
}
}
@@ -290,7 +290,7 @@ struct futex_private_hash *futex_private_hash(void)
scoped_guard(rcu) {
struct futex_private_hash *fph;

- fph = rcu_dereference(mm->futex.phash);
+ fph = rcu_dereference(mm->futex.phash.hash);
if (!fph)
return NULL;

@@ -1452,12 +1452,16 @@ bool futex_robust_list_clear_pending(void __user *pop, unsigned int flags)
#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
void __futex_fixup_robust_unlock(struct pt_regs *regs, struct futex_unlock_cs_range *csr)
{
+ /*
+ * arch_futex_robust_unlock_get_pop() returns the list pending op pointer from
+ * @regs if the try_cmpxchg() succeeded.
+ */
void __user *pop = arch_futex_robust_unlock_get_pop(regs);

if (!pop)
return;

- futex_robust_list_clear_pending(pop, csr->cs_pop_size32 ? FLAGS_ROBUST_LIST32 : 0);
+ futex_robust_list_clear_pending(pop, csr->pop_size32 ? FLAGS_ROBUST_LIST32 : 0);
}
#endif /* CONFIG_FUTEX_ROBUST_UNLOCK */

@@ -1606,17 +1610,17 @@ static void __futex_ref_atomic_begin(struct futex_private_hash *fph)
* otherwise it would be impossible for it to have reported success
* from futex_ref_is_dead().
*/
- WARN_ON_ONCE(atomic_long_read(&mm->futex.phash_atomic) != 0);
+ WARN_ON_ONCE(atomic_long_read(&mm->futex.phash.atomic) != 0);

/*
* Set the atomic to the bias value such that futex_ref_{get,put}()
* will never observe 0. Will be fixed up in __futex_ref_atomic_end()
* when folding in the percpu count.
*/
- atomic_long_set(&mm->futex.phash_atomic, LONG_MAX);
+ atomic_long_set(&mm->futex.phash.atomic, LONG_MAX);
smp_store_release(&fph->state, FR_ATOMIC);

- call_rcu_hurry(&mm->futex.phash_rcu, futex_ref_rcu);
+ call_rcu_hurry(&mm->futex.phash.rcu, futex_ref_rcu);
}

static void __futex_ref_atomic_end(struct futex_private_hash *fph)
@@ -1637,7 +1641,7 @@ static void __futex_ref_atomic_end(struct futex_private_hash *fph)
* Therefore the per-cpu counter is now stable, sum and reset.
*/
for_each_possible_cpu(cpu) {
- unsigned int *ptr = per_cpu_ptr(mm->futex.phash_ref, cpu);
+ unsigned int *ptr = per_cpu_ptr(mm->futex.phash.ref, cpu);
count += *ptr;
*ptr = 0;
}
@@ -1645,7 +1649,7 @@ static void __futex_ref_atomic_end(struct futex_private_hash *fph)
/*
* Re-init for the next cycle.
*/
- this_cpu_inc(*mm->futex.phash_ref); /* 0 -> 1 */
+ this_cpu_inc(*mm->futex.phash.ref); /* 0 -> 1 */

/*
* Add actual count, subtract bias and initial refcount.
@@ -1653,7 +1657,7 @@ static void __futex_ref_atomic_end(struct futex_private_hash *fph)
* The moment this atomic operation happens, futex_ref_is_dead() can
* become true.
*/
- ret = atomic_long_add_return(count - LONG_MAX - 1, &mm->futex.phash_atomic);
+ ret = atomic_long_add_return(count - LONG_MAX - 1, &mm->futex.phash.atomic);
if (!ret)
wake_up_var(mm);

@@ -1663,8 +1667,8 @@ static void __futex_ref_atomic_end(struct futex_private_hash *fph)

static void futex_ref_rcu(struct rcu_head *head)
{
- struct mm_struct *mm = container_of(head, struct mm_struct, futex.phash_rcu);
- struct futex_private_hash *fph = rcu_dereference_raw(mm->futex.phash);
+ struct mm_struct *mm = container_of(head, struct mm_struct, futex.phash.rcu);
+ struct futex_private_hash *fph = rcu_dereference_raw(mm->futex.phash.hash);

if (fph->state == FR_PERCPU) {
/*
@@ -1693,7 +1697,7 @@ static void futex_ref_drop(struct futex_private_hash *fph)
/*
* Can only transition the current fph;
*/
- WARN_ON_ONCE(rcu_dereference_raw(mm->futex.phash) != fph);
+ WARN_ON_ONCE(rcu_dereference_raw(mm->futex.phash.hash) != fph);
/*
* We enqueue at least one RCU callback. Ensure mm stays if the task
* exits before the transition is completed.
@@ -1704,9 +1708,9 @@ static void futex_ref_drop(struct futex_private_hash *fph)
* In order to avoid the following scenario:
*
* futex_hash() __futex_pivot_hash()
- * guard(rcu); guard(mm->futex_hash_lock);
- * fph = mm->futex.phash;
- * rcu_assign_pointer(&mm->futex.phash, new);
+ * guard(rcu); guard(mm->futex.phash.lock);
+ * fph = mm->futex.phash.hash;
+ * rcu_assign_pointer(&mm->futex.phash.hash, new);
* futex_hash_allocate()
* futex_ref_drop()
* fph->state = FR_ATOMIC;
@@ -1721,7 +1725,7 @@ static void futex_ref_drop(struct futex_private_hash *fph)
* There must be at least one full grace-period between publishing a
* new fph and trying to replace it.
*/
- if (poll_state_synchronize_rcu(mm->futex.phash_batches)) {
+ if (poll_state_synchronize_rcu(mm->futex.phash.batches)) {
/*
* There was a grace-period, we can begin now.
*/
@@ -1729,7 +1733,7 @@ static void futex_ref_drop(struct futex_private_hash *fph)
return;
}

- call_rcu_hurry(&mm->futex.phash_rcu, futex_ref_rcu);
+ call_rcu_hurry(&mm->futex.phash.rcu, futex_ref_rcu);
}

static bool futex_ref_get(struct futex_private_hash *fph)
@@ -1739,11 +1743,11 @@ static bool futex_ref_get(struct futex_private_hash *fph)
guard(preempt)();

if (READ_ONCE(fph->state) == FR_PERCPU) {
- __this_cpu_inc(*mm->futex.phash_ref);
+ __this_cpu_inc(*mm->futex.phash.ref);
return true;
}

- return atomic_long_inc_not_zero(&mm->futex.phash_atomic);
+ return atomic_long_inc_not_zero(&mm->futex.phash.atomic);
}

static bool futex_ref_put(struct futex_private_hash *fph)
@@ -1753,11 +1757,11 @@ static bool futex_ref_put(struct futex_private_hash *fph)
guard(preempt)();

if (READ_ONCE(fph->state) == FR_PERCPU) {
- __this_cpu_dec(*mm->futex.phash_ref);
+ __this_cpu_dec(*mm->futex.phash.ref);
return false;
}

- return atomic_long_dec_and_test(&mm->futex.phash_atomic);
+ return atomic_long_dec_and_test(&mm->futex.phash.atomic);
}

static bool futex_ref_is_dead(struct futex_private_hash *fph)
@@ -1769,24 +1773,23 @@ static bool futex_ref_is_dead(struct futex_private_hash *fph)
if (smp_load_acquire(&fph->state) == FR_PERCPU)
return false;

- return atomic_long_read(&mm->futex.phash_atomic) == 0;
+ return atomic_long_read(&mm->futex.phash.atomic) == 0;
}

-int futex_mm_init(struct mm_struct *mm)
+static void futex_hash_init_mm(struct futex_mm_data *fd)
{
- memset(&mm->futex, 0, sizeof(mm->futex));
- mutex_init(&mm->futex.phash_lock);
- mm->futex.phash_batches = get_state_synchronize_rcu();
- return 0;
+ memset(&fd->phash, 0, sizeof(fd->phash));
+ mutex_init(&fd->phash.lock);
+ fd->phash.batches = get_state_synchronize_rcu();
}

void futex_hash_free(struct mm_struct *mm)
{
struct futex_private_hash *fph;

- free_percpu(mm->futex.phash_ref);
- kvfree(mm->futex.phash_new);
- fph = rcu_dereference_raw(mm->futex.phash);
+ free_percpu(mm->futex.phash.ref);
+ kvfree(mm->futex.phash.hash_new);
+ fph = rcu_dereference_raw(mm->futex.phash.hash);
if (fph)
kvfree(fph);
}
@@ -1797,10 +1800,10 @@ static bool futex_pivot_pending(struct mm_struct *mm)

guard(rcu)();

- if (!mm->futex.phash_new)
+ if (!mm->futex.phash.hash_new)
return true;

- fph = rcu_dereference(mm->futex.phash);
+ fph = rcu_dereference(mm->futex.phash.hash);
return futex_ref_is_dead(fph);
}

@@ -1842,7 +1845,7 @@ static int futex_hash_allocate(unsigned int hash_slots, unsigned int flags)
* Once we've disabled the global hash there is no way back.
*/
scoped_guard(rcu) {
- fph = rcu_dereference(mm->futex.phash);
+ fph = rcu_dereference(mm->futex.phash.hash);
if (fph && !fph->hash_mask) {
if (custom)
return -EBUSY;
@@ -1850,15 +1853,15 @@ static int futex_hash_allocate(unsigned int hash_slots, unsigned int flags)
}
}

- if (!mm->futex.phash_ref) {
+ if (!mm->futex.phash.ref) {
/*
* This will always be allocated by the first thread and
* therefore requires no locking.
*/
- mm->futex.phash_ref = alloc_percpu(unsigned int);
- if (!mm->futex.phash_ref)
+ mm->futex.phash.ref = alloc_percpu(unsigned int);
+ if (!mm->futex.phash.ref)
return -ENOMEM;
- this_cpu_inc(*mm->futex.phash_ref); /* 0 -> 1 */
+ this_cpu_inc(*mm->futex.phash.ref); /* 0 -> 1 */
}

fph = kvzalloc(struct_size(fph, queues, hash_slots),
@@ -1881,14 +1884,14 @@ static int futex_hash_allocate(unsigned int hash_slots, unsigned int flags)
wait_var_event(mm, futex_pivot_pending(mm));
}

- scoped_guard(mutex, &mm->futex.phash_lock) {
+ scoped_guard(mutex, &mm->futex.phash.lock) {
struct futex_private_hash *free __free(kvfree) = NULL;
struct futex_private_hash *cur, *new;

- cur = rcu_dereference_protected(mm->futex.phash,
- lockdep_is_held(&mm->futex.phash_lock));
- new = mm->futex.phash_new;
- mm->futex.phash_new = NULL;
+ cur = rcu_dereference_protected(mm->futex.phash.hash,
+ lockdep_is_held(&mm->futex.phash.lock));
+ new = mm->futex.phash.hash_new;
+ mm->futex.phash.hash_new = NULL;

if (fph) {
if (cur && !cur->hash_mask) {
@@ -1898,7 +1901,7 @@ static int futex_hash_allocate(unsigned int hash_slots, unsigned int flags)
* the second one returns here.
*/
free = fph;
- mm->futex.phash_new = new;
+ mm->futex.phash.hash_new = new;
return -EBUSY;
}
if (cur && !new) {
@@ -1947,11 +1950,9 @@ int futex_hash_allocate_default(void)
return 0;

scoped_guard(rcu) {
- threads = min_t(unsigned int,
- get_nr_threads(current),
- num_online_cpus());
+ threads = min_t(unsigned int, get_nr_threads(current), num_online_cpus());

- fph = rcu_dereference(current->mm->futex.phash);
+ fph = rcu_dereference(current->mm->futex.phash.hash);
if (fph) {
if (fph->custom)
return 0;
@@ -1978,25 +1979,51 @@ static int futex_hash_get_slots(void)
struct futex_private_hash *fph;

guard(rcu)();
- fph = rcu_dereference(current->mm->futex.phash);
+ fph = rcu_dereference(current->mm->futex.phash.hash);
if (fph && fph->hash_mask)
return fph->hash_mask + 1;
return 0;
}
+#else /* CONFIG_FUTEX_PRIVATE_HASH */
+static inline int futex_hash_allocate(unsigned int hslots, unsigned int flags) { return -EINVAL; }
+static inline int futex_hash_get_slots(void) { return 0; }
+static inline void futex_hash_init_mm(struct futex_mm_data *fd) { }
+#endif /* !CONFIG_FUTEX_PRIVATE_HASH */

-#else
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+static void futex_invalidate_cs_ranges(struct futex_mm_data *fd)
+{
+ /*
+ * Invalidate start_ip so that the quick check (ip >= start_ip) fails
+ * when the VDSO is not mapped, or for the second slot of compat tasks,
+ * which use VDSO32 and therefore lack the 64-bit pointer variant.
+ */
+ for (int i = 0; i < FUTEX_ROBUST_MAX_CS_RANGES; i++)
+ fd->unlock.cs_ranges[i].start_ip = ~0UL;
+}

-static int futex_hash_allocate(unsigned int hash_slots, unsigned int flags)
+void futex_reset_cs_ranges(struct futex_mm_data *fd)
{
- return -EINVAL;
+ memset(fd->unlock.cs_ranges, 0, sizeof(fd->unlock.cs_ranges));
+ futex_invalidate_cs_ranges(fd);
}

-static int futex_hash_get_slots(void)
+static void futex_robust_unlock_init_mm(struct futex_mm_data *fd)
{
- return 0;
+ /* mm_dup() preserves the range, mm_alloc() clears it */
+ if (!fd->unlock.cs_ranges[0].start_ip)
+ futex_invalidate_cs_ranges(fd);
}
+#else /* CONFIG_FUTEX_ROBUST_UNLOCK */
+static inline void futex_robust_unlock_init_mm(struct futex_mm_data *fd) { }
+#endif /* !CONFIG_FUTEX_ROBUST_UNLOCK */

-#endif
+void futex_mm_init(struct mm_struct *mm)
+{
+ futex_hash_init_mm(&mm->futex);
+ futex_robust_unlock_init_mm(&mm->futex);
+}

int futex_hash_prctl(unsigned long arg2, unsigned long arg3, unsigned long arg4)
{
diff --git a/tools/testing/selftests/futex/functional/robust_list.c b/tools/testing/selftests/futex/functional/robust_list.c
index e7d1254e18ca..62f21f8d89a6 100644
--- a/tools/testing/selftests/futex/functional/robust_list.c
+++ b/tools/testing/selftests/futex/functional/robust_list.c
@@ -27,12 +27,14 @@
#include "futextest.h"
#include "../../kselftest_harness.h"

+#include <dlfcn.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
+#include <sys/auxv.h>
#include <sys/mman.h>
#include <sys/wait.h>

@@ -54,6 +56,12 @@ static int get_robust_list(int pid, struct robust_list_head **head, size_t *len_
return syscall(SYS_get_robust_list, pid, head, len_ptr);
}

+static int sys_futex_robust_unlock(_Atomic(uint32_t) *uaddr, unsigned int op, int val,
+ void *list_op_pending, unsigned int val3)
+{
+ return syscall(SYS_futex, uaddr, op, val, NULL, list_op_pending, val3, 0);
+}
+
/*
* Basic lock struct, contains just the futex word and the robust list element
* Real implementations have also a *prev to easily walk in the list
@@ -549,4 +557,199 @@ TEST(test_circular_list)
ksft_test_result_pass("%s\n", __func__);
}

+/*
+ * Below are tests for the fix of the robust release race condition. Please read the following
+ * thread to learn more about the issue and why the functions below fix it:
+ * https://lore.kernel.org/lkml/20260316162316.356674433@xxxxxxxxxx/
+ */
+
+/*
+ * Auxiliary code for loading the vDSO functions
+ */
+#define VDSO_SIZE 0x4000
+
+void *get_vdso_func_addr(const char *str)
+{
+ void *vdso_base = (void *) getauxval(AT_SYSINFO_EHDR), *addr;
+ Dl_info info;
+
+ if (!vdso_base) {
+ perror("Failed to get AT_SYSINFO_EHDR");
+ return NULL;
+ }
+
+ for (addr = vdso_base; addr < vdso_base + VDSO_SIZE; addr += sizeof(addr)) {
+ if (dladdr(addr, &info) == 0 || !info.dli_sname)
+ continue;
+
+ if (!strcmp(info.dli_sname, str))
+ return info.dli_saddr;
+ }
+
+ return NULL;
+}
+
+/*
+ * These are the real vDSO function signatures:
+ *
+ * __vdso_futex_robust_list64_try_unlock(__u32 *lock, __u32 tid, __u64 *pop)
+ * __vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop)
+ *
+ * So for the generic entry point we need to use a void pointer as the last argument
+ */
+FIXTURE(vdso_unlock)
+{
+ uint32_t (*vdso)(_Atomic(uint32_t) *lock, uint32_t tid, void *pop);
+};
+
+FIXTURE_VARIANT(vdso_unlock)
+{
+ bool is_32;
+ char func_name[];
+};
+
+FIXTURE_SETUP(vdso_unlock)
+{
+ self->vdso = get_vdso_func_addr(variant->func_name);
+
+ if (!self->vdso)
+ ksft_test_result_skip("%s not found\n", variant->func_name);
+}
+
+FIXTURE_TEARDOWN(vdso_unlock) {}
+
+FIXTURE_VARIANT_ADD(vdso_unlock, 32)
+{
+ .func_name = "__vdso_futex_robust_list32_try_unlock",
+ .is_32 = true,
+};
+
+FIXTURE_VARIANT_ADD(vdso_unlock, 64)
+{
+ .func_name = "__vdso_futex_robust_list64_try_unlock",
+ .is_32 = false,
+};
+
+/*
+ * Test the vDSO robust_listXX_try_unlock() for the uncontended case. The vDSO call should
+ * return the thread ID of the lock owner, the lock word must be 0 and the list_op_pending
+ * pointer must be cleared.
+ */
+TEST_F(vdso_unlock, test_robust_try_unlock_uncontended)
+{
+ struct lock_struct lock = { .futex = 0 };
+ _Atomic(unsigned int) *futex = &lock.futex;
+ struct robust_list_head head;
+ uint64_t exp = (uint64_t) NULL;
+ pid_t tid = gettid();
+ int ret;
+
+ *futex = tid;
+
+ ret = set_list(&head);
+ if (ret)
+ ksft_test_result_fail("set_robust_list error\n");
+
+ head.list_op_pending = &lock.list;
+
+ ret = self->vdso(futex, tid, &head.list_op_pending);
+
+ ASSERT_EQ(ret, tid);
+ ASSERT_EQ(*futex, 0);
+
+ /* The 32-bit entry point clears only the lower 32 bits of the pointer */
+ if (variant->is_32) {
+ exp = (uint64_t)(unsigned long)&lock.list;
+ exp &= ~0xFFFFFFFFULL;
+ }
+
+ ASSERT_EQ((uint64_t)(unsigned long)head.list_op_pending, exp);
+}
+
+/*
+ * If the lock is contended, the operation fails. The return value is the value found at the
+ * futex word (tid | FUTEX_WAITERS), the futex word is not modified and the list_op_pending is
+ * not cleared.
+ */
+TEST_F(vdso_unlock, test_robust_try_unlock_contended)
+{
+ struct lock_struct lock = { .futex = 0 };
+ _Atomic(unsigned int) *futex = &lock.futex;
+ struct robust_list_head head;
+ pid_t tid = gettid();
+ int ret;
+
+ *futex = tid | FUTEX_WAITERS;
+
+ ret = set_list(&head);
+ if (ret)
+ ksft_test_result_fail("set_robust_list error\n");
+
+ head.list_op_pending = &lock.list;
+
+ ret = self->vdso(futex, tid, &head.list_op_pending);
+
+ ASSERT_EQ(ret, tid | FUTEX_WAITERS);
+ ASSERT_EQ(*futex, tid | FUTEX_WAITERS);
+ ASSERT_EQ(head.list_op_pending, &lock.list);
+}
+
+FIXTURE(futex_op) {};
+
+FIXTURE_VARIANT(futex_op)
+{
+ unsigned int op;
+ unsigned int val3;
+};
+
+FIXTURE_SETUP(futex_op) {}
+
+FIXTURE_TEARDOWN(futex_op) {}
+
+FIXTURE_VARIANT_ADD(futex_op, wake)
+{
+ .op = FUTEX_WAKE,
+ .val3 = 0,
+};
+
+FIXTURE_VARIANT_ADD(futex_op, wake_bitset)
+{
+ .op = FUTEX_WAKE_BITSET,
+ .val3 = FUTEX_BITSET_MATCH_ANY,
+};
+
+FIXTURE_VARIANT_ADD(futex_op, unlock_pi)
+{
+ .op = FUTEX_UNLOCK_PI,
+ .val3 = 0,
+};
+
+/*
+ * The syscall should return the number of tasks woken (for this test, 0), clear the futex word and
+ * clear list_op_pending
+ */
+TEST_F(futex_op, test_futex_robust_unlock)
+{
+ struct lock_struct lock = { .futex = 0 };
+ _Atomic(unsigned int) *futex = &lock.futex;
+ struct robust_list_head head;
+ pid_t tid = gettid();
+ int ret;
+
+ *futex = tid | FUTEX_WAITERS;
+
+ ret = set_list(&head);
+ if (ret)
+ ksft_test_result_fail("set_robust_list error\n");
+
+ head.list_op_pending = &lock.list;
+
+ ret = sys_futex_robust_unlock(futex, FUTEX_ROBUST_UNLOCK | variant->op, tid,
+ &head.list_op_pending, variant->val3);
+
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(*futex, 0);
+ ASSERT_EQ(head.list_op_pending, NULL);
+}
+
TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/futex/include/futextest.h b/tools/testing/selftests/futex/include/futextest.h
index 3d48e9789d9f..f4d880b8e795 100644
--- a/tools/testing/selftests/futex/include/futextest.h
+++ b/tools/testing/selftests/futex/include/futextest.h
@@ -38,6 +38,9 @@ typedef volatile u_int32_t futex_t;
#ifndef FUTEX_CMP_REQUEUE_PI
#define FUTEX_CMP_REQUEUE_PI 12
#endif
+#ifndef FUTEX_ROBUST_UNLOCK
+#define FUTEX_ROBUST_UNLOCK 512
+#endif
#ifndef FUTEX_WAIT_REQUEUE_PI_PRIVATE
#define FUTEX_WAIT_REQUEUE_PI_PRIVATE (FUTEX_WAIT_REQUEUE_PI | \
FUTEX_PRIVATE_FLAG)