Re: [RFC PATCH 0/2] futex: how to solve the robust_list race condition?

From: Liam R. Howlett

Date: Fri Feb 20 2026 - 15:53:18 EST


+Cc Suren, Lorenzo, and Michal

* André Almeida <andrealmeid@xxxxxxxxxx> [260220 15:27]:
> During LPC 2025, I presented a session about creating a new syscall for
> robust_list[0][1]. However, most of the session discussion wasn't much related
> to the new syscall itself, but much more related to an old bug that exists in
> the current robust_list mechanism.

Ah, sorry for hijacking the session, that was not my intention, but this
needs to be addressed before we propagate the issue into the next
iteration.

>
> Since at least 2012, there's an open bug reporting a race condition, as
> Carlos O'Donell pointed out:
>
> "File corruption race condition in robust mutex unlocking"
> https://sourceware.org/bugzilla/show_bug.cgi?id=14485
>
> To help understand the bug, I've created a reproducer (patch 1/2) and a
> companion kernel hack (patch 2/2) that helps to make the race condition
> more likely. When the bug happens, the reproducer shows a message
> comparing the original memory with the corrupted one:
>
> "Memory was corrupted by the kernel: 8001fe8d8001fe8d vs 8001fe8dc0000000"
>
> I'm not sure yet what would be the appropriate approach to fix it, so I
> decided to reach out to the community before moving forward in some
> direction. One suggestion from Peter[2] revolves around serializing the
> mmap() and the robust list exit path, which might cause overhead for the
> common case, where list_op_pending is empty.
>
> However, given that there's a new interface being prepared, this could
> also give the opportunity to rethink how list_op_pending works, and get
> rid of the race condition by design.
>
> Feedback is very much welcome.

A delay was added to the oom reaper for these tasks [1] by commit
e4a38402c36e ("oom_kill.c: futex: delay the OOM reaper to allow time for
proper futex cleanup").

We did discuss marking the vmas as needing to be skipped by the oom
manager, but no clear path forward emerged. It's also not clear if
that's the only area where such a problem exists.

[1]. https://lore.kernel.org/all/20220414144042.677008-1-npache@xxxxxxxxxx/T/#u

>
> Thanks!
> André
>
> [0] https://lore.kernel.org/lkml/20251122-tonyk-robust_futex-v6-0-05fea005a0fd@xxxxxxxxxx/
> [1] https://lpc.events/event/19/contributions/2108/
> [2] https://lore.kernel.org/lkml/20241219171344.GA26279@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
>
> André Almeida (2):
> futex: Create reproducer for robust_list race condition
> futex: Add debug delays
>
> kernel/futex/core.c | 10 +++
> robust_bug.c | 178 ++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 188 insertions(+)
> create mode 100644 robust_bug.c
>
> --
> 2.53.0
>