Re: [RFC PATCH 0/2] futex: how to solve the robust_list race condition?

From: Mathieu Desnoyers

Date: Mon Feb 23 2026 - 08:38:07 EST


On 2026-02-23 06:13, Florian Weimer wrote:
> * Mathieu Desnoyers:
>
>> Trying to find a backward compatible way to solve this may be tricky.
>> Here is one possible approach I have in mind: Introduce a new syscall,
>> e.g. sys_cleanup_robust_list(void *addr)
>>
>> This system call would be invoked on pthread_mutex_destroy(3) of
>> robust mutexes, and do the following:
>>
>> - Calculate the offset of @addr within its mapping,
>> - Iterate on all processes which map the backing store which contain
>>   the lock address @addr.
>> - Iterate on each thread sibling within each of those processes,
>> - If the thread has a robust list, and its list_op_pending points
>>   to the same offset within the backing store mapping, clear the
>>   list_op_pending pointer.
>>
>> The overhead would be added specifically to pthread_mutex_destroy(3),
>> and only for robust mutexes.
>
> Would we have to do this for pthread_mutex_destroy only, or also for
> pthread_join? It is defined to exit a thread with mutexes still locked,
> and the pthread_join call could mean that the application can determine
> by its own logic that the backing store can be deallocated.

Let me try to wrap my head around this scenario.

AFAIU, the NOTES section of
https://man7.org/linux/man-pages/man3/pthread_join.3.html
states the following for pthread_join(3):

    After a successful call to pthread_join(), the caller is
    guaranteed that the target thread has terminated. The caller may
    then choose to do any clean-up that is required after termination
    of the thread (e.g., freeing memory or other resources that were
    allocated to the target thread).

What is the behavior when a thread exits with a mutex locked? I would
expect that this mutex stays locked and the pthread_join(3) caller gets
to release that mutex and eventually call pthread_mutex_destroy(3) if
the application logic allows it.

But it looks like you are implying that pthread_mutex_destroy(3) is
somehow implicit in pthread_join(3), and I really don't understand that
part. Am I missing something?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com