Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
From: Mathieu Desnoyers
Date: Mon Mar 16 2026 - 15:37:03 EST
On 2026-03-16 13:12, Thomas Gleixner wrote:
On Thu, Mar 12 2026 at 18:52, Mathieu Desnoyers wrote:
[...]
To fix this for correctness' sake it needs more than a hack in the kernel
without even looking at the overall larger picture.
If my POC helped move the discussion forward, then it has achieved
its purpose. :)
I sat down and did a full analysis and here are the most important
questions:
Q: Do non-PI and PI have to be treated differently?
A: No.
That's just historical evolution. While PI can't use XCHG because that
would create inconsistent state, there is absolutely no reason why
non-PI can't use try_cmpxchg().
Agreed.
Q: Is it required to unlock in user space first and then go into the kernel
to wake up waiters?
A: No.
That's again a historical leftover from the 1st generation futexes which
preceded both robust and PI. There is no technical reason to keep it
this way.
So both can do:
	if (cmpxchg(lock, tid, 0) != tid)
		sys_futex(UNLOCK,....);
which then allows for both non-PI and PI to hand the pending op pointer
into the syscall and let the kernel deal with the unlock, the op pointer
and the wake up in one go.
Yes, that's a nice simplification.
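
To make the flow above concrete, here is a minimal user-space sketch using
C11 atomics. Everything prefixed with sketch_ is hypothetical: the slow path
below only simulates what the proposed UNLOCK operation would do in the
kernel (unlock, clear the op pointer, wake a waiter), and the pending op is
modelled as a plain pointer slot rather than the real robust list field.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Counts entries into the simulated slow path (illustration only). */
static int slow_path_calls;

/*
 * Hypothetical stand-in for the proposed unlock syscall. The kernel would
 * release the lock, clear the pending op pointer and wake a waiter in one
 * go; this sketch only performs the first two steps.
 */
static void sketch_futex_unlock_slow(_Atomic uint32_t *lock, void **pending_op)
{
	slow_path_calls++;
	atomic_store(lock, 0);
	*pending_op = NULL;
}

/* Returns true when the uncontended fast path released the lock. */
static bool sketch_robust_unlock(_Atomic uint32_t *lock, uint32_t tid,
				 void **pending_op)
{
	uint32_t expected = tid;

	assert(tid != 0);

	/* Fast path: the lock word is exactly our TID, so no waiter bit is set. */
	if (atomic_compare_exchange_strong(lock, &expected, 0)) {
		/* This store is the window the VDSO plus kernel fixup must cover. */
		*pending_op = NULL;
		return true;
	}

	/* Contended: hand the lock and the op pointer to the (simulated) kernel. */
	sketch_futex_unlock_slow(lock, pending_op);
	return false;
}
```

With the lock word equal to the owner's TID the fast path succeeds and the
slow path is never entered. With any other value, for example the TID with
FUTEX_WAITERS (0x80000000) set, the cmpxchg fails and the whole unlock is
delegated.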
That reduces the problem space to take care of the non-contended unlock
case, where the pending op is cleared after the cmpxchg() succeeded.
And yes, that part can be done in the VDSO and a fixup mechanism in the
kernel.
Yes.
Q: Are robust list pointers guaranteed to be 64-bit when running as a
64-bit task?
A: No.
The gaming emulators use both the native 64-bit robust list and the
32-bit robust list from the same 64-bit application to make the
emulation work.
So both the UNLOCK syscall and the fixup need to have means to figure
out the size of the pointer to be cleared.
Sure, this can be done with a boatload of different functions and flags
and whatever, but that makes the actual fixup handling in the kernel
more complicated than necessary.
Good point, this is a requirement I did not know about. I notice you
are dealing with it in your series.
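
To illustrate why the size matters, here is a small sketch, not taken from
the series: sketch_clear_pending_op() is a hypothetical helper that zeroes a
pending-op slot of an explicitly given width. A real kernel implementation
would write to user memory with the appropriate uaccess primitive instead of
memset().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Zero a pending-op slot that is either 4 or 8 bytes wide.
 * Returns 0 on success and -1 for any other size.
 */
static int sketch_clear_pending_op(void *slot, size_t size)
{
	assert(slot != NULL);

	if (size != sizeof(uint32_t) && size != sizeof(uint64_t))
		return -1;

	/* Touch exactly 'size' bytes so that a 32-bit slot's neighbour survives. */
	memset(slot, 0, size);
	return 0;
}
```

Clearing a 32-bit slot with a 64-bit store would overwrite whatever follows
it in the 32-bit robust list head, which is why the width has to be conveyed
to both the syscall and the fixup.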
Q: Do regular signal delivery and process exit (after a crash or when
killed by an external signal) have to be treated differently?
A: No.
A task always goes through the same signal code path for both cases so
all of this can be handled in _one_ place without even touching the
robust list cleanup code.
So far, yes.
sys_exit() is different because there a task voluntarily exits and if
it does so between the unlock and the clearing of the op pointer,
then so be it. That'd be willful ignorance or malice and not any
different from the task doing the corruption itself in user space
right away.
I'm not sure about this one. How about the following scenario:
A thread calls sys_exit concurrently with another thread running in the
vdso. Is this something we should handle, or do we consider it "willful
ignorance/malice"?
Q: Are exception tables a good idea?
A: No.
This is not an exception handling case. It's a fixup similar to RSEQ
critical section fixups and so it has to be handled with dedicated
mechanisms which are performant and not glued onto something which has a
completely different purpose.
I agree with your kernel-level approach. I've proposed a few changes to
the vdso itself and vdso2c script to increase robustness in my review.
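
For readers following along, the RSEQ-like fixup idea can be sketched as an
instruction pointer range check done at signal delivery. This is only an
illustration under stated assumptions: struct sketch_unlock_window and both
functions are hypothetical names, and the window is assumed to start right
after the successful cmpxchg and to end right after the store that clears
the op pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the VDSO unlock window; not a kernel structure. */
struct sketch_unlock_window {
	uintptr_t start_ip;	/* first instruction after the successful cmpxchg */
	uintptr_t end_ip;	/* first instruction after the op pointer store */
};

/* True when the interrupted IP lies in [start_ip, end_ip). */
static bool sketch_in_window(const struct sketch_unlock_window *w, uintptr_t ip)
{
	assert(w->start_ip <= w->end_ip);
	return ip >= w->start_ip && ip < w->end_ip;
}

/*
 * Signal-delivery hook: if the task was interrupted inside the window, the
 * lock is already released, so finish the clear on behalf of user space.
 * Returns true when a fixup was applied.
 */
static bool sketch_fixup_on_signal(const struct sketch_unlock_window *w,
				   uintptr_t ip, void **pending_op)
{
	if (!sketch_in_window(w, ip))
		return false;

	*pending_op = NULL;
	return true;
}
```

Unlike an exception table lookup, this is a single range comparison on the
signal path and leaves the op pointer alone whenever the interrupted IP is
outside the window.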
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com