Re: [RFC PATCH 0/2] futex: how to solve the robust_list race condition?

From: Florian Weimer

Date: Mon Mar 02 2026 - 12:02:06 EST

* Mathieu Desnoyers:

> On 2026-03-02 10:32, Florian Weimer wrote:
>> * Mathieu Desnoyers:
>>
>>> On 2026-03-02 02:31, Florian Weimer wrote:
>>>> * Mathieu Desnoyers:
>>>>
>>>>> Of course, we'd have to implement the whole transaction in assembler
>>>>> for each architecture.
>>>> Could this be hidden in a vDSO call?
>>>
> [...]
>>> I suspect the IP ranges and associated store-conditional flags I identified
>>> for the rseq_rl_cs approach are pretty much the key states we need to track.
>>> Architectures which support atomic exchange instructions are even simpler.
>>> We'd just have to keep track of these unlock operation steps internally
>>> between the kernel and the vDSO.
>> If the unlock operation is in the vDSO, we need to parameterize it
>> somehow, regarding offsets, values written etc., so that it's not
>> specific to exactly one robust mutex implementation.
>
> Agreed.
>
>>
>>> But you mentioned that rseq would be needed for a flag, so what am I
>>> missing?
>> It's so that you don't have to check whether the program counter is
>> somewhere in the special robust mutex unlock code every time a task gets
>> descheduled.
>
> AFAIU we don't need to evaluate this on context switch. We only need
> to evaluate it at:
>
> (a) Signal delivery,
> (b) Process exit.

Ah, missed that part. It changes the rules somewhat.
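
A minimal sketch of what that check could look like, assuming a
hypothetical vDSO unlock helper on an architecture with an atomic
exchange instruction, so there is a single commit point (LL/SC
architectures would also need the store-conditional result, which this
leaves out). struct rl_unlock_region, rl_classify_ip() and the RL_*
values are made up for illustration and do not exist in the kernel:

```c
#include <stdint.h>

/* Hypothetical layout of a vDSO robust-mutex unlock helper. */
struct rl_unlock_region {
	uintptr_t start;   /* first instruction of the unlock sequence */
	uintptr_t commit;  /* instruction right after the atomic exchange */
	uintptr_t end;     /* first instruction past the sequence */
};

enum rl_fixup {
	RL_NONE,          /* IP outside the helper: nothing to fix up */
	RL_LOCK_HELD,     /* exchange not executed yet: lock still owned */
	RL_CLEAR_PENDING, /* exchange done: treat list_op_pending as stale */
};

/*
 * Evaluated only on the signal delivery and process exit paths,
 * never on context switch. Only the interrupted IP is inspected,
 * no user memory is read.
 */
static enum rl_fixup rl_classify_ip(const struct rl_unlock_region *r,
				    uintptr_t ip)
{
	if (ip < r->start || ip >= r->end)
		return RL_NONE;
	return ip < r->commit ? RL_LOCK_HELD : RL_CLEAR_PENDING;
}
```

Keeping the commit point as a separate boundary is what lets the kernel
tell "still locked" from "already released" from the IP alone.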

> Also, the tradeoff here is not clear-cut to me: the only thing the rseq
> flag would avoid is comparing the instruction pointer against a
> vDSO range at (a) and (b), which are not as performance critical as
> context switches. I'm not sure it would warrant the added complexity of
> the rseq flag, and coupling with rseq. Moreover, I'm not convinced that
> loading an extra rseq flag field from userspace would be faster than
> just comparing with a known range of vDSO addresses.

A flag wouldn't work for the signal case anyway. That would need space
in rseq for some kind of write-ahead log of the operation, recorded
before it is carried out, so that the kernel can complete it on signal
delivery or process exit.
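
A rough user-space model of such a write-ahead record, assuming a
hypothetical struct rl_wal in the rseq area and a futex word that holds
the owner TID while locked (waiter bits and FUTEX_WAKE are ignored).
None of these names are an existing kernel or glibc ABI. Both sides
release with a compare-and-exchange from the logged TID, so a release
that already happened is not repeated once the word no longer holds
that TID. It does not attempt to handle the mutex memory being freed
and reused after the release:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record in the rseq area; not an existing ABI. */
enum { RL_WAL_IDLE = 0, RL_WAL_PENDING = 1 };

struct rl_wal {
	_Atomic uint32_t *futex; /* futex word of the mutex being unlocked */
	uint32_t owner;          /* word value while we hold the lock (our TID) */
	atomic_int state;        /* RL_WAL_IDLE or RL_WAL_PENDING */
};

/* User space, step 1: log the intended release before touching the lock. */
static void rl_wal_log(struct rl_wal *w, _Atomic uint32_t *futex, uint32_t tid)
{
	w->futex = futex;
	w->owner = tid;
	/* Release ordering publishes futex/owner before PENDING is seen. */
	atomic_store_explicit(&w->state, RL_WAL_PENDING, memory_order_release);
}

/* Unlock only if the word still holds the logged owner TID. */
static void rl_wal_try_release(struct rl_wal *w)
{
	uint32_t expected = w->owner;

	atomic_compare_exchange_strong(w->futex, &expected, 0);
}

/* User space, step 2: carry out the release, then retire the record. */
static void rl_wal_release(struct rl_wal *w)
{
	rl_wal_try_release(w);
	atomic_store_explicit(&w->state, RL_WAL_IDLE, memory_order_release);
}

static void rl_wal_unlock(struct rl_wal *w, _Atomic uint32_t *futex, uint32_t tid)
{
	rl_wal_log(w, futex, tid);
	rl_wal_release(w);
}

/*
 * Kernel side, simulated: on signal delivery or exit, finish a pending
 * release. Returns true if a pending record was completed.
 */
static bool rl_wal_complete(struct rl_wal *w)
{
	if (atomic_load_explicit(&w->state, memory_order_acquire) != RL_WAL_PENDING)
		return false;
	rl_wal_try_release(w);
	atomic_store_explicit(&w->state, RL_WAL_IDLE, memory_order_release);
	return true;
}
```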

Thanks,
Florian