Re: Prevent inconsistent CPU state after sequence of dlclose/dlopen

From: Adhemerval Zanella Netto
Date: Fri Jan 10 2025 - 12:14:52 EST

On 10/01/25 14:10, Florian Weimer wrote:
> * Mathieu Desnoyers:
>
>> On 2025-01-10 11:54, Peter Zijlstra wrote:
>>> On Fri, Jan 10, 2025 at 10:55:36AM -0500, Mathieu Desnoyers wrote:
>>>> Hi,
>>>>
>>>> I was discussing with Mark Rutland recently, and he pointed out that a
>>>> sequence of dlclose/dlopen mapping new code at the same addresses in
>>>> multithreaded environments is an issue on ARM, and possibly on Intel/AMD
>>>> with the newer TLB broadcast maintenance.
>>> What is the exact race? Should not munmap() invalidate the TLBs
>>> before it allows overlapping mmap() to complete?
>>
>> The race Mark mentioned (on ARM) is AFAIU the following scenario:
>>
>> CPU 0                               CPU 1
>>
>> - dlopen()
>>   - mmap PROT_EXEC @addr
>>                                     - fetch insn @addr, CPU state
>>                                       expects unchanged insn.
>>                                     - execute unrelated code
>> - dlclose(addr)
>>   - munmap @addr
>> - dlopen()
>>   - mmap PROT_EXEC @addr
>>                                     - fetch new insn @addr.
>>                                       Incoherent CPU state.
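
A minimal reproducer for this class of problem might look like the
sketch below (library names and symbols are made up, and the program
deliberately keeps executing code from an object that is concurrently
dlclosed, i.e. exactly the undefined usage being discussed):

#include <dlfcn.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Last function pointer published by the main thread; the worker
   calls through it without any synchronization against dlclose. */
static void (*volatile entry)(void);

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        void (*fn)(void) = entry;
        if (fn)
            fn();   /* may still be running while @addr is remapped */
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);

    for (;;) {
        /* libfoo.so/libbar.so and foo_fn/bar_fn are placeholders;
           successive dlopen calls tend to reuse the address range
           that the preceding dlclose just unmapped. */
        void *h = dlopen("libfoo.so", RTLD_NOW);
        if (h) {
            entry = (void (*)(void))dlsym(h, "foo_fn");
            usleep(1000);       /* let the worker call into @addr */
            entry = NULL;
            dlclose(h);         /* munmap @addr */
        }
        h = dlopen("libbar.so", RTLD_NOW);  /* may mmap @addr again */
        if (h) {
            entry = (void (*)(void))dlsym(h, "bar_fn");
            usleep(1000);
            entry = NULL;
            dlclose(h);
        }
    }
}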
>
> Unmapping an object while code is executing in it is undefined.
>
> We have a problem with things like pthread_atfork handlers. We can't
> use locking there because fork handlers are expected to perform ample
> locking themselves, and an extra lock around them would run into lock
> ordering issues. (We tried for unrelated reasons and saw deadlocks in
> applications.)
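
To make the ordering problem concrete, here is a deliberately
deadlocking sketch (G stands in for a hypothetical extra lock taken
around fork handlers; L is an ordinary application lock that a
prepare handler legitimately takes):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t G = PTHREAD_MUTEX_INITIALIZER; /* imagined internal lock */
static pthread_mutex_t L = PTHREAD_MUTEX_INITIALIZER; /* application lock */

/* Models a thread that holds the application lock and then enters a
   libc path that would take the internal lock (e.g. fork/dlopen). */
static void *thread_b(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&L);
    sleep(1);
    pthread_mutex_lock(&G);     /* blocks: main thread holds G */
    pthread_mutex_unlock(&G);
    pthread_mutex_unlock(&L);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, thread_b, NULL);

    pthread_mutex_lock(&G);     /* fork() wrapping the handlers in G */
    sleep(1);
    pthread_mutex_lock(&L);     /* prepare handler takes the app lock:
                                   blocks, thread_b holds L -> deadlock */
    puts("not reached in the deadlocking interleaving");
    return 0;
}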
>
> What we can do is bump a reference counter while we run a pthread_atfork
> callback (we already associate them with DSOs) and skip the munmap part
> in dlclose if the counter is not zero. We can complete the unmapping
> after the fork handler returns (maybe in the parent only).

We can also make dlclose a no-op (like some runtimes do), although this
has other implications.
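
For illustration, the deferred-unmap idea could look roughly like the
sketch below (all names are hypothetical, not actual glibc internals,
and a real implementation would have to close the window between the
two atomic checks):

#include <stdatomic.h>
#include <stdbool.h>

struct dso {
    _Atomic unsigned int cb_refs;   /* callbacks currently running */
    atomic_bool unmap_pending;      /* dlclose arrived meanwhile */
    /* ... mapping bounds, link map data, etc. ... */
};

static void dso_unmap(struct dso *d)
{
    /* munmap of the object's mappings would go here */
    (void)d;
}

/* Run a fork handler with the owning object pinned. */
static void run_atfork_handler(struct dso *d, void (*handler)(void))
{
    atomic_fetch_add(&d->cb_refs, 1);
    handler();
    if (atomic_fetch_sub(&d->cb_refs, 1) == 1 &&
        atomic_load(&d->unmap_pending))
        dso_unmap(d);               /* complete the deferred unmap */
}

/* Called from dlclose: skip the munmap while a callback is live. */
static void dso_close(struct dso *d)
{
    if (atomic_load(&d->cb_refs) != 0)
        atomic_store(&d->unmap_pending, true);
    else
        dso_unmap(d);
}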

>
> There might be other callbacks besides fork handlers that have this
> problem. A similar treatment is possible for some of them, hopefully
> all of them in glibc. We cannot cover things like std::shared_ptr
> destructor calls, though. But adding more barriers won't fix those,
> either.
>
> Thanks,
> Florian
>
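
The std::shared_ptr case reduces to a code pointer escaping into
long-lived application data; in C terms (library and symbol names
made up):

#include <dlfcn.h>

/* A code pointer stored in application data outlives the DSO whose
   text it points into -- the analog of a shared_ptr whose deleter
   lives in a dlclosed object. */
struct handle {
    void *obj;
    void (*destroy)(void *);    /* points into the DSO's text */
};

int main(void)
{
    void *h = dlopen("libplugin.so", RTLD_NOW);  /* hypothetical DSO */
    if (!h)
        return 1;

    void (*make)(struct handle *) =
        (void (*)(struct handle *))dlsym(h, "plugin_make");
    if (!make)
        return 1;

    struct handle hd;
    make(&hd);

    dlclose(h);             /* text behind hd.destroy is unmapped */
    hd.destroy(hd.obj);     /* undefined: jumps into unmapped or
                               reused pages; no barrier fixes this */
    return 0;
}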