Re: Prevent inconsistent CPU state after sequence of dlclose/dlopen

From: Florian Weimer
Date: Fri Jan 10 2025 - 12:11:20 EST


* Mathieu Desnoyers:

> On 2025-01-10 11:54, Peter Zijlstra wrote:
>> On Fri, Jan 10, 2025 at 10:55:36AM -0500, Mathieu Desnoyers wrote:
>>> Hi,
>>>
>>> I was discussing with Mark Rutland recently, and he pointed out that a
>>> sequence of dlclose/dlopen mapping new code at the same addresses in
>>> multithreaded environments is an issue on ARM, and possibly on Intel/AMD
>>> with the newer TLB broadcast maintenance.
>> What is the exact race? Should not munmap() invalidate the TLBs before
>> it allows overlapping mmap() to complete?
>
> The race Mark mentioned (on ARM) is AFAIU the following scenario:
>
> CPU 0                      CPU 1
>
> - dlopen()
>   - mmap PROT_EXEC @addr
>                            - fetch insn @addr, CPU state expects
>                              unchanged insn.
>                            - execute unrelated code
> - dlclose(addr)
>   - munmap @addr
> - dlopen()
>   - mmap PROT_EXEC @addr
>                            - fetch new insn @addr. Incoherent CPU state.

Unmapping an object while code is executing in it is undefined.

We have a problem with things like pthread_atfork handlers. We can't
use locking there because fork handlers are expected to perform ample
locking themselves, and an extra lock around them would run into lock
ordering issues. (We tried for unrelated reasons and saw deadlocks in
applications.)

What we can do is bump a reference counter while we run a pthread_atfork
callback (we already associate them with DSOs) and skip the munmap part
in dlclose if the counter is not zero. We can complete the unmapping
after the fork handler returns (maybe in the parent only).
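
A minimal sketch of that counter idea, to make the ordering concrete.
This is not glibc code: struct dso, dso_callback_enter(),
dso_callback_leave() and dso_close() are hypothetical names, and the
"mapped" flag merely stands in for the real munmap call. The small
mutex only guards the bookkeeping and is never held while the callback
itself runs, so it does not reintroduce the lock ordering problem
described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a loaded DSO; not glibc's link map. */
struct dso {
    pthread_mutex_t lock;  /* guards the fields below, never held across a callback */
    int callback_refs;     /* callbacks currently executing inside this DSO */
    bool unmap_pending;    /* dlclose arrived while a callback was running */
    bool mapped;           /* stands in for the actual mapping */
};

/* Caller holds d->lock. Real code would munmap the object here. */
static void dso_unmap_locked(struct dso *d)
{
    d->mapped = false;
    d->unmap_pending = false;
}

/* Call before invoking a callback that lives in the DSO.
   Returns false if the object is already gone or going away,
   in which case the callback must be skipped. */
bool dso_callback_enter(struct dso *d)
{
    pthread_mutex_lock(&d->lock);
    bool ok = d->mapped && !d->unmap_pending;
    if (ok)
        d->callback_refs++;
    pthread_mutex_unlock(&d->lock);
    return ok;
}

/* Call after the callback has returned; completes a deferred unmap. */
void dso_callback_leave(struct dso *d)
{
    pthread_mutex_lock(&d->lock);
    assert(d->callback_refs > 0);
    if (--d->callback_refs == 0 && d->unmap_pending)
        dso_unmap_locked(d);
    pthread_mutex_unlock(&d->lock);
}

/* The munmap part of dlclose: skipped while a callback is running. */
void dso_close(struct dso *d)
{
    pthread_mutex_lock(&d->lock);
    if (d->callback_refs > 0)
        d->unmap_pending = true;   /* defer until the last callback returns */
    else
        dso_unmap_locked(d);
    pthread_mutex_unlock(&d->lock);
}
```

With this shape, a dso_close() that races with a running handler only
sets unmap_pending, and the mapping goes away in dso_callback_leave()
once the handler has returned. Whether the deferred unmap should happen
in the parent only after fork is left open here, as above.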

There might be other callbacks besides fork handlers that have this
problem. A similar treatment is possible for some of them, hopefully
all of them in glibc. We cannot cover things like std::shared_ptr
destructor calls, though. But adding more barriers won't fix those,
either.

Thanks,
Florian