Re: Prevent inconsistent CPU state after sequence of dlclose/dlopen

From: Paul E. McKenney
Date: Fri Jan 10 2025 - 13:33:41 EST


On Fri, Jan 10, 2025 at 12:13:58PM -0500, Mathieu Desnoyers wrote:
> On 2025-01-10 12:04, Florian Weimer wrote:
> > * Mathieu Desnoyers:
> >
> > > I was discussing with Mark Rutland recently, and he pointed out that a
> > > sequence of dlclose/dlopen mapping new code at the same addresses in
> > > multithreaded environments is an issue on ARM, and possibly on Intel/AMD
> > > with the newer TLB broadcast maintenance.
> > >
> > > I maintain the membarrier(2) system call, which provides a
> > > MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE command for this
> > > purpose. It's been there since Linux 4.16. It can be configured
> > > out (CONFIG_MEMBARRIER=n), but it's enabled by default.
> > >
> > > Calling this after dlclose() in glibc would prevent this issue.
> > >
> > > Is it handled in some other way, or should we open a bugzilla
> > > entry to track this?
> >
> > There is nothing special about dlopen/dlclose, we just use mmap/munmap.
> > If there is a synchronization problem, we'd have to add barriers
> > to mmap and munmap.
> >
> > But why isn't it up to the kernel to handle this correctly?
>
> As I mentioned to Peter, we could add this barrier within mprotect(2)
> and munmap(2) in the following cases:
>
> - mprotect removes PROT_EXEC from a mapping,
> - munmap unmaps a PROT_EXEC mapping.
>
> We could even go further and batch this: we only need to
> issue membarrier-sync-core on the following sequence for an mm:
>
> On either of those, set current->mm->pending_membarrier_sync_core = true:
> - mprotect removes PROT_EXEC from a mapping, or
> - munmap unmaps a PROT_EXEC mapping,
>
> And then, if current->mm->pending_membarrier_sync_core == true when:
> - mmap is called to create a PROT_EXEC mapping, or
> - mprotect sets PROT_EXEC on a mapping.
>
> invoke membarrier sync-core and set
> current->mm->pending_membarrier_sync_core = false.
>
> Thoughts?

In the case where the kernel is permitted to choose the address of the
new mapping, we could batch the sys_membarrier() calls, keeping the
corresponding address space out of service in the meantime.

Don't tell me, a given dynamic library always picks the same address? :-/

Thanx, Paul
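
For reference, the batching scheme quoted above can be modelled in a few
lines of user-space C. This is only a sketch under stated assumptions, not
kernel code: struct mm_model, model_sync_core(), model_exec_removed() and
model_exec_added() are hypothetical names made up for illustration, and the
barrier is simulated by a counter rather than by a real membarrier
sync-core invocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-mm state; not a real kernel structure. */
struct mm_model {
	bool pending_membarrier_sync_core; /* flag named as in the proposal */
	unsigned int sync_core_calls;      /* counts simulated barriers */
};

/* Placeholder for the membarrier sync-core operation (simulated only). */
static void model_sync_core(struct mm_model *mm)
{
	mm->sync_core_calls++;
}

/*
 * Model of "munmap unmaps a PROT_EXEC mapping" or
 * "mprotect removes PROT_EXEC from a mapping": only record that a
 * barrier is owed, do not issue it yet.
 */
static void model_exec_removed(struct mm_model *mm)
{
	assert(mm != NULL);
	mm->pending_membarrier_sync_core = true;
}

/*
 * Model of "mmap creates a PROT_EXEC mapping" or
 * "mprotect sets PROT_EXEC on a mapping": issue the deferred barrier
 * once, if one is pending, then clear the flag.
 */
static void model_exec_added(struct mm_model *mm)
{
	assert(mm != NULL);
	if (mm->pending_membarrier_sync_core) {
		model_sync_core(mm);
		mm->pending_membarrier_sync_core = false;
	}
}
```

In this model, any number of executable unmaps followed by one executable
map costs a single simulated barrier, and a process that never removes an
executable mapping never pays for one.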