Re: [PATCH 6/6] x86/mm: Avoid mmap lock for shadow stack pop fast path

From: Suren Baghdasaryan

Date: Wed May 13 2026 - 22:04:00 EST

On Wed, May 13, 2026 at 6:44 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Tue, May 5, 2026 at 9:39 AM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
> >
> > On 5/4/26 16:15, Edgecombe, Rick P wrote:
> > >
> > > I guess the problem is the lock ordering. Not sure if there is any slow path
> > > avoidance details that could make this splat a false positive. But how about
> > > this simpler munmap() case:
> > >
> > > Shadow stack signal munmap()
> > > ------------------- --------
> > > vma_start_read() (VM_SHADOW_STACK check)
> > > mmap_write_lock()
> > > mmap_read_lock() (user fault) <- deadlock
> > > vma_start_write() <-deadlock
> >
> > It's a little more complicated than that in practice, but I think you're
> > right.
> >
> > I'm not sure when this would happen in practice because the fault is
> > actually on the VMA that's being held for read. So I think another
> > writer would have had to sneak in there and zap the VMA.
>
> Not necessarily to zap the VMA but to modify it, which requires
> write-locking it (split, merge, remap, etc). Note that write-locking a
> VMA always involves mmap_write_lock() followed by vma_start_write(),
> IOW we end up write-locking both mmap and per-VMA locks in that order.
>
> The lock ordering when we take both mmap and per-VMA locks is to
> always take mmap_lock first, followed by the per-VMA lock. With this
> change we read-lock the VMA first and then call
> get_shstk_data()->get_user()->might_fault()->might_lock_read(mm->mmap_lock).
> So, we take per-VMA read-lock and then mmap_read_lock and that's the
> problem.
>
> >
> > The funny thing is that the fault handler is really just trying to find
> > the VMA. The thing causing the fault *has* the VMA. So it's as simple as
> > just passing the VMA down into the fault handler, right? How hard could
> > it be? ;)
> >
> > There are still games to play, but they all involve dropping locks and
> > retrying, like:
> >
> > retry:
> > vma = lock_vma_under_rcu()
> > // muck with VMA
> > pagefault_disable() // avoid deadlock
> > ret = copy_from_user()
> > pagefault_enable()
> > vma_end_read();
> >
> > if (!ret)
> > return SUCCESS;
> >
> > mmap_read_lock()
> > vma = vma_lookup()
> > mmap_read_unlock() // avoid deadlock before touching userspace
> > // check for valid VMA to avoid looping when there is no VMA
> > if (!vma)
> > return -ERRNO;
> >
> > // uh oh, slow path, something faulted
> > get_user_pages()??
> > //or
> > copy_from_user() without the VMA??
> >
> > goto retry;
> >
> >
> > This also needs some very careful thought, but something like this
> > should work, where we avoid fault handling (and lock taking) in the
> > actual #PF and do it in a context where the VMA lock is held:
> >
> > vma = lock_vma_under_rcu();
> > pagefault_disable() // avoid deadlock
> >
> > while (1) {
> > ret = copy_from_user()
> > if (!ret)
> > break;
> > handle_mm_fault(vma, address, FAULT_FLAG_VMA_LOCK...);
> > };
> >
> > pagefault_enable()
> > vma_end_read();
> >
> > That's effectively just short-circuiting the #PF code which does the:
> >
> > vma = lock_vma_under_rcu(mm, address);
> > ...
> > fault = handle_mm_fault(vma, address, ... FAULT_FLAG_VMA_LOCK)
> >
> > sequence _itself_.
>
> IMHO that looks more complex than the original speculation+retry solution :)

Actually, after looking closer at the original code, I think we can do
this without taking the mmap_lock at all. Basically, we can have the
same speculation+retry loop without mmap_read_lock_killable().
mmap_lock_speculate_try_begin() does not require to be called with
mmap_lock held. So, we can find the VMA using lock_vma_under_rcu(), do
the valid_vma checks, call mmap_lock_speculate_try_begin() then drop
the per-VMA lock with vma_end_read() and proceed with
get_shstk_data(). I think that should work and would avoid taking
mmap_lock completely.