Re: [PATCH 6/6] x86/mm: Avoid mmap lock for shadow stack pop fast path

From: Suren Baghdasaryan

Date: Wed May 13 2026 - 21:49:21 EST

On Fri, May 8, 2026 at 1:39 PM Lorenzo Stoakes <ljs@xxxxxxxxxx> wrote:
>
> On Tue, May 05, 2026 at 09:39:09AM -0700, Dave Hansen wrote:
> > On 5/4/26 16:15, Edgecombe, Rick P wrote:
> > >
> > > I guess the problem is the lock ordering. Not sure if there is any slow path
> > > avoidance details that could make this splat a false positive. But how about
> > > this simpler munmap() case:
> > >
> > > Shadow stack signal munmap()
> > > ------------------- --------
> > > vma_start_read() (VM_SHADOW_STACK check)
> > > mmap_write_lock()
> > > mmap_read_lock() (user fault) <- deadlock
> > > vma_start_write() <-deadlock
> >
> > It's a little more complicated than that in practice, but I think you're
> > right.
> >
> > I'm not sure when this would happen in practice because the fault is
> > actually on the VMA that's being held for read. So I think another
> > writer would have had to sneak in there and zap the VMA.
>
> Honestly I think any work around is just going to be more complicated than the
> existing code, which sort of defeats the purpose of the series.
>
> There's not really a way to speculate with a VMA seqnum because you'd have to be
> able to observe its vma->vm_lock_seq and to do that you'd have to find it again
> immediately afterwards and then you'd looked up twice, got the lock twice only
> to confirm its the same damn thing :)
>
> The issue is that a page fault on the same thread is always going to risk an
> mmap read lock being taken (possibly due to I/O waiting and fault retry for
> one). And faults/zaps are inherently racey and neither acquire the write lock so
> the read lock doesn't preclude them.
>
> And you can't realy disable page faults because you're potentially relying on
> them to populate what you're touching...
>
> Also there's some tricky stuff done on initial stack initialisation that can
> cause a headache as well (when stack is set up), see relocate_vma_down() to make
> life more painful.
>
> So I think the existing code is simpler.

+1

>
> It doesn't mean it isn't still useful to move towards having VMA locks
> everywhere though :) unless Suren or others can find a flaw with that...

Nope, I'm all in for the rest of the series.

>
> >
> > The funny thing is that the fault handler is really just trying to find
> > the VMA. The thing causing the fault *has* the VMA. So it's as simple as
> > just passing the VMA down into the fault handler, right? How hard could
> > it be? ;)
> >
> > There are still games to play, but they all involve dropping locks and
> > retrying, like:
> >
> > retry:
> > vma = lock_vma_under_rcu()
> > // muck with VMA
> > pagefault_disable() // avoid deadlock
> > ret = copy_from_user()
> > pagefault_enable()
> > vma_end_read();
> >
> > if (!ret)
> > return SUCCESS;
> >
> > mmap_read_lock()
> > vma = vma_lookup()
> > mmap_read_unlock() // avoid deadlock before touching userspace
> > // check for valid VMA to avoid looping when there is no VMA
> > if (!vma)
> > return -ERRNO;
> >
> > // uh oh, slow path, something faulted
> > get_user_pages()??
> > //or
> > copy_from_user() without the VMA??
> >
> > goto retry;
> >
> >
> > This also needs some very careful thought, but something like this
> > should work, where we avoid fault handling (and lock taking) in the
> > actual #PF and do it in a context where the VMA lock is held:
> >
> > vma = lock_vma_under_rcu();
> > pagefault_disable() // avoid deadlock
> >
> > while (1) {
> > ret = copy_from_user()
> > if (!ret)
> > break;
> > handle_mm_fault(vma, address, FAULT_FLAG_VMA_LOCK...);
> > };
> >
> > pagefault_enable()
> > vma_end_read();
> >
> > That's effectively just short-circuiting the #PF code which does the:
> >
> > vma = lock_vma_under_rcu(mm, address);
> > ...
> > fault = handle_mm_fault(vma, address, ... FAULT_FLAG_VMA_LOCK)
> >
> > sequence _itself_.