Re: [PATCH v2 0/5] mm: reduce mmap_lock contention and improve page fault performance

From: Liam R. Howlett

Date: Mon Jun 22 2026 - 10:54:45 EST

On 26/06/22 08:15AM, Barry Song wrote:
> On Mon, Jun 22, 2026 at 4:49 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > On Sat, Jun 20, 2026 at 04:48:57PM -0700, Suren Baghdasaryan wrote:
> > > Just checking in on the followup plans. IIUC the RFC mentioned will
> > > try to implement the solution we discussed at LSFMM: splitting
> > > VM_FAULT_RETRY into two flags - one for retrying under per-VMA locks
> > > and another one to fallback to mmap_lock.
> >
> > I continue to hate this idea. I don't believe that those who were
> > pushing for it have ever tried to understand the whole fault path.
> > It's utterly byzantine.
> >
> > I defy anyone to make sense of this:
> >
> > /*
> >  * NOTE! This will make us return with VM_FAULT_RETRY, but with
> >  * the fault lock still held. That's how FAULT_FLAG_RETRY_NOWAIT
> >  * is supposed to work. We have way too many special cases..
> >  */
> > if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
> > 	return 0;
> >
> > *fpin = maybe_unlock_mmap_for_io(vmf, *fpin);
> > if (vmf->flags & FAULT_FLAG_KILLABLE) {
> > 	if (__folio_lock_killable(folio)) {
> > 		/*
> > 		 * We didn't have the right flags to drop the
> > 		 * fault lock, but all fault_handlers only check
> > 		 * for fatal signals if we return VM_FAULT_RETRY,
> > 		 * so we need to drop the fault lock here and
> > 		 * return 0 if we don't have a fpin.
> > 		 */
> > 		if (*fpin == NULL)
> > 			release_fault_lock(vmf);
> > 		return 0;
> > 	}
> >
> > We need to simplify the fault path, not add additional complexity.
> > Josef has said he wouldn't've done the lock dropping had we had per-VMA
> > locks. We should rip it out.
>
> I think you have agreed that, at least for anon vma, we can
> keep the current policy, since anon vma is much more volatile
> than file vma.

I don't think any of the above has to do with anon vmas. Does any anon
vma handling have anything to do with your problem?

This would be needed if anon vmas were being faulted while being
unmapped or merged? Do we really need a fast path for that? Note that
anon vmas cannot be merged if the vma chain... you know what, I wonder
how many people know what I'm talking about here... Let's just say that
they can't be merged if they were around for a fork.

So, then, we're looking at anon vmas taking the mmap lock on:
1. single task anon vmas being expanded and faulted at the same time
2. single task anon vmas being unmapped and faulted at the same time

I think that's it?

But maybe I missed something critical about your use case here?

I don't understand why you are involving anon vmas in this discussion,
so I must have missed something with your IO completion issue. Is there
an anon vma causing your priority inversion?

> Concurrent page faults and VMA modifications can happen more
> often than with file VMAs.

But it's only a problem for anon vmas with per-vma locking if it's the
same vma (or the vma lock sequence counter overflows, but let's say
that's a statistically insignificant non-zero value).

>
> For file vmas, how much code can we actually remove, given that
> the first page fault might already be holding mmap_lock?

How much complexity can we remove while maintaining performance might
be a better question.

> It could be the case that lock_vma_under_rcu() fails, and then
> on the first page fault we end up holding mmap_lock before
> retrying. So are we also going to rip out the lock release,
> even if it risks holding mmap_lock for a long time?
>
> vma = lock_vma_under_rcu(mm, addr);
> if (!vma)
> 	goto lock_mmap;
> ...
> lock_mmap:
>
> vma = lock_mm_and_find_vma(mm, addr, regs);
> if (unlikely(!vma)) {
> 	fault = 0;
> 	si_code = SEGV_MAPERR;
> 	goto bad_area;
> }
>
> If we still need to keep the page fault retry code there, it
> doesn't seem like "ripping out" really reduces complexity in
> the page fault code?

This seems unrelated to the above complexity that might be the target of
removal?

Thanks,
Liam