Re: [PATCH] mm: Always sanity check anon_vma first for per-vma locks

From: Peter Xu
Date: Thu Apr 11 2024 - 17:12:29 EST


Hey, Liam,

Thanks a lot for the comments.

On Thu, Apr 11, 2024 at 01:13:19PM -0400, Liam R. Howlett wrote:
> * Peter Xu <peterx@xxxxxxxxxx> [240410 13:06]:
> > anon_vma is a tricky object in the context of per-vma lock, because it's
> > racy to modify it in that context and mmap lock is needed if it's not
> > stable yet.
>
> How is it racy in your view? There isn't a way to get in a situation
> where its state isn't certain, is there?

My apologies on these confusing wordings. AFAICT we're on the same page
all over the place on this matter, so I will amend these wordings if I'll
repost, and skip many of below comments if I'm confident we're aligned on
the understandings.

And actually, I'll already attach one new version that should remove all
bad side effects, but hopefully only bring good ones, see below.

>
> >
> > So far there are three places that sanity checks anon_vma for that:
> >
> > - lock_vma_under_rcu(): this is the major entrance of per-vma lock, where
> > we have taken care of anon memory v.s. potential anon_vma allocations.
>
> Well, not exactly. Single vma faults vs potential modifications beyond
> one vma (including uses of adjacent vmas with anon_vmas)

(this is also the wording issue; I'll fix it)

>
> >
> > - lock_vma(): even if it looks so generic as an API, it's only used in
> > userfaultfd context to leverage per-vma locks. It does extra check
> > over MAP_PRIVATE file mappings for the same anon_vma issue.
>
> This static function is in mm/userfaultfd so, yes, it's not generic -
> the name is generic, but I didn't see a reason to hold up the patch for
> a static name that is descriptive. It's static so I'm not concerned
> about usage growing.
>
> I would expect such a check will eventually need to be moved to common
> code, and then potentially change the name to something more
> descriptive. This seems premature with a single user though.
>
> >
> > - vmf_anon_prepare(): it works for private file mapping faults just like
> > what lock_vma() wanted to cover above. One trivial difference is in
> > some extremely corner case, the fault handler will still allow per-vma
> > fault to happen, like a READ on a privately mapped file.
>
> What is happening here is that we are detecting when the virtual memory
> space is stable vs when the vma is stable. In some cases, when we need
> to check prev/next, then we need to make sure the virtual memory space
> is stable. In other cases, the vma is all that matters to the operation
> and so we can continue without stopping anyone else from modifying the
> virtual memory space - that is, we are allowing writes in other areas.
>
> >
> > The question is whether that's intended to make it as complicated. Per my
> > question in the thread, it is not intended, and Suren also seems to agree [1].
> >
> > So the trivial side effect of such patch is:
> >
> > - We may do slightly better on the first WRITE of a private file mapping,
> > because we can retry earlier (in lock_vma_under_rcu(), rather than
> > vmf_anon_prepare() later).
> >
> > - We may always use mmap lock for the initial READs on a private file
> > mappings, while before this patch it _can_ (only when no WRITE ever
> > happened... but it doesn't make much sense for a MAP_PRIVATE..) do the
> > read fault with per-vma lock.
> >
> > Then noted that right after any WRITE the anon_vma will be stablized, then
> > there will be no difference.
>
> In my view, the anon_vma is always stable. During a write it is
> initialised.
>
> >And I believe that should be the majority
> > cases too; I also did try to run a program, writting to MAP_PRIVATE file
> > memory (that I pre-headed in the page cache) and I can hardly measure a
> > difference in performance.
> >
> > Let's simply ignore all those trivial corner cases and unify the anon_vma
> > check from three places into one. I also didn't check the rest users of
> > lock_vma_under_rcu(), where in a !fault context it could even fix something
> > that used to race with private file mappings but I didn't check further.
>
> This change will increase mmap semaphore contention. I'd like to move
> to a more common structured layout, but I think we need to find a way to
> do this without failing the lock_vma_under_rcu() more frequently. In
> fact, we need to be looking for ways to make it fail less.
>
> The UFFD code is more strict on what is acceptable for a per-vma lock
> [1]. This user has a restriction that will decrease the benefits of the
> per-vma lock, but we shouldn't make this case the common one.
>
> As I'm sure you are aware, the page fault path is the most common path
> for using per-vma locks and should minimize taking the mmap lock as much
> as possible.
>
> I don't think we should increase the lock contention to simplify the
> code.

Right. I believe the major issue right now with this patch is that it can
reduce some performance, I didn't worry on that, but it seems that's a
concern to many already. It could be that I am just too bold on such
attempt.

However note that what I want to do (put all anon_vma magic together)
should be orthogonal issue v.s. above perf degrade. So please feel free to
have a look at below, I believe it won't degrade anything, but afaiu it
should speed up two things even if only trivially:

- The WRITE on private file mapping will now retry earlier than before
- The READ on anon zero pages will now allow per-vma lock

Neither of above will be allowed without below patch. And below patch
should also put all anon_vma checks together.

I didn't even compile test it yet, but let me send it out first to see
whether any of you has any comments, so take it as super-rfc versions.

Note that I attached patch 1 on a hugetlb change, that'll be needed by
patch 2 otherwise we can hit issue with hugetlb private mappings. However
patch 1 alone will need some justifications too, but IMHO we can ignore it
for now and just look at patch 2. I attached patch 1 just in case it's
good to reference.

Thanks,

===8<===