Re: [PATCH 1/3] mm/vma: fix incorrectly disallowed anonymous VMA merges
From: Lorenzo Stoakes
Date: Fri Apr 04 2025 - 09:05:11 EST
On Fri, Apr 04, 2025 at 12:53:15PM +0000, Wei Yang wrote:
> On Mon, Mar 17, 2025 at 09:15:03PM +0000, Lorenzo Stoakes wrote:
> [...]
> >However, we have a problem here - typically the vma passed here is the
> >destination VMA.
> >
> >For instance in vma_merge_existing_range() we invoke:
> >
> >can_vma_merge_left()
> >-> [ check that there is an immediately adjacent prior VMA ]
> >-> can_vma_merge_after()
> > -> is_mergeable_vma() for general attribute check
> >-> is_mergeable_anon_vma([ proposed anon_vma ], prev->anon_vma, prev)
> >
> >So if we were considering a target unfaulted 'prev':
> >
> > unfaulted faulted
> > |-----------|-----------|
> > | prev | vma |
> > |-----------|-----------|
> >
> >This would call is_mergeable_anon_vma(NULL, vma->anon_vma, prev).
> >
> >The list_is_singular() check for vma->anon_vma_chain, an empty list on
> >fault, would cause this merge to _fail_ even though all else indicates a
> >merge.
> >
>
> Great spot. It is hiding there for 15 years.
Thanks!
>
> >Equally a simple merge into a next VMA would hit the same problem:
> >
> > faulted unfaulted
> > |-----------|-----------|
> > | vma | next |
> > |-----------|-----------|
> >
> [...]
> >---
> > mm/vma.c | 81 +++++++++++++++++++++++---------
> > tools/testing/vma/vma.c | 100 +++++++++++++++++++++-------------------
> > 2 files changed, 111 insertions(+), 70 deletions(-)
> >
> >diff --git a/mm/vma.c b/mm/vma.c
> >index 5cdc5612bfc1..5418eef3a852 100644
> >--- a/mm/vma.c
> >+++ b/mm/vma.c
> >@@ -57,6 +57,22 @@ struct mmap_state {
> > .state = VMA_MERGE_START, \
> > }
> >
> >+/*
> >+ * If, at any point, the VMA had unCoW'd mappings from parents, it will maintain
> >+ * more than one anon_vma_chain connecting it to more than one anon_vma. A merge
> >+ * would mean a wider range of folios sharing the root anon_vma lock, and thus
> >+ * potential lock contention, we do not wish to encourage merging such that this
> >+ * scales to a problem.
> >+ */
>
> I don't follow here. Take a look into do_wp_page(), where CoW happens. But I
> don't find where it will unlink parent anon_vma from vma->anon_vma_chain.
Look at anon_vma_clone() in fork case. It's not the point of CoW that's the
issue, it's propagation of AVC's upon fork.
>
> Per my understanding, the unlink behavior happens in unlink_anon_vma() which
> unlink all anon_vma on vma->anon_vma_chain. And the normal caller of
> unlink_anon_vma() is free_pgtables(). Other callers are on error path to
> release prepared data. From this perspective, I don't see the chance to unlink
> parent anon_vma from vma->anon_vma_chain either.
>
> But maybe I missed something. If it is not too bother, would you mind giving
> me a hint?
What you're saying is 'we never go back and fix this up once unCoW'd' which is
true, but I don't want to write several page-length essays in comments, and this
is a sensible way of looking at things for the purposes of this check.
In future, we may want to actually do something like this, if it makes sense.
>
> >+static bool vma_had_uncowed_parents(struct vm_area_struct *vma)
> >+{
> >+ /*
> >+ * The list_is_singular() test is to avoid merging VMA cloned from
> >+ * parents. This can improve scalability caused by anon_vma lock.
> >+ */
> >+ return vma && vma->anon_vma && !list_is_singular(&vma->anon_vma_chain);
> >+}
> >+
> > static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_next)
> > {
> > struct vm_area_struct *vma = merge_next ? vmg->next : vmg->prev;
> >@@ -82,24 +98,28 @@ static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_nex
> > return true;
> > }
> >
> >-static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1,
> >- struct anon_vma *anon_vma2, struct vm_area_struct *vma)
> >+static bool is_mergeable_anon_vma(struct vma_merge_struct *vmg, bool merge_next)
> > {
> >+ struct vm_area_struct *tgt = merge_next ? vmg->next : vmg->prev;
> >+ struct vm_area_struct *src = vmg->middle; /* exisitng merge case. */
> ^^^
>
> A trivial typo here.
Thanks, if I have to respin I'll fix it.
>
> >+ struct anon_vma *tgt_anon = tgt->anon_vma;
> >+ struct anon_vma *src_anon = vmg->anon_vma;
> >+
> > /*
> >- * The list_is_singular() test is to avoid merging VMA cloned from
> >- * parents. This can improve scalability caused by anon_vma lock.
> >+ * We _can_ have !src, vmg->anon_vma via copy_vma(). In this instance we
> >+ * will remove the existing VMA's anon_vma's so there's no scalability
> >+ * concerns.
> > */
> >- if ((!anon_vma1 || !anon_vma2) && (!vma ||
> >- list_is_singular(&vma->anon_vma_chain)))
> >- return true;
> >- return anon_vma1 == anon_vma2;
> >-}
> >+ VM_WARN_ON(src && src_anon != src->anon_vma);
> >
> >-/* Are the anon_vma's belonging to each VMA compatible with one another? */
> >-static inline bool are_anon_vmas_compatible(struct vm_area_struct *vma1,
> >- struct vm_area_struct *vma2)
> >-{
> >- return is_mergeable_anon_vma(vma1->anon_vma, vma2->anon_vma, NULL);
> >+ /* Case 1 - we will dup_anon_vma() from src into tgt. */
> >+ if (!tgt_anon && src_anon)
> >+ return !vma_had_uncowed_parents(src);
> >+ /* Case 2 - we will simply use tgt's anon_vma. */
> >+ if (tgt_anon && !src_anon)
> >+ return !vma_had_uncowed_parents(tgt);
> >+ /* Case 3 - the anon_vma's are already shared. */
> >+ return src_anon == tgt_anon;
> > }
> >
> > /*
> >@@ -164,7 +184,7 @@ static bool can_vma_merge_before(struct vma_merge_struct *vmg)
> > pgoff_t pglen = PHYS_PFN(vmg->end - vmg->start);
> >
> > if (is_mergeable_vma(vmg, /* merge_next = */ true) &&
> >- is_mergeable_anon_vma(vmg->anon_vma, vmg->next->anon_vma, vmg->next)) {
> >+ is_mergeable_anon_vma(vmg, /* merge_next = */ true)) {
> > if (vmg->next->vm_pgoff == vmg->pgoff + pglen)
> > return true;
> > }
> >@@ -184,7 +204,7 @@ static bool can_vma_merge_before(struct vma_merge_struct *vmg)
> > static bool can_vma_merge_after(struct vma_merge_struct *vmg)
> > {
> > if (is_mergeable_vma(vmg, /* merge_next = */ false) &&
> >- is_mergeable_anon_vma(vmg->anon_vma, vmg->prev->anon_vma, vmg->prev)) {
> >+ is_mergeable_anon_vma(vmg, /* merge_next = */ false)) {
> > if (vmg->prev->vm_pgoff + vma_pages(vmg->prev) == vmg->pgoff)
> > return true;
> > }
>
> We have two sets API to check vma's mergeability:
>
> * can_vma_merge_before/after
> * can_vma_merge_left/right
>
> And xxx_merge_right() calls xxx_merge_before(), which is a little confusing.
>
> Now can_vma_merge_before/after looks almost same. Do you think it would be
> easier for reading to consolidate to one function?
Yeah it's a bit of a mess, some of it is 'maintaining the previous
structure' since people were at least familiar with it.
But this is something I do plan to address in future. Right now I think
with the heavy commenting and improvements on merge logic in general there
should really not be _too_ much confusion.
So a sensible fixup would be as part of a larger refactor. There is nothing
to do here at the moment, in my view.
>
> --
> Wei Yang
> Help you, Help me