Re: [PATCH] mm: Always sanity check anon_vma first for per-vma locks

From: Suren Baghdasaryan
Date: Mon Apr 15 2024 - 11:58:50 EST


On Fri, Apr 12, 2024 at 8:19 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Fri, Apr 12, 2024 at 09:53:29AM -0500, Suren Baghdasaryan wrote:
> > Unless vmf_anon_prepare() already explains why vma->anon_vma poses a
> > problem for per-vma locks, we should have an explanation there. This
> > comment would serve that purpose IMO.
>
> I'll do you one better; here's some nice kernel-doc for
> vmf_anon_prepare():

I was looking at find_tcp_vma(), which seems to be the only other
place where lock_vma_under_rcu() is currently used. I believe it is
used there only for file-backed pages, so your change should not
affect that use case, but it makes me think we should have some kind
of warning for future users of lock_vma_under_rcu()... Maybe your
addition of mmap_assert_locked() inside __anon_vma_prepare() is
enough. Please don't forget to include that assertion in your final
patch.


>
> commit f89a1cd17f13
> Author: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> Date: Fri Apr 12 10:41:02 2024 -0400
>
> mm: Delay the check for a NULL anon_vma
>
> Instead of checking the anon_vma early in the fault path where all page
> faults pay the cost, delay it until we know we're going to need the
> anon_vma to be filled in. This will have a slight negative effect on the
> first fault in an anonymous VMA, but it shortens every other page fault.
> It also makes the code slightly cleaner as the anon and file backed
> fault handling look more similar.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index d8d2ed80b0bf..718f91f74a48 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1057,11 +1057,13 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>  	gfp_t gfp;
>  	struct folio *folio;
>  	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
> +	vm_fault_t ret;
> 
>  	if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER))
>  		return VM_FAULT_FALLBACK;
> -	if (unlikely(anon_vma_prepare(vma)))
> -		return VM_FAULT_OOM;
> +	ret = vmf_anon_prepare(vmf);
> +	if (ret)
> +		return ret;
>  	khugepaged_enter_vma(vma, vma->vm_flags);
> 
>  	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
> diff --git a/mm/memory.c b/mm/memory.c
> index 6e2fe960473d..46b509c3bbc1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3213,6 +3213,21 @@ static inline vm_fault_t vmf_can_call_fault(const struct vm_fault *vmf)
>  	return VM_FAULT_RETRY;
>  }
>
> +/**
> + * vmf_anon_prepare - Prepare to handle an anonymous fault.
> + * @vmf: The vm_fault descriptor passed from the fault handler.
> + *
> + * When preparing to insert an anonymous page into a VMA from a
> + * fault handler, call this function rather than anon_vma_prepare().
> + * If this vma does not already have an associated anon_vma and we are
> + * only protected by the per-VMA lock, the caller must retry with the
> + * mmap_lock held. __anon_vma_prepare() will look at adjacent VMAs to
> + * determine if this VMA can share its anon_vma, and that's not safe to
> + * do with only the per-VMA lock held for this VMA.
> + *
> + * Return: 0 if fault handling can proceed. Any other value should be
> + * returned to the caller.
> + */
>  vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
>  {
>  	struct vm_area_struct *vma = vmf->vma;
> @@ -4437,8 +4452,9 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>  	}
> 
>  	/* Allocate our own private page. */
> -	if (unlikely(anon_vma_prepare(vma)))
> -		goto oom;
> +	ret = vmf_anon_prepare(vmf);
> +	if (ret)
> +		return ret;
>  	/* Returns NULL on OOM or ERR_PTR(-EAGAIN) if we must retry the fault */
>  	folio = alloc_anon_folio(vmf);
>  	if (IS_ERR(folio))
> @@ -5821,15 +5837,6 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
>  	if (!vma_start_read(vma))
>  		goto inval;
> 
> -	/*
> -	 * find_mergeable_anon_vma uses adjacent vmas which are not locked.
> -	 * This check must happen after vma_start_read(); otherwise, a
> -	 * concurrent mremap() with MREMAP_DONTUNMAP could dissociate the VMA
> -	 * from its anon_vma.
> -	 */
> -	if (unlikely(vma_is_anonymous(vma) && !vma->anon_vma))
> -		goto inval_end_read;
> -
>  	/* Check since vm_start/vm_end might change before we lock the VMA */
>  	if (unlikely(address < vma->vm_start || address >= vma->vm_end))
>  		goto inval_end_read;
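
For the archives: the FAULT_FLAG_VMA_LOCK behavior the new kernel-doc
describes is implemented by the existing body of vmf_anon_prepare(),
which reads roughly like this in the current tree (a sketch for
context, not part of the patch above):

vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;

	/* Fast path: the anon_vma is already set up. */
	if (likely(vma->anon_vma))
		return 0;
	/*
	 * __anon_vma_prepare() may look at adjacent VMAs, which is not
	 * safe under the per-VMA lock, so drop the lock and make the
	 * caller retry the fault under mmap_lock.
	 */
	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
		vma_end_read(vma);
		return VM_FAULT_RETRY;
	}
	if (__anon_vma_prepare(vma))
		return VM_FAULT_OOM;
	return 0;
}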