Re: [PATCH 38/94] mm/gup: Add mm_populate_vma() for use when the vma is known

From: Matthew Wilcox
Date: Mon May 03 2021 - 12:02:32 EST


On Mon, May 03, 2021 at 03:53:58PM +0000, Liam Howlett wrote:
> * Michel Lespinasse <michel@xxxxxxxxxxxxxx> [210501 01:13]:
> > On Wed, Apr 28, 2021 at 03:36:08PM +0000, Liam Howlett wrote:
> > > When a vma is known, avoid calling mm_populate to search for the vma to
> > > populate.
> > >
> > > Signed-off-by: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx>
> > > ---
> > > mm/gup.c | 20 ++++++++++++++++++++
> > > mm/internal.h | 4 ++++
> > > 2 files changed, 24 insertions(+)
> > >
> > > diff --git a/mm/gup.c b/mm/gup.c
> > > index c3a17b189064..48fe98ab0729 100644
> > > --- a/mm/gup.c
> > > +++ b/mm/gup.c
> > > @@ -1468,6 +1468,26 @@ long populate_vma_page_range(struct vm_area_struct *vma,
> > > NULL, NULL, locked);
> > > }
> > >
> > > +/*
> > > + * mm_populate_vma() - Populate a single range in a single vma.
> > > + * @vma: The vma to populate.
> > > + * @start: The start address to populate
> > > + * @end: The end address to stop populating
> > > + *
> > > + * Note: Ignores errors.
> > > + */
> > > +void mm_populate_vma(struct vm_area_struct *vma, unsigned long start,
> > > +		unsigned long end)
> > > +{
> > > +	struct mm_struct *mm = current->mm;
> > > +	int locked = 1;
> > > +
> > > +	mmap_read_lock(mm);
> > > +	populate_vma_page_range(vma, start, end, &locked);
> > > +	if (locked)
> > > +		mmap_read_unlock(mm);
> > > +}
> > > +
> >
> > This seems like a nonsensical API at first glance - VMAs that are found
> > in the vma tree might be modified, merged, split, or freed at any time
> > if the mmap lock is not held, so the API can not be safely used. I think
> > this applies to maple tree vmas just as much as it did for rbtree vmas ?
>
> This is correct - it cannot be used without having the mmap_sem lock.
> This is a new internal mm code API and is used to avoid callers that use
> mm_populate() on a range that is known to be in a single VMA and already
> have that VMA. So instead of re-walking the tree to re-find the VMAs,
> this function can be used with the known VMA and range.
>
> It is used as described in patch 39 and 40 of this series.

In patch 39, what you do is:

1. Take the mmap_sem for write
2. Do stuff
3. Drop the mmap_sem
4. Call mm_populate_vma() with the vma, which takes the mmap_sem
   for read

The problem is that between steps 3 and 4, a racing thread (munmap(), for
example) might free the vma, and we then pass a dangling pointer into
mm_populate_vma().
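The window between steps 3 and 4 can be modelled deterministically in
userspace. This is a hypothetical sketch, not kernel code: struct
model_mm, model_try_write_lock() and racing_munmap() are invented for
illustration, the lock is a pair of counters rather than an rw_semaphore,
and "freeing" the vma only clears a flag so the stale access stays
well-defined in the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a vma; "freeing" it just clears the flag. */
struct model_vma {
	bool live;
};

/* Hypothetical model of an mm and its mmap_sem (not an rw_semaphore). */
struct model_mm {
	int writers;		/* 0 or 1 */
	int readers;
	struct model_vma vma;
};

static bool model_try_write_lock(struct model_mm *mm)
{
	if (mm->writers || mm->readers)
		return false;
	mm->writers = 1;
	return true;
}

static void model_write_unlock(struct model_mm *mm) { mm->writers = 0; }

static bool model_try_read_lock(struct model_mm *mm)
{
	if (mm->writers)
		return false;
	mm->readers++;
	return true;
}

static void model_read_unlock(struct model_mm *mm) { mm->readers--; }

/* A racing thread that unmaps the vma if it can take the lock for write. */
static bool racing_munmap(struct model_mm *mm)
{
	if (!model_try_write_lock(mm))
		return false;
	mm->vma.live = false;
	model_write_unlock(mm);
	return true;
}

/* Steps 1-4 as listed above; returns whether the vma was live at step 4. */
static bool unlock_then_relock_sequence(struct model_mm *mm)
{
	struct model_vma *vma;
	bool ok;

	ok = model_try_write_lock(mm);	/* 1. take mmap_sem for write */
	assert(ok);
	vma = &mm->vma;			/* 2. do stuff, keep the vma pointer */
	model_write_unlock(mm);		/* 3. drop mmap_sem */
	(void)racing_munmap(mm);	/* the race window: nothing stops this */
	ok = model_try_read_lock(mm);	/* 4. mm_populate_vma() takes it for read */
	assert(ok);
	ok = vma->live;			/* in the kernel: a use-after-free */
	model_read_unlock(mm);
	return ok;
}
```

Because the lock is completely free between steps 3 and 4, the modelled
munmap() always gets in, and step 4 sees a vma that is no longer live.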

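What we need instead is to downgrade the mmap_sem from write to read at
step 3 (mmap_write_downgrade()), so the vma is guaranteed to still be
valid when it is populated. mm_populate_vma() would then run under that
read lock instead of acquiring it itself.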
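A minimal single-threaded model of the downgrade, under the same kind of
assumptions: every name below (model_mm, model_write_downgrade(),
racing_munmap()) is hypothetical and only mimics the semantics of
mmap_write_downgrade(), i.e. the writer becomes a reader with no point
at which the lock is free. A racing munmap() that tries to take the lock
for write during the window fails, where the real one would block.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a vma; "freeing" it just clears the flag. */
struct model_vma {
	bool live;
};

/* Hypothetical model of an mm and its mmap_sem (not an rw_semaphore). */
struct model_mm {
	int writers;		/* 0 or 1 */
	int readers;
	struct model_vma vma;
};

static bool model_try_write_lock(struct model_mm *mm)
{
	if (mm->writers || mm->readers)
		return false;
	mm->writers = 1;
	return true;
}

static void model_write_unlock(struct model_mm *mm) { mm->writers = 0; }

/* Mimics mmap_write_downgrade(): write -> read with no unlocked gap. */
static void model_write_downgrade(struct model_mm *mm)
{
	mm->writers = 0;
	mm->readers = 1;
}

static void model_read_unlock(struct model_mm *mm) { mm->readers--; }

/* A racing thread that unmaps the vma if it can take the lock for write. */
static bool racing_munmap(struct model_mm *mm)
{
	if (!model_try_write_lock(mm))
		return false;	/* the kernel would block until readers drop */
	mm->vma.live = false;
	model_write_unlock(mm);
	return true;
}

/* Downgrade at step 3; step 4 populates under the already-held read lock. */
static bool downgrade_sequence(struct model_mm *mm, bool *munmap_in_window)
{
	struct model_vma *vma;
	bool ok;

	ok = model_try_write_lock(mm);		/* 1. take mmap_sem for write */
	assert(ok);
	vma = &mm->vma;				/* 2. do stuff */
	model_write_downgrade(mm);		/* 3. downgrade write -> read */
	*munmap_in_window = racing_munmap(mm);	/* the racer is shut out */
	ok = vma->live;				/* 4. populate under the read lock */
	model_read_unlock(mm);
	return ok;
}
```

The racer only gets the lock once the read lock is dropped after
populating, so the vma pointer is valid for the whole of step 4.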