Re: [PATCH v5 05/15] mm/frame-vector: Use FOLL_LONGTERM

From: Daniel Vetter
Date: Sat Oct 31 2020 - 10:45:59 EST


On Sat, Oct 31, 2020 at 3:55 AM John Hubbard <jhubbard@xxxxxxxxxx> wrote:
>
> On 10/30/20 3:08 AM, Daniel Vetter wrote:
> > This is used by media/videbuf2 for persistent dma mappings, not just
> > for a single dma operation and then freed again, so needs
> > FOLL_LONGTERM.
> >
> > Unfortunately current pup_locked doesn't support FOLL_LONGTERM due to
> > locking issues. Rework the code to pull the pup path out from the
> > mmap_sem critical section as suggested by Jason.
> >
> > By relying entirely on the vma checks in pin_user_pages and follow_pfn
>
> There are vma checks in pin_user_pages(), but this patch changes things
> to call pin_user_pages_fast(). And that does not have the vma checks.
> More below about this:
>
> > (for vm_flags and vma_is_fsdax) we can also streamline the code a lot.
> >
> > Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx>
> > Cc: Jason Gunthorpe <jgg@xxxxxxxx>
> > Cc: Pawel Osciak <pawel@xxxxxxxxxx>
> > Cc: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
> > Cc: Kyungmin Park <kyungmin.park@xxxxxxxxxxx>
> > Cc: Tomasz Figa <tfiga@xxxxxxxxxxxx>
> > Cc: Mauro Carvalho Chehab <mchehab@xxxxxxxxxx>
> > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > Cc: John Hubbard <jhubbard@xxxxxxxxxx>
> > Cc: Jérôme Glisse <jglisse@xxxxxxxxxx>
> > Cc: Jan Kara <jack@xxxxxxx>
> > Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> > Cc: linux-mm@xxxxxxxxx
> > Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> > Cc: linux-samsung-soc@xxxxxxxxxxxxxxx
> > Cc: linux-media@xxxxxxxxxxxxxxx
> > Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxx>
> > --
> > v2: Streamline the code and further simplify the loop checks (Jason)
> >
> > v5: Review from Tomasz:
> > - fix page counting for the follow_pfn case by resetting ret
> > - drop gup_flags paramater, now unused
> > ---
> > .../media/common/videobuf2/videobuf2-memops.c | 3 +-
> > include/linux/mm.h | 2 +-
> > mm/frame_vector.c | 53 ++++++-------------
> > 3 files changed, 19 insertions(+), 39 deletions(-)
> >
> > diff --git a/drivers/media/common/videobuf2/videobuf2-memops.c b/drivers/media/common/videobuf2/videobuf2-memops.c
> > index 6e9e05153f4e..9dd6c27162f4 100644
> > --- a/drivers/media/common/videobuf2/videobuf2-memops.c
> > +++ b/drivers/media/common/videobuf2/videobuf2-memops.c
> > @@ -40,7 +40,6 @@ struct frame_vector *vb2_create_framevec(unsigned long start,
> > unsigned long first, last;
> > unsigned long nr;
> > struct frame_vector *vec;
> > - unsigned int flags = FOLL_FORCE | FOLL_WRITE;
> >
> > first = start >> PAGE_SHIFT;
> > last = (start + length - 1) >> PAGE_SHIFT;
> > @@ -48,7 +47,7 @@ struct frame_vector *vb2_create_framevec(unsigned long start,
> > vec = frame_vector_create(nr);
> > if (!vec)
> > return ERR_PTR(-ENOMEM);
> > - ret = get_vaddr_frames(start & PAGE_MASK, nr, flags, vec);
> > + ret = get_vaddr_frames(start & PAGE_MASK, nr, vec);
> > if (ret < 0)
> > goto out_destroy;
> > /* We accept only complete set of PFNs */
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index ef360fe70aaf..d6b8e30dce2e 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -1765,7 +1765,7 @@ struct frame_vector {
> > struct frame_vector *frame_vector_create(unsigned int nr_frames);
> > void frame_vector_destroy(struct frame_vector *vec);
> > int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> > - unsigned int gup_flags, struct frame_vector *vec);
> > + struct frame_vector *vec);
> > void put_vaddr_frames(struct frame_vector *vec);
> > int frame_vector_to_pages(struct frame_vector *vec);
> > void frame_vector_to_pfns(struct frame_vector *vec);
> > diff --git a/mm/frame_vector.c b/mm/frame_vector.c
> > index 10f82d5643b6..f8c34b895c76 100644
> > --- a/mm/frame_vector.c
> > +++ b/mm/frame_vector.c
> > @@ -32,13 +32,12 @@
> > * This function takes care of grabbing mmap_lock as necessary.
> > */
> > int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
> > - unsigned int gup_flags, struct frame_vector *vec)
> > + struct frame_vector *vec)
> > {
> > struct mm_struct *mm = current->mm;
> > struct vm_area_struct *vma;
> > int ret = 0;
> > int err;
> > - int locked;
> >
> > if (nr_frames == 0)
> > return 0;
> > @@ -48,40 +47,26 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
> >
> > start = untagged_addr(start);
> >
> > - mmap_read_lock(mm);
> > - locked = 1;
> > - vma = find_vma_intersection(mm, start, start + 1);
> > - if (!vma) {
> > - ret = -EFAULT;
> > - goto out;
> > - }
> > -
> > - /*
> > - * While get_vaddr_frames() could be used for transient (kernel
> > - * controlled lifetime) pinning of memory pages all current
> > - * users establish long term (userspace controlled lifetime)
> > - * page pinning. Treat get_vaddr_frames() like
> > - * get_user_pages_longterm() and disallow it for filesystem-dax
> > - * mappings.
> > - */
> > - if (vma_is_fsdax(vma)) {
> > - ret = -EOPNOTSUPP;
> > - goto out;
> > - }
> > -
> > - if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
>
> By removing this check from this location, and changing from
> pin_user_pages_locked() to pin_user_pages_fast(), I *think* we end up
> losing the check entirely. Is that intended? If so it could use a comment
> somewhere to explain why.

Yeah this wasn't intentional. I think I needed to drop the _locked
version to prep for FOLL_LONGTERM, and figured _fast is always better.
But I didn't realize that _fast doesn't have the vma checks, gup.c got
me a bit confused.

I'll remedy this in all the patches where this applies (because a
VM_IO | VM_PFNMAP can point at struct page backed memory, and that
exact use-case is what we want to stop with the unsafe_follow_pfn work
since it wreaks things like cma or security).

Aside: I do wonder whether the lack for that check isn't a problem.
VM_IO | VM_PFNMAP generally means driver managed, which means the
driver isn't going to consult the page pin count or anything like that
(at least not necessarily) when revoking or moving that memory, since
we're assuming it's totally under driver control. So if pup_fast can
get into such a mapping, we might have a problem.
-Daniel

> thanks,
> --
> John Hubbard
> NVIDIA
>
> > + ret = pin_user_pages_fast(start, nr_frames,
> > + FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
> > + (struct page **)(vec->ptrs));
> > + if (ret > 0) {
> > vec->got_ref = true;
> > vec->is_pfns = false;
> > - ret = pin_user_pages_locked(start, nr_frames,
> > - gup_flags, (struct page **)(vec->ptrs), &locked);
> > - goto out;
> > + goto out_unlocked;
> > }
> >
> > + mmap_read_lock(mm);
> > vec->got_ref = false;
> > vec->is_pfns = true;
> > + ret = 0;
> > do {
> > unsigned long *nums = frame_vector_pfns(vec);
> >
> > + vma = find_vma_intersection(mm, start, start + 1);
> > + if (!vma)
> > + break;
> > +
> > while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
> > err = follow_pfn(vma, start, &nums[ret]);
> > if (err) {
> > @@ -92,17 +77,13 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
> > start += PAGE_SIZE;
> > ret++;
> > }
> > - /*
> > - * We stop if we have enough pages or if VMA doesn't completely
> > - * cover the tail page.
> > - */
> > - if (ret >= nr_frames || start < vma->vm_end)
> > + /* Bail out if VMA doesn't completely cover the tail page. */
> > + if (start < vma->vm_end)
> > break;
> > - vma = find_vma_intersection(mm, start, start + 1);
> > - } while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP));
> > + } while (ret < nr_frames);
> > out:
> > - if (locked)
> > - mmap_read_unlock(mm);
> > + mmap_read_unlock(mm);
> > +out_unlocked:
> > if (!ret)
> > ret = -EFAULT;
> > if (ret > 0)
> >
>
>


--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch