Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM
From: Oded Gabbay
Date: Sat Oct 03 2020 - 04:34:32 EST
On Sat, Oct 3, 2020 at 2:31 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Fri, Oct 02, 2020 at 08:16:48PM +0200, Daniel Vetter wrote:
> > On Fri, Oct 2, 2020 at 8:06 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> > > On Fri, Oct 02, 2020 at 07:53:03PM +0200, Daniel Vetter wrote:
> > > > For $reasons I've stumbled over this code and I'm not sure the change
> > > > to the new gup functions in 55a650c35fea ("mm/gup: frame_vector:
> > > > convert get_user_pages() --> pin_user_pages()") was entirely correct.
> > > >
> > > > This here is used for long term buffers (not just quick I/O) like
> > > > RDMA, and John notes this in his patch. But I thought the rule for
> > > > these is that they need to add FOLL_LONGTERM, which John's patch
> > > > didn't do.
> > > >
> > > > There is already a dax specific check (added in b7f0554a56f2 ("mm:
> > > > fail get_vaddr_frames() for filesystem-dax mappings")), so this seems
> > > > like the prudent thing to do.
> > > >
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx>
> > > > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > > > Cc: John Hubbard <jhubbard@xxxxxxxxxx>
> > > > Cc: Jérôme Glisse <jglisse@xxxxxxxxxx>
> > > > Cc: Jan Kara <jack@xxxxxxx>
> > > > Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> > > > Cc: linux-mm@xxxxxxxxx
> > > > Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> > > > Cc: linux-samsung-soc@xxxxxxxxxxxxxxx
> > > > Cc: linux-media@xxxxxxxxxxxxxxx
> > > > Hi all,
> > > >
> > > > I stumbled over this and figured typing this patch can't hurt. Really
> > > > just to maybe learn a few things about how gup/pup is supposed to be
> > > > used (we have a bit of that in drivers/gpu), this here isn't really
> > > > ralated to anything I'm doing.
> > >
> > > FOLL_FORCE is a pretty big clue it should be FOLL_LONGTERM, IMHO
> >
> > Since you're here ... I've noticed that ib sets FOLL_FORCE when the ib
> > verb access mode indicates possible writes. I'm not really clear on
> > why FOLL_WRITE isn't enough any why you need to be able to write
> > through a vma that's write protected currently.
>
> Ah, FOLL_FORCE | FOLL_WRITE means *read* confusingly enough, and the
> only reason you'd want this version for read is if you are doing
> longterm stuff. I wrote about this recently:
>
> https://lore.kernel.org/linux-mm/20200928235739.GU9916@xxxxxxxx/
>
> > > Since every driver does this wrong anything that uses this is creating
> > > terrifying security issues.
> > >
> > > IMHO this whole API should be deleted :(
> >
> > Yeah that part I just tried to conveniently ignore. I guess this dates
> > back to a time when ioremaps where at best fixed, and there wasn't
> > anything like a gpu driver dynamically managing vram around, resulting
> > in random entirely unrelated things possibly being mapped to that set
> > of pfns.
>
> No, it was always wrong. Prior to GPU like cases the lifetime of the
> PTE was tied to the vma and when the vma becomes free the driver can
> move the things in the PTEs to 'free'. Easy to trigger use-after-free
> issues and devices like RDMA have security contexts attached to these
> PTEs so it becomes a serious security bug to do something like this.
>
> > The underlying follow_pfn is also used in other places within
> > drivers/media, so this doesn't seem to be an accident, but actually
> > intentional.
>
> Looking closely, there are very few users, most *seem* pointless, but
> maybe there is a crazy reason?
>
> The sequence
> get_vaddr_frames();
> frame_vector_to_pages();
> sg_alloc_table_from_pages();
>
> Should be written
> pin_user_pages_fast(FOLL_LONGTERM);
> sg_alloc_table_from_pages()
>
> There is some 'special' code in frame_vector_to_pages() that tries to
> get a struct page for things from a VM_IO or VM_PFNMAP...
>
> Oh snap, that is *completely* broken! If the first VMA is IO|PFNMAP
> then get_vaddr_frames() iterates over all VMAs in the range, of any
> kind and extracts the PTEs then blindly references them! This means it
> can be used to use after free normal RAM struct pages!! Gah!
>
> Wow. Okay. That has to go.
>
> So, I *think* we can assume there is no sane cases where
> frame_vector_to_pages() succeeds but pin_user_pages() wasn't called.
>
> That means the users here:
> - habanalabs: Hey Oded can you fix this up?
Yes, no problem, I'll do it very soon.
Thanks,
Oded
>
> - gpu/exynos: Daniel can you get someone there to stop using it?
>
> - media/videobuf via vb2_dma_sg_get_userptr()
>
> Should all be switched to the standard pin_user_pages sequence
> above.
>
> That leaves the only interesting places as vb2_dc_get_userptr() and
> vb2_vmalloc_get_userptr() which both completely fail to follow the
> REQUIRED behavior in the function's comment about checking PTEs. It
> just DMA maps them. Badly broken.
>
> Guessing this hackery is for some embedded P2P DMA transfer?
>
> After he three places above should use pin_user_pages_fast(), then
> this whole broken API should be moved into videobuf2-memops.c and a
> big fat "THIS DOESN'T WORK" stuck on it.
>
> videobuf2 should probably use P2P DMA buf for this instead.
>
> Jason