Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn

From: Daniel Vetter
Date: Sat Oct 10 2020 - 19:01:22 EST


On Sat, Oct 10, 2020 at 11:36 PM Laurent Pinchart
<laurent.pinchart@xxxxxxxxxxxxxxxx> wrote:
>
> Hi Tomasz,
>
> On Sat, Oct 10, 2020 at 07:22:48PM +0200, Tomasz Figa wrote:
> > On Fri, Oct 9, 2020 at 7:52 PM Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
> > > On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> > > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> > > >
> > > > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > > > description is that this is unsafe *only if* __GFP_MOVABLE is used.
> > > >
> > > > No, it is unconditionally unsafe. The CMA movable mappings are
> > > > specific VMAs that will have bad issues here, but there are other
> > > > types too.
> > > >
> > > > The only way to do something at a VMA level is to have a list of OK
> > > > VMAs, eg because they were creatd via a special mmap helper from the
> > > > media subsystem.
> > > >
> > > > > Well, no drivers inside the media subsystem uses such flag, although
> > > > > they may rely on some infrastructure that could be using it behind
> > > > > the bars.
> > > >
> > > > It doesn't matter, nothing prevents the user from calling media APIs
> > > > on mmaps it gets from other subsystems.
> > >
> > > I think a good first step would be to disable userptr of non struct
> > > page backed storage going forward for any new hw support. Even on
> > > existing drivers. dma-buf sharing has been around for long enough now
> > > that this shouldn't be a problem. Unfortunately right now this doesn't
> > > seem to exist, so the entire problem keeps getting perpetuated.
> > >
> > > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> > > >
> > > > We can't tell from the VMA these kinds of details..
> > > >
> > > > It has to go the other direction, evey mmap that might be used as a
> > > > userptr here has to be found and the VMA specially created to allow
> > > > its use. At least that is a kernel only change, but will need people
> > > > with the HW to do this work.
> > >
> > > I think the only reasonable way to keep this working is:
> > > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> > > - add dma-buf export support to fbdev and v4l
> >
> > I assume you mean V4L2 and not the obsolete V4L that is emulated in
> > the userspace by libv4l. If so, every video device that uses videobuf2
> > gets DMA-buf export for free and there is nothing needed to enable it.

Yeah. And I missed that v4l2 added dma-buf export too.

> > We probably still have a few legacy drivers using videobuf (non-2),
> > but IMHO those should be safe to put behind some disabled-by-default
> > Kconfig symbol or even completely drop, as the legacy framework has
> > been deprecated for many years already.
>
> There's 8 drivers left, and they support a very large number of devices.
> I expect unhappy users distros stop shipping them. On the other hand,
> videobuf has been deprecated for a loooooooong time, so there has been
> plenty of time to convert the remaining drivers to videobuf2. If nobody
> can do it, then we'll have to drop support for these devices given the
> security issues.

Again, the issue here is _only_ with follow_pfn. For videobuf1 this
means videbuf-dma-contig.c userptr support is broken. Unlike videobuf2
it means it's broken for all usage (not just zero-copy userptr),
because videbuf-dma-contig.c lacks the pin_user_pages path. But that
would be easy to add if this poses a problem I think - we just need
to carry over the pin_user_pages_fast logic from videbuf2, no driver
changes required. But of course I don't think we should do that before
someone reports the regression, since videobuf1 userptr is doubly
deprecated :-)

Everything else keeps working with videobuf1 with my patch series. So
depending upon which videobuf1 implementations these 8 drivers use,
you might not even have any real breakage there.

> We have moved media drivers to staging in the past when there wasn't
> enough maintenance effort, we could do so here too.

I'm not breaking the world with this, it's really very minimal
use-case. At least as far as I'm understanding the entire media
subsystem here.
-Daniel

> > > - roll this out everywhere we still need it.
> > >
> > > Realistically this just isn't going to happen. And anything else just
> > > reimplements half of dma-buf, which is kinda pointless (you need
> > > minimally refcounting and some way to get at a promise of a permanent
> > > sg list for dma. Plus probably the vmap for kernel cpu access.
> > >
> > > > > Please let address the issue on this way, instead of broken an
> > > > > userspace API that it is there since 1991.
> > > >
> > > > It has happened before :( It took 4 years for RDMA to undo the uAPI
> > > > breakage caused by a security fix for something that was a 15 years
> > > > old bug.
> > >
> > > Yeah we have a bunch of these on the drm side too. Some of them are
> > > really just "you have to upgrade userspace", and there's no real fix
> > > for the security nightmare without that.
> >
> > I think we need to phase out such userspace indeed. The Kconfig symbol
> > allows enabling the unsafe functionality for anyone who still needs
> > it, so I think it's not entirely a breakage.
>
> --
> Regards,
>
> Laurent Pinchart



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch