Re: [PULL] topic/iomem-mmap-vs-gup

From: Daniel Vetter
Date: Mon May 10 2021 - 03:17:17 EST

On Sat, May 8, 2021 at 6:47 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> [ Daniel, please fix your broken email setup. You have this insane
> "Reply-to" list that just duplicates all the participants. Very
> broken, very annoying ]
> On Fri, May 7, 2021 at 8:53 AM Daniel Vetter <daniel@xxxxxxxx> wrote:
> >
> > So personally I think the entire thing should just be thrown out, it's all
> > levels of scary and we have had zero-copy buffer sharing done properly
> > with dma-buf for years in v4l.
> So I've been looking at this more, and the more I look at it, the less
> I like this series.
> I think the proper fix is to just fix things.
> For example, I'm looking at the v4l users of follow_pfn(), and I find
> get_vaddr_frames(), which is just broken.
> Fine, we know users are broken, but look at what appears to be the
> main user of get_vaddr_frames(): vb2_dc_get_userptr().
> What does that function do? Immediately after doing
> get_vaddr_frames(), it tries to turn those pfn's into page pointers,
> and then do sg_alloc_table_from_pages() on the end result.
> Yes, yes, it also has that "ok, that failed, let's try to see if it's
> some physically contiguous mapping" and do DMA directly to those
> physical pages, but the point there is that that only happens when
> they weren't normal pages to begin with.
> So the *fix* for at least that path is to
> (a) just use the regular pin_user_pages() for normal pages

Yup, the "rip it all out" solution amounts to replacing this all,
including frame_vector helper code, with pin_user_pages.

> (b) perhaps keep the follow_pfn() case, but then limit it to that "no
> page backing" and that physical pages case.
> And honestly, the "struct frame_vector" thing already *has* support
> for this, and the problem is simply that the v4l code has decided to
> have the callers ask for pfn's rather than have the callers just ask
> for a frame-vector that is either "pfn's with no page backing" _or_
> "page list with proper page reference counting".
> So this series of yours that just disables follow_pfn() actually seems
> very wrong.
> I think follow_pfn() is ok for the actual "this is not a 'struct page'
> backed area", and disabling that case is wrong even going forward.

I think this is where you miss a bit: We very much also want to stop
pinning userptr to physical addresses that aren't page backed. This
might very well be some gpu pci bar, backed by vram, and vram is
managed as dynamically as struct page backed stuff (and there's all
the hmm dreams to make it actually use struct page, but that's another

So by the time the media hw accesses that vb2 userptr buffer there's
a good chance someone else's data is now there. If vb2 had a
mmu_notifier subscription or similar to follow pte updates the gpu
driver does, then it would be all fine. But this vb2 model is a pinned
one, hence not fixable.
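
To make the pinned vs. notifier difference concrete, here is a minimal
userspace sketch. Everything in it is hypothetical (toy_buffer,
toy_consumer, toy_buffer_migrate are made-up names, not kernel API), and
the callback is a big simplification of what an mmu_notifier
subscription does: a real notifier tells the subscriber that the mapping
went away, it does not hand over the new location.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical consumer (think: media hw) that remembers where a buffer is. */
struct toy_consumer {
	unsigned long cached_pfn;
};

/* Hypothetical buffer owned by a driver that may move it at any time. */
struct toy_buffer {
	unsigned long pfn;	/* current backing location */
	/* Optional subscription, loosely modelled on a notifier callback. */
	void (*invalidate)(void *ctx, unsigned long new_pfn);
	void *ctx;
};

/* Subscriber callback: refresh the cached location after a move. */
void toy_consumer_invalidate(void *ctx, unsigned long new_pfn)
{
	struct toy_consumer *c = ctx;

	c->cached_pfn = new_pfn;
}

/* The owner moves the buffer; only subscribers hear about it. */
void toy_buffer_migrate(struct toy_buffer *buf, unsigned long new_pfn)
{
	assert(buf != NULL);
	buf->pfn = new_pfn;
	if (buf->invalidate)
		buf->invalidate(buf->ctx, new_pfn);
}
```

A consumer that only copied the pfn once (no invalidate hook) keeps
pointing at the old location after toy_buffer_migrate(), which is the
"someone else's data" case. A consumer that registered the hook follows
the move.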

The other more practical issue is that peer2peer dma on modern hw
needs quite some setup. Just taking a cpu pfn and hoping that matches
the bus addr your device would need is a bit optimistic.

One theoretical & proper fix I discussed with Jason Gunthorpe would be
to replace the pfn lookup with a lookup for a struct dma_buf. Which
has proper interfaces for pinning gpu buffers, figuring out p2p dma or
just figuring out the right dma mapping and all that. Idea was to make
a direct vma->dma_buf lookup or something like that. But consensus is
also that outside of gpus and very closely related things using
dma_buf is not a great idea, because there's a few too many silly
rules involved. For everyone else it's most likely better to make the
struct page managed device memory stuff work.

> End result, I think the proper model is:
> - keep follow_pfn(), but limit it to the "not vm_normal_page()" case,
> and return error for some real page mapping
> - make the get_vaddr_frames() first try "pin_user_pages()" (and
> create a page array) and fall back to "follow_pfn()" if that fails (or
> the other way around). Set the
> IOW, get_vaddr_frames() would just do
> vec->got_ref = is_pages;
> vec->is_pfns = !is_pages;
> and everything would just work out - the v4l code seems to already
> have all the support for "it's a pfn array" vs "it's properly
> refcounted pages".
> So the only case we should disallow is the mixed case, that the v4l
> code already seems to not be able to handle anyway (and honestly, it
> looks like "got_ref/is_pfns" should be just one flag - they always
> have to have the opposite values).
> So I think this "unsafe_follow_pfn()" halfway step is actively wrong.
> It doesn't move us forward. Quite the reverse. It just makes the
> proper fix harder.
> End result: not pulling it, unless somebody can explain to me in small
> words why I'm wrong and have the mental capacity of a damaged rodent.
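
For readers following along, the model quoted above (one flag instead
of got_ref/is_pfns, mixed ranges rejected) can be sketched as a
userspace toy. All names here are hypothetical stand-ins and none of
this is the real frame_vector code. It only illustrates the
classification step, not pinning, refcounting or the lifetime problems
of the pfn case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_FRAMES 16

/* Hypothetical stand-in for one mapped page of a user range. */
struct toy_pte {
	unsigned long pfn;
	bool has_struct_page;	/* true: normal, refcountable page */
};

/* Toy frame vector with a single flag replacing got_ref/is_pfns. */
struct toy_frame_vector {
	size_t nr;
	bool is_pages;		/* true: page list, false: raw pfns */
	unsigned long frames[TOY_MAX_FRAMES];
};

/*
 * Fill vec from the range. Returns 0 on success, -1 if the range is
 * empty, too large, or mixes page-backed and non-page-backed entries.
 */
int toy_get_vaddr_frames(const struct toy_pte *range, size_t n,
			 struct toy_frame_vector *vec)
{
	bool is_pages;
	size_t i;

	assert(range != NULL && vec != NULL);
	if (n == 0 || n > TOY_MAX_FRAMES)
		return -1;

	is_pages = range[0].has_struct_page;
	for (i = 0; i < n; i++) {
		if (range[i].has_struct_page != is_pages)
			return -1;	/* mixed case is disallowed */
		vec->frames[i] = range[i].pfn;
	}
	vec->nr = n;
	vec->is_pages = is_pages;
	return 0;
}
```

In this toy the caller never asks for pfns or pages explicitly. It gets
back whichever kind the range turned out to be and checks is_pages.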

No rodents I think, just more backstory of how this all fits. tldr;
pin_user_pages is the only safe use of this vb2 userptr thing.

Cheers, Daniel
Daniel Vetter
Software Engineer, Intel Corporation