Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

From: Jason Gunthorpe
Date: Wed Jan 30 2019 - 18:30:35 EST


On Wed, Jan 30, 2019 at 03:52:13PM -0700, Logan Gunthorpe wrote:
>
>
> On 2019-01-30 2:50 p.m., Jason Gunthorpe wrote:
> > On Wed, Jan 30, 2019 at 02:01:35PM -0700, Logan Gunthorpe wrote:
> >
> >> And I feel the GUP->SGL->DMA flow should still be what we are aiming
> >> for. Even if we need a special GUP for special pages, and a special DMA
> >> map; and the SGL still has to be homogenous....
> >
> > *shrug* so what if the special GUP called a VMA op instead of
> > traversing the VMA PTEs today? Why does it really matter? It could
> > easily change to a struct page flow tomorrow..
>
> Well it's so that it's composable. We want the SGL->DMA side to work for
> APIs from kernel space and not have to run a completely different flow
> for kernel drivers than from userspace memory.

If we want to have these DMA-only SGLs anyhow, then the kernel flow
can use them too.

In the kernel it easier because the 'exporter' already knows it is
working with BAR memory, so it can just do something like this:

struct dma_sg_table *sgl_dma_map_pci_bar(struct pci_device *from,
struct device *to,
unsigned long bar_ptr,
size_t length)

And then it falls down the same DMA-SGL-only kind of flow that would
exist to support the user side. ie it is the kernel version of the API
I described below.

> For GUP to do a special VMA traversal it would now need to return
> something besides struct pages which means no SGL and it means a
> completely different DMA mapping call.

GUP cannot support BAR memory because it must only return CPU memory -
I think too many callers make this assumption for it to be possible to
change it.. (see below)

A new-GUP can return DMA addresses - so it doesn't have this problem.

> > Would you feel better if this also came along with a:
> >
> > struct dma_sg_table *sgl_dma_map_user(struct device *dma_device,
> > void __user *prt, size_t len)
>
> That seems like a nice API. But certainly the implementation would need
> to use existing dma_map or pci_p2pdma_map calls, or whatever as part of
> it...

I wonder how Jerome worked the translation, I haven't looked yet..

> We actually stopped caring about the __iomem problem. We are working
> under the assumption that pages returned by devm_memremap_pages() can be
> accessed as normal RAM and does not need the __iomem designation.

As far as CPU mapped uncached BAR memory goes, this is broadly not
true.

s390x for instance requires dedicated CPU instructions to access BAR
memory.

x86 will fault if you attempt to use a SSE algorithm on uncached BAR
memory. The kernel has many SSE accelerated algorithsm so you can
never pass these special p2p SGL's through to them either. (think
parity or encryption transformations through the block stack)

Many platforms have limitations on alignment and access size for BAR
memory - you can't just blindly call memcpy and expect it to work.
(TPM is actually struggling with this now, confusingly different
versions of memcpy are giving different results on some x86 io memory)

Other platforms might fault or corrupt if an unaligned access is
attempted to BAR memory.

In short - I don't see a way to avoid knowing about __iomem in the
sgl. There are too many real use cases that require this knowledge,
and too many places that touch the SGL pages with the CPU.

I think we must have 'DMA only' SGLs and code paths that are known
DMA-only clean to make it work properly with BAR memory.

Jason