Re: Enabling peer to peer device transactions for PCIe devices

From: Jason Gunthorpe
Date: Thu Nov 24 2016 - 11:24:37 EST


On Thu, Nov 24, 2016 at 12:40:37AM +0000, Sagalovitch, Serguei wrote:
> On Wed, Nov 23, 2016 at 02:11:29PM -0700, Logan Gunthorpe wrote:
>
> > Perhaps I am not following what Serguei is asking for, but I
> > understood the desire was for a complex GPU allocator that could
> > migrate pages between GPU and CPU memory under control of the GPU
> > driver, among other things. The desire is for DMA to continue to work
> > even after these migrations happen.
>
> The main issue is to how to solve use cases when p2p is
> requested/initiated via CPU pointers where such pointers could
> point to non-system memory location e.g. VRAM.

Okay, but your list is conflating a whole bunch of problems..

1) How to go from a __user pointer to a p2p DMA address
a) How to validate, setup iommu and maybe worst case bounce buffer
these p2p DMAs
2) How to allow drivers (ie GPU allocator) dynamically
remap pages in a VMA to/from p2p DMA addresses
3) How to expose uncachable p2p DMA address to user space via mmap

> to allow "get_user_pages" to work transparently similar
> how it is/was done for "DAX Device" case. Unfortunately
> based on my understanding "DAX Device" implementation
> deal only with permanently "locked" memory (fixed location)
> unrelated to "get_user_pages"/"put_page" scope
> which doesn't satisfy requirements for "eviction" / "moving" of
> memory keeping CPU address intact.

Hurm, isn't that issue with DAX only to do with being coherent with
the page cache?

A GPU allocator would not use the page cache, it would have to
construct VMAs some other way.

> My understanding is that It will not solve RDMA MR issue where "lock"
> could be during the whole application life but (a) it will not make
> RDMA MR case worse (b) should be enough for all other cases for
> "get_user_pages"/"put_page" controlled by kernel.

Right. There is no solution to the RDMA MR issue on old hardware. Apps
that are using GPU+RDMA+Old hardware will have to use short lived MRs
and pay that performance cost, or give up on migration.

Jason