Re: Enabling peer to peer device transactions for PCIe devices

From: Logan Gunthorpe
Date: Wed Nov 23 2016 - 16:11:39 EST


On 23/11/16 01:33 PM, Jason Gunthorpe wrote:
> On Wed, Nov 23, 2016 at 02:58:38PM -0500, Serguei Sagalovitch wrote:
>
>> We do not want to have "highly" dynamic translation due to
>> performance cost. We need to support "overcommit" but would
>> like to minimize impact. To support RDMA MRs for GPU/VRAM/PCIe
>> device memory (which is must) we need either globally force
>> pinning for the scope of "get_user_pages() / "put_pages" or have
>> special handling for RDMA MRs and similar cases.
>
> As I said, there is no possible special handling. Standard IB hardware
> does not support changing the DMA address once a MR is created. Forget
> about doing that.

Yeah, that's essentially the point I was trying to make. Not to mention
all the other unrelated hardware that can't DMA to an address that might
disappear mid-transfer.

> Only ODP hardware allows changing the DMA address on the fly, and it
> works at the page table level. We do not need special handling for
> RDMA.

I am aware of ODP but, as others have noted, it doesn't provide a
general solution to the points above.

> Like I said, this is the direction the industry seems to be moving in,
> so any solution here should focus on VMAs/page tables as the way to link
> the peer-peer devices.

Yes, that was the appeal of ZONE_DEVICE for us.

> To me this means at least items #1 and #3 should be removed from
> Alexander's list.

It's also worth noting that #4 makes use of ZONE_DEVICE (#2), so they
are effectively the same option. iopmem is just one way to get BAR
addresses to user-space; inside the kernel it's still ZONE_DEVICE.

Logan