Re: Enabling peer to peer device transactions for PCIe devices

From: Jerome Glisse
Date: Thu Jan 05 2017 - 14:54:38 EST


On Thu, Jan 05, 2017 at 12:01:13PM -0700, Jason Gunthorpe wrote:
> On Thu, Jan 05, 2017 at 01:39:29PM -0500, Jerome Glisse wrote:
>
> > 1) peer-to-peer because of userspace specific API like NVidia GPU
> > direct (AMD is pushing its own similar API i just can't remember
> > marketing name). This does not happen through a vma, this happens
> > through specific device driver call going through device specific
> > ioctl on both side (GPU and RDMA). So both kernel driver are aware
> > of each others.
>
> Today you can only do user-initiated RDMA operations in conjection
> with a VMA.
>
> We'd need a really big and strong reason to create an entirely new
> non-VMA based memory handle scheme for RDMA.
>
> So my inclination is to just completely push back on this idea. You
> need a VMA to do RMA.
>
> GPUs need to create VMAs for the memory they want to RDMA from, even
> if the VMA handle just causes SIGBUS for any CPU access.

Mellanox and NVidia support peer to peer with what they market a
GPUDirect. It only works without IOMMU. It is probably not upstream :

https://www.mail-archive.com/linux-rdma@xxxxxxxxxxxxxxx/msg21402.html

I thought it was but it seems it require an out of tree driver to work.

Wether there is a vma or not isn't important to the issue anyway. If
you want to enforce VMA rule for RDMA it is an RDMA specific discussion
in which i don't want to be involve, it is not my turf :)

What matter is the back channel API between peer-to-peer device. Like
the above patchset points out for GPU we need to be able to invalidate
a mapping at any point in time. Pining is not something we want to
live with.

So the VMA consideration does not change what i was saying there is
2 cases:
1) device vma (might be restricted to specific userspace API)
2) regular vma (!VM_MIXED and no special pte entry)

For 1) you need back channel it can be per device driver or we can agree
to some common API that can add to vm_operations_struct.

For 2) expectation is that you will have valid struct page but you still
need special handling at the dma API level.

In 1) the peer-to-peer mapping is track at vma level and mediated there.
For 2) it is per page and it is mediated at that level.

In both case on you have setup mapping you need to handle the IOMMU and
the PCI bridge restriction that might apply and i believe that the DMA
API is the place where we want to solve that second side of the problem.

Cheers,
Jérôme