Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF

From: Jason Gunthorpe
Date: Tue Jun 22 2021 - 11:41:46 EST


On Tue, Jun 22, 2021 at 05:29:01PM +0200, Christian König wrote:
> Am 22.06.21 um 17:23 schrieb Jason Gunthorpe:
> > On Tue, Jun 22, 2021 at 02:23:03PM +0200, Christian König wrote:
> > > Am 22.06.21 um 14:01 schrieb Jason Gunthorpe:
> > > > On Tue, Jun 22, 2021 at 11:42:27AM +0300, Oded Gabbay wrote:
> > > > > On Tue, Jun 22, 2021 at 9:37 AM Christian König
> > > > > <ckoenig.leichtzumerken@xxxxxxxxx> wrote:
> > > > > > Am 22.06.21 um 01:29 schrieb Jason Gunthorpe:
> > > > > > > On Mon, Jun 21, 2021 at 10:24:16PM +0300, Oded Gabbay wrote:
> > > > > > >
> > > > > > > > Another thing I want to emphasize is that we are doing p2p only
> > > > > > > > through the export/import of the FD. We do *not* allow the user to
> > > > > > > > mmap the dma-buf as we do not support direct IO. So there is no access
> > > > > > > > to these pages through the userspace.
> > > > > > > Arguably mmaping the memory is a better choice, and is the direction
> > > > > > > that Logan's series goes in. Here the use of DMABUF was specifically
> > > > > > > > designed to allow hitless revocation of the memory, which this isn't
> > > > > > > even using.
> > > > > > The major problem with this approach is that DMA-buf is also used for
> > > > > > memory which isn't CPU accessible.
> > > > That isn't an issue here because the memory is only intended to be
> > > > used with P2P transfers so it must be CPU accessible.
> > > No, especially P2P is often done on memory resources which are not even
> > > remotely CPU accessible.
> > That is a special AMD thing, P2P here is PCI P2P and all PCI memory is
> > CPU accessible.
>
> No absolutely not. NVidia GPUs work exactly the same way.
>
> And you have tons of similar cases in embedded and SoC systems where
> intermediate memory between devices isn't directly addressable with the CPU.

None of that is PCI P2P.

It is all some specialty direct transfer.

You can't reasonably call dma_map_resource() on non-CPU-mapped memory,
for instance: what address would you pass?
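
To make the point concrete, here is a minimal userspace sketch, not kernel
code. The types and the mock_* names are hypothetical stand-ins invented for
illustration; only the dma_map_resource() prototype quoted in the comment is
the kernel's. It shows that the caller has to supply a CPU physical address,
so a region without one gives you nothing to pass:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel types of the same name. */
typedef uint64_t phys_addr_t;
typedef uint64_t dma_addr_t;

#define MOCK_DMA_MAPPING_ERROR (~(dma_addr_t)0)

/* Hypothetical description of a chunk of device memory. */
struct mock_region {
    bool cpu_addressable;   /* true for memory exposed through a PCI BAR */
    phys_addr_t cpu_phys;   /* only meaningful when cpu_addressable */
    size_t size;
};

/*
 * Mock shaped like the kernel's
 *   dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
 *                               size_t size, enum dma_data_direction dir,
 *                               unsigned long attrs);
 * It does an identity "translation", as if no IOMMU were present.
 */
static dma_addr_t mock_dma_map_resource(phys_addr_t phys_addr, size_t size)
{
    if (size == 0)
        return MOCK_DMA_MAPPING_ERROR;
    return (dma_addr_t)phys_addr;
}

/* The caller must produce a phys_addr_t; without one there is no input. */
static dma_addr_t mock_map_region(const struct mock_region *r)
{
    if (!r->cpu_addressable)
        return MOCK_DMA_MAPPING_ERROR;
    return mock_dma_map_resource(r->cpu_phys, r->size);
}

/* Usage: a BAR-backed region maps, a CPU-invisible one cannot. */
static void demo_check(void)
{
    struct mock_region bar = { true, 0xf0000000ULL, 4096 };
    struct mock_region hidden = { false, 0, 4096 };

    assert(mock_map_region(&bar) == 0xf0000000ULL);
    assert(mock_map_region(&hidden) == MOCK_DMA_MAPPING_ERROR);
}
```

The identity mapping is a simplification; with an IOMMU in the path the
returned bus address would generally differ from the physical one.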

Do not confuse "I am doing transfers between two HW blocks" with PCI
Peer to Peer DMA transfers - the latter is a very narrow subcase.

> No, just using the dma_map_resource() interface.

Ick, but yes, that does "work". Logan's series is better.

> > > > > I'll go and read Logan's patch-set to see if that will work for us in
> > > > > the future. Please remember, as Daniel said, we don't have struct page
> > > > > backing our device memory, so if that is a requirement to connect to
> > > > > Logan's work, then I don't think we will want to do it at this point.
> > > > It is trivial to get the struct page for a PCI BAR.
> > > Yeah, but it doesn't make much sense. Why should we create a struct page for
> > > something that isn't even memory in a lot of cases?
> > Because the iommu and other places need this handle to setup their
> > stuff. Nobody has yet been brave enough to try to change those flows
> > to be able to use a physical CPU address.
>
> Well that is certainly not true. I'm just not sure if that works with all
> IOMMU drivers though.

Huh? All the iommu interfaces except for the dma_map_resource() are
struct page based. dma_map_resource() is slow and limited in what it
can do.

Jason