Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF

From: Jason Gunthorpe
Date: Tue Jun 22 2021 - 11:34:28 EST


On Tue, Jun 22, 2021 at 06:24:28PM +0300, Oded Gabbay wrote:
> On Tue, Jun 22, 2021 at 6:11 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> >
> > On Tue, Jun 22, 2021 at 04:12:26PM +0300, Oded Gabbay wrote:
> >
> > > > 1) Setting sg_page to NULL
> > > > 2) 'mapping' pages for P2P DMA without going through the iommu
> > > > 3) Allowing P2P DMA without using the p2p dma API to validate that it
> > > > can work at all in the first place.
> > > >
> > > > All of these result in functional bugs in certain system
> > > > configurations.
> > > >
> > > > Jason
> > >
> > > Hi Jason,
> > > Thanks for the feedback.
> > > Regarding point 1, why is that a problem if we disable the option to
> > > mmap the dma-buf from user-space ?
> >
> > Userspace has nothing to do with needing struct pages or not
> >
> > Point 1 and 2 mostly go together, you supporting the iommu is not nice
> > if you dont have struct pages.
> >
> > You should study Logan's patches I pointed you at as they are solving
> > exactly this problem.

> Yes, I do need to study them. I agree with you here. It appears I
> have a hole in my understanding. I'm missing the connection between
> iommu support (which I must have of course) and struct pages.

Chistian explained what the AMD driver is doing by calling
dma_map_resource().

Which is a hacky and slow way of achieving what Logan's series is
doing.

> > No, the design of the dmabuf requires the exporter to do the dma maps
> > and so it is only the exporter that is wrong to omit all the iommu and
> > p2p logic.
> >
> > RDMA is OK today only because nobody has implemented dma buf support
> > in rxe/si - mainly because the only implementations of exporters don't
>
> Can you please educate me, what is rxe/si ?

Sorry, rxe/siw - these are the all-software implementations of RDMA
and they require the struct page to do a SW memory copy. They can't
implement dmabuf without it.

> ok...
> so how come that patch-set was merged into 5.12 if it's buggy ?

We only implemented true dma devices for RDMA DMABUF support, so it is
isn't buggy right now.

> Yes, that's what I expect to see. But I want to see it with my own
> eyes and then figure out how to solve this.

It might be tricky to test because you have to ensure the iommu is
turned on and has a non-idenity page table. Basically if it doesn't
trigger a IOMMU failure then the IOMMU isn't setup properly.

Jason