Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF

From: Oded Gabbay
Date: Tue Jun 22 2021 - 11:25:04 EST


On Tue, Jun 22, 2021 at 6:11 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Tue, Jun 22, 2021 at 04:12:26PM +0300, Oded Gabbay wrote:
>
> > > 1) Setting sg_page to NULL
> > > 2) 'mapping' pages for P2P DMA without going through the iommu
> > > 3) Allowing P2P DMA without using the p2p dma API to validate that it
> > > can work at all in the first place.
> > >
> > > All of these result in functional bugs in certain system
> > > configurations.
> > >
> > > Jason
> >
> > Hi Jason,
> > Thanks for the feedback.
> > Regarding point 1, why is that a problem if we disable the option to
> > mmap the dma-buf from user-space ?
>
> Userspace has nothing to do with needing struct pages or not
>
> Point 1 and 2 mostly go together, you supporting the iommu is not nice
> if you dont have struct pages.
>
> You should study Logan's patches I pointed you at as they are solving
> exactly this problem.
Yes, I do need to study them. I agree with you here. It appears I have
a hole in my understanding.
I'm missing the connection between iommu support (which I must have of
course) and struct pages.

>
> > In addition, I didn't see any problem with sg_page being NULL in the
> > RDMA p2p dma-buf code. Did I miss something here ?
>
> No, the design of the dmabuf requires the exporter to do the dma maps
> and so it is only the exporter that is wrong to omit all the iommu and
> p2p logic.
>
> RDMA is OK today only because nobody has implemented dma buf support
> in rxe/si - mainly because the only implementations of exporters don't
Can you please educate me, what is rxe/si ?

> set the struct page and are thus buggy.

ok...
so how come that patch-set was merged into 5.12 if it's buggy ?
Because the current exporters are buggy ? I probably need a history
lesson here.
But I understand why you think it's a bad idea to add a new buggy exporter.

>
> > I will take two GAUDI devices and use one as an exporter and one as an
> > importer. I want to see that the solution works end-to-end, with real
> > device DMA from importer to exporter.
>
> I can tell you it doesn't. Stuffing physical addresses directly into
> the sg list doesn't involve any of the IOMMU code so any configuration
> that requires IOMMU page table setup will not work.
>
> Jason

Yes, that's what I expect to see. But I want to see it with my own
eyes and then figure out how to solve this.
Maybe the result will be going to Logan's path, maybe something else,
but I need to start by seeing the failure in a real system.

Thanks for the information, it is really helpful.

Oded