Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF

From: Felix Kuehling
Date: Tue Jun 22 2021 - 12:50:17 EST


On 2021-06-22 at 11:29 a.m., Christian König wrote:
> On 22.06.21 at 17:23, Jason Gunthorpe wrote:
>> On Tue, Jun 22, 2021 at 02:23:03PM +0200, Christian König wrote:
>>> On 22.06.21 at 14:01, Jason Gunthorpe wrote:
>>>> On Tue, Jun 22, 2021 at 11:42:27AM +0300, Oded Gabbay wrote:
>>>>> On Tue, Jun 22, 2021 at 9:37 AM Christian König
>>>>> <ckoenig.leichtzumerken@xxxxxxxxx> wrote:
>>>>>> On 22.06.21 at 01:29, Jason Gunthorpe wrote:
>>>>>>> On Mon, Jun 21, 2021 at 10:24:16PM +0300, Oded Gabbay wrote:
>>>>>>>
>>>>>>>> Another thing I want to emphasize is that we are doing p2p only
>>>>>>>> through the export/import of the FD. We do *not* allow the user to
>>>>>>>> mmap the dma-buf as we do not support direct IO. So there is no access
>>>>>>>> to these pages through the userspace.
>>>>>>> Arguably mmapping the memory is a better choice, and is the direction
>>>>>>> that Logan's series goes in. Here the use of DMABUF was specifically
>>>>>>> designed to allow hitless revocation of the memory, which this isn't
>>>>>>> even using.
>>>>>> The major problem with this approach is that DMA-buf is also used for
>>>>>> memory which isn't CPU accessible.
>>>> That isn't an issue here because the memory is only intended to be
>>>> used with P2P transfers so it must be CPU accessible.
>>> No, especially P2P is often done on memory resources which are not even
>>> remotely CPU accessible.
>> That is a special AMD thing, P2P here is PCI P2P and all PCI memory is
>> CPU accessible.
>
> No absolutely not. NVidia GPUs work exactly the same way.
>
> And you have tons of similar cases in embedded and SoC systems where
> intermediate memory between devices isn't directly addressable with
> the CPU.
>
>>>>>>> So you are taking the hit of very limited hardware support and reduced
>>>>>>> performance just to squeeze into DMABUF..
>>>> You still have the issue that this patch is doing all of this P2P
>>>> stuff wrong - following the already NAK'd AMD approach.
>>> Well that stuff was NAKed because we still use sg_tables, not because we
>>> don't want to allocate struct pages.
>> sg lists in general.
>>
>>> The plan is to push this forward since DEVICE_PRIVATE clearly can't handle
>>> all of our use cases and is not really a good fit to be honest.
>>>
>>> IOMMU is now working as well, so as far as I can see we are all good here.
>> How? Is that more AMD special stuff?
>
> No, just using the dma_map_resource() interface.
>
> We have that working on tons of IOMMU enabled systems.
>
>> This patch series never calls to the iommu driver, AFAICT.
>>
>>>>> I'll go and read Logan's patch-set to see if that will work for us in
>>>>> the future. Please remember, as Daniel said, we don't have struct page
>>>>> backing our device memory, so if that is a requirement to connect to
>>>>> Logan's work, then I don't think we will want to do it at this point.
>>>> It is trivial to get the struct page for a PCI BAR.
>>> Yeah, but it doesn't make much sense. Why should we create a struct page
>>> for something that isn't even memory in a lot of cases?
>> Because the iommu and other places need this handle to set up their
>> stuff. Nobody has yet been brave enough to try to change those flows
>> to be able to use a physical CPU address.
>
> Well that is certainly not true. I'm just not sure if that works with
> all IOMMU drivers though.
>
> Would need to ping Felix when the support for this was merged.

We have been working on IOMMU support for all our multi-GPU memory
mappings in KFD. The PCIe P2P side of this is currently only merged on
our internal branch. Before we can actually use this, we need
CONFIG_DMABUF_MOVE_NOTIFY enabled (which is still documented as
experimental and disabled by default). Otherwise we'll end up pinning
all our VRAM.

I think we'll try to put together an upstream patch series of all our
PCIe P2P support in a few weeks or so. This will include IOMMU mappings,
checking that PCIe P2P is actually possible between two devices, and KFD
topology updates to correctly report those capabilities to user mode.

It will not use struct pages for exported VRAM buffers.

Regards,
  Felix


>
> Regards,
> Christian.
>
>>
>> This is why we have a special struct page type just for PCI BAR
>> memory.
>>
>> Jason