Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF

From: Christian König
Date: Wed Jun 23 2021 - 04:57:43 EST


On 22.06.21 at 18:05, Jason Gunthorpe wrote:
On Tue, Jun 22, 2021 at 05:48:10PM +0200, Christian König wrote:
On 22.06.21 at 17:40, Jason Gunthorpe wrote:
On Tue, Jun 22, 2021 at 05:29:01PM +0200, Christian König wrote:
[SNIP]
No absolutely not. NVidia GPUs work exactly the same way.

And you have tons of similar cases in embedded and SoC systems where
intermediate memory between devices isn't directly addressable with the CPU.
None of that is PCI P2P.

It is all some specialty direct transfer.

You can't reasonably call dma_map_resource() on non CPU mapped memory
for instance, what address would you pass?

Do not confuse "I am doing transfers between two HW blocks" with PCI
Peer to Peer DMA transfers - the latter is a very narrow subcase.

No, just using the dma_map_resource() interface.
Ick, but yes that does "work". Logan's series is better.
No it isn't. It makes devices depend on allocating struct pages for their
BARs which is not necessary nor desired.
Which dramatically reduces the cost of establishing DMA mappings, a
loop of dma_map_resource() is very expensive.

Yeah, but that is perfectly ok. Our BAR allocations are either in chunks of at least 2MiB or only a single 4KiB page.

Oded might run into more performance problems, but those DMA-buf mappings are usually set up only once.
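
To put rough numbers on the chunk sizes mentioned above, here is a small userspace sketch (not kernel code). It assumes one dma_map_resource() call per physically contiguous chunk; count_map_calls() is a hypothetical helper introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of per-chunk mapping calls needed to cover
 * buf_size bytes when the exporter hands out contiguous chunks of
 * chunk_size bytes. Assumption: one dma_map_resource() call per chunk.
 */
static uint64_t count_map_calls(uint64_t buf_size, uint64_t chunk_size)
{
    assert(chunk_size > 0);
    /* Round up so a partially filled last chunk still counts. */
    return (buf_size + chunk_size - 1) / chunk_size;
}
```

For a 1 GiB buffer this gives 512 calls with 2 MiB chunks, versus 262144 calls if the same buffer had to be mapped in 4 KiB pieces.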

How do you prevent direct I/O on those pages for example?
GUP fails.

At least that is calming.

Allocating struct pages has its use cases, for example for exposing VRAM
as memory for HMM. But that is something very specific and should not limit
PCIe P2P DMA in general.
Sure, but that is an ideal we are far from obtaining, and nobody wants
to work on it, preferring to do hacky hacky like this.

If you believe in this then remove the scatter list from dmabuf, add a
new set of dma_map* APIs to work on physical addresses and all the
other stuff needed.

Yeah, that's what I totally agree on. And I actually hoped that the new P2P work for PCIe would go in that direction, but that didn't materialize.

But allocating struct pages for PCIe BARs, which are essentially registers and not memory, is much more hacky than the dma_map_resource() approach.

To reiterate why I think that having struct pages for those BARs is a bad idea: our doorbells on AMD GPUs are write and read pointers for ring buffers.

When you write to the BAR you essentially tell the firmware that you have either filled the ring buffer or read a bunch of it. This in turn triggers an interrupt in the hardware/firmware, which may have been asleep.

By using PCIe P2P we want to avoid the round trip to the CPU when one device has filled the ring buffer and another device must be woken up to process it.

Think of it as MSI-X in reverse. Allocating struct pages for those BARs just to work around the shortcomings of the DMA API makes no sense at all to me.
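
As a rough illustration of the doorbell pattern described above, here is a userspace C model. It is only a sketch under stated assumptions, not the actual AMD hardware interface: struct ring_model, ring_put(), doorbell_write() and consumer_run() are all hypothetical names, and a boolean flag stands in for the interrupt that wakes the firmware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical ring size, power of two */

struct ring_model {
    uint32_t slots[RING_SIZE];
    uint32_t wptr;       /* write pointer as last published via the doorbell */
    uint32_t rptr;       /* consumer's read pointer */
    bool consumer_awake; /* stands in for the interrupt waking the firmware */
};

/* Producer side: fill a slot. The consumer does not see it yet. */
static void ring_put(struct ring_model *r, uint32_t *local_wptr, uint32_t pkt)
{
    assert(*local_wptr - r->rptr < RING_SIZE); /* ring must not be full */
    r->slots[*local_wptr % RING_SIZE] = pkt;
    (*local_wptr)++;
}

/*
 * Doorbell write: a single register-style store that publishes the new
 * write pointer and wakes the consumer. With PCIe P2P this store would
 * come from another device instead of the CPU.
 */
static void doorbell_write(struct ring_model *r, uint32_t new_wptr)
{
    r->wptr = new_wptr;
    r->consumer_awake = true;
}

/* Consumer side: drain everything published so far, then sleep again. */
static unsigned int consumer_run(struct ring_model *r)
{
    unsigned int processed = 0;

    if (!r->consumer_awake)
        return 0; /* nobody rang the doorbell */
    while (r->rptr != r->wptr) {
        r->rptr++;
        processed++;
    }
    r->consumer_awake = false;
    return processed;
}
```

The point of the model is that the doorbell is a register-like side effect (publish a pointer, wake the other side), not a range of memory that anything would want to treat as pages.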


We also do have the VRAM BAR, and for HMM we do allocate struct pages for the address range exposed there. But this is a different use case.

Regards,
Christian.


Otherwise, we have what we have and drivers don't get to opt out. This
is why the stuff in AMDGPU was NAK'd.

Jason