Re: [RFC] Make use of non-dynamic dmabuf in RDMA

From: Christian König
Date: Wed Sep 01 2021 - 07:25:20 EST

Am 01.09.21 um 13:20 schrieb Gal Pressman:
On 24/08/2021 20:32, Jason Gunthorpe wrote:
On Tue, Aug 24, 2021 at 10:27:23AM -0700, John Hubbard wrote:
On 8/24/21 2:32 AM, Christian König wrote:
Am 24.08.21 um 11:06 schrieb Gal Pressman:
On 23/08/2021 13:43, Christian König wrote:
Am 21.08.21 um 11:16 schrieb Gal Pressman:
On 20/08/2021 17:32, Jason Gunthorpe wrote:
On Fri, Aug 20, 2021 at 03:58:33PM +0300, Gal Pressman wrote:
IIUC, we're talking about three different exporter "types":
- Dynamic with move_notify (requires ODP)
- Dynamic with revoke_notify
- Static

Which changes do we need to make the third one work?
Basically none at all in the framework.

You just need to properly use the dma_buf_pin() function when you start using a
buffer (e.g. before you create an attachment) and the dma_buf_unpin() function
after you are done with the DMA-buf.
I replied to your previous mail, but I'll ask again.
Doesn't the pin operation migrate the memory to host memory?
Sorry, I missed your previous reply.

And yes, at least for the amdgpu driver, we migrate the memory to host
memory as soon as it is pinned, and I would expect that other GPU drivers
do something similar.
Well...for many topologies, migrating to host memory will result in a
dramatically slower p2p setup. For that reason, some GPU drivers may
want to allow pinning of video memory in some situations.

Ideally, you've got modern ODP devices and you don't even need to pin.
But if not, and you still hope to do high performance p2p between a GPU
and a non-ODP Infiniband device, then you would need to leave the pinned
memory in vidmem.

So I think we don't want to rule out that behavior, right? Or is the
thinking more like, "you're lucky that this old non-ODP setup works at
all, and we'll make it work by routing through host/cpu memory, but it
will be slow"?
I think it depends on the user. If the user creates memory which is
permanently located on the GPU, then it should be pinnable in this way
without forced migration. But if the memory is inherently migratable,
then it just cannot be pinned in the GPU at all, as we can't
indefinitely block migration from happening, e.g. if the CPU touches it
later or something.
So are we OK with exporters implementing dma_buf_pin() without migrating the memory?

I think so, yes.

If so, do we still want a move_notify callback for non-dynamic importers? A noop?

Well, we could make the move_notify callback optional, e.g. so that you get the new locking approach but still pin the buffers manually with dma_buf_pin().
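
To make the pinning behaviour discussed in this thread concrete, here is a toy userspace model in C. It is not kernel code: every name in it (toy_buf, toy_attachment, toy_pin, toy_unpin, toy_move, ...) is hypothetical and only loosely mirrors dma_buf_pin(), dma_buf_unpin() and the importer's move_notify callback. It assumes an exporter that leaves permanently GPU-resident memory in place when pinned, migrates inherently migratable memory to host before pinning, and treats move_notify as optional.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names belong to the kernel's dma-buf API. */
enum toy_placement { TOY_VRAM, TOY_HOST };

struct toy_buf {
	enum toy_placement placement;
	bool migratable;        /* may the exporter have to move it later? */
	unsigned int pin_count; /* number of active pins */
};

struct toy_importer_ops {
	/* Optional: a static (non-dynamic) importer leaves this NULL. */
	void (*move_notify)(struct toy_buf *buf);
};

struct toy_attachment {
	struct toy_buf *buf;
	const struct toy_importer_ops *ops;
	bool pinned;
};

/*
 * Pin the buffer for this attachment. Memory that is permanently located
 * in VRAM stays there; migratable memory is moved to host first, because
 * migration cannot be blocked indefinitely.
 */
static int toy_pin(struct toy_attachment *at)
{
	struct toy_buf *b = at->buf;

	if (b->migratable && b->placement == TOY_VRAM)
		b->placement = TOY_HOST;
	b->pin_count++;
	at->pinned = true;
	return 0;
}

/* Drop the pin taken by toy_pin(). */
static void toy_unpin(struct toy_attachment *at)
{
	assert(at->pinned && at->buf->pin_count > 0);
	at->buf->pin_count--;
	at->pinned = false;
}

/*
 * Exporter-side move. It is refused while any pin is held; otherwise the
 * buffer moves and only importers that supplied move_notify are told.
 */
static int toy_move(struct toy_buf *b, enum toy_placement to,
		    struct toy_attachment *atts, size_t n)
{
	size_t i;

	if (b->pin_count > 0)
		return -1;
	b->placement = to;
	for (i = 0; i < n; i++)
		if (atts[i].ops && atts[i].ops->move_notify)
			atts[i].ops->move_notify(b);
	return 0;
}
```

In this model a static importer passes ops with move_notify set to NULL, calls toy_pin() before it starts DMA and toy_unpin() when it is done. While the pin is held, toy_move() fails, so the importer never needs a notification.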