Re: [PATCH] RFC: dma-buf: Add an API for importing and exporting sync files

From: Jason Ekstrand
Date: Tue Mar 03 2020 - 14:10:35 EST


On Thu, Feb 27, 2020 at 2:28 AM Christian KÃnig
<christian.koenig@xxxxxxx> wrote:
>
> Am 26.02.20 um 17:46 schrieb Bas Nieuwenhuizen:
> > On Wed, Feb 26, 2020 at 4:29 PM Jason Ekstrand <jason@xxxxxxxxxxxxxx> wrote:
> >> On Wed, Feb 26, 2020 at 4:05 AM Daniel Vetter <daniel@xxxxxxxx> wrote:
> >>> On Wed, Feb 26, 2020 at 10:16:05AM +0100, Christian KÃnig wrote:
> >>> [SNIP]
> >>>> Just imagine that you access some DMA-buf with a shader and that operation
> >>>> is presented as a fence on the DMA-bufs reservation object. And now you can
> >>>> go ahead and replace that fence and free up the memory.
> >>>>
> >>>> Tricking the Linux kernel into allocating page tables in that freed memory
> >>>> is trivial and that's basically it you can overwrite page tables with your
> >>>> shader and gain access to all of system memory :)
> >>>>
> >>>> What we could do is to always make sure that the added fences will complete
> >>>> later than the already existing ones, but that is also rather tricky to get
> >>>> right. I wouldn't do that if we don't have a rather big use case for this.
> >> Right. I thought about that but I'm still learning how dma_resv
> >> works. It'd be easy enough to make a fence array that contains both
> >> the old fence and the new fence and replace the old fence with that.
> >> What I don't know is the proper way to replace the exclusive fence
> >> safely. Some sort of atomic_cpxchg loop, perhaps? I presume there's
> >> some way of doing it properly because DRM drivers are doing it all the
> >> time.
>
> First of all you need to grab the lock of the dma_resv object or you
> can't replace the exclusive nor the shared ones.
>
> This way you don't need to do a atomic_cmpxchg or anything else and
> still guarantee correct ordering.

Fixed in v3.

> > I think for an exclusive fence you may need to create a fence array
> > that includes the existing exclusive and shared fences in the dma_resv
> > combined with the added fence.
>
> Yes, that at least gives us the correct synchronization.

Fixed in v2

> > However, I'm not sure what the best way is to do garbage collection on
> > that so that we don't get an impossibly list of fence arrays.
>
> Exactly yes. That's also the reason why the dma_fence_chain container I
> came up with for the sync timeline stuff has such a rather sophisticated
> garbage collection.
>
> When some of the included fences signal you need to free up the
> array/chain and make sure that the memory for the container can be reused.

Currently (as of v2), I'm using dma_fence_array and being careful to
not bother constructing one if there's only one fence in play. Is
this insufficient? If so, maybe we should consider improving
dma_fence_array.

> > (Note
> > the dma_resv has a lock that needs to be taken before adding an
> > exclusive fence, might be useful). Some code that does a thing like
> > this is __dma_resv_make_exclusive in
> > drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
>
> Wanted to move that into dma_resv.c for quite a while since there are
> quite a few other cases where we need this.

I've roughly done that. The primary difference is that my version
takes an optional additional fence to add to the array. This makes it
a bit more complicated but I think I got it mostly right.

I've also written userspace code which exercises this and it seems to
work. Hopefully, that will give a better idea of what I'm trying to
accomplish.

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037

--Jason