Re: [RFC PATCH] dma-fence: Fix races of fence callbacks versus destructors by locking
From: Christian König
Date: Mon Jun 08 2026 - 14:38:45 EST
On 6/8/26 19:59, Danilo Krummrich wrote:
> On Mon Jun 8, 2026 at 7:34 PM CEST, Christian König wrote:
>> That's why we need the RCU grace period to make sure that nobody is
>> referencing the driver stuff any more.
>
> Right, and that's what Philipp tries to address, the requirement to wait for an
> RCU grace period is perfectly fine if it is only about freeing memory, but it
> can become painful if the fence private data contains data that also needs to be
> destructed in some way.
Yeah that makes sense.
> IOW, if a driver signals a fence, it is lifecycle-wise reasonable to destruct
> the private data that is no longer needed (remaining users only deal with struct
> dma_fence) and having to wait for a full grace period adds subtlety and
> complication that can be avoided with the proposed approach.
Yeah, I've run into that when I tried to make the amdgpu fences independent as well.
> That said, I'd like to ask the opposite question: What are the concerns with the
> proposed approach over (pure) RCU?
Well, a) locking inversions and b) performance.
For example, the reason we have both the dma_fence_is_signaled() and dma_fence_is_signaled_locked() variants is that not grabbing the lock makes a measurable difference in some specific use cases.
I personally find those micro-optimizations rather questionable, but the community agreement is that we should have them.
So my take would rather be that the dma_fence_is_signaled_locked() variant goes away, that we consistently call the ops pointers without holding the dma_fence lock, and that the driver implementations can then optionally take it if necessary.
I think for this we would just need to replace most calls to dma_fence_is_signaled_locked() with dma_fence_test_signaled().
In the long term that would also allow cleaning up the container handling and simplifying the DRM scheduler a bit.
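To illustrate the idea above (one entry point, ops called without the fence lock held, the driver taking the lock itself when it needs to), here is a rough user-space model. It is only a sketch: every model_* name is made up for illustration and is not kernel API, the "lock" is a plain flag standing in for a non-recursive spinlock, and marking the fence as signaled is reduced to setting a bit.

```c
/* User-space model only; none of the model_* names exist in the kernel. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_fence;

struct model_fence_ops {
	/* Optional; the core calls this WITHOUT the fence lock held. */
	bool (*signaled)(struct model_fence *f);
};

struct model_fence {
	const struct model_fence_ops *ops;
	atomic_bool signaled_bit;  /* stands in for the signaled flag bit */
	bool lock_held;            /* stands in for the non-recursive fence lock */
	unsigned int lock_count;   /* how often the lock was taken */
	unsigned int hw_seqno;     /* hypothetical driver state under the lock */
	unsigned int seqno;        /* value hw_seqno must reach */
};

static void model_lock(struct model_fence *f)
{
	/* Taking the lock twice would deadlock on a real spinlock. */
	assert(!f->lock_held);
	f->lock_held = true;
	f->lock_count++;
}

static void model_unlock(struct model_fence *f)
{
	assert(f->lock_held);
	f->lock_held = false;
}

/* Single entry point: lockless fast path, then the unlocked callback. */
static bool model_fence_is_signaled(struct model_fence *f)
{
	if (atomic_load(&f->signaled_bit))
		return true;

	if (f->ops && f->ops->signaled && f->ops->signaled(f)) {
		atomic_store(&f->signaled_bit, true);
		return true;
	}
	return false;
}

/* A driver callback that opts in to locking because it needs it. */
static bool model_driver_signaled(struct model_fence *f)
{
	bool done;

	model_lock(f);
	done = f->hw_seqno >= f->seqno;
	model_unlock(f);
	return done;
}

static const struct model_fence_ops model_driver_ops = {
	.signaled = model_driver_signaled,
};
```

In this model a caller that already held the lock and then invoked the callback would trip the assert in model_lock(), which is the kind of inversion a separate _locked variant has to work around. Once the bit is set, the fast path returns without touching the lock at all.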
Regards,
Christian.