Re: [RFC PATCH] dma-fence: Fix races of fence callbacks versus destructors by locking
From: Philipp Stanner
Date: Mon Jun 08 2026 - 12:14:34 EST
On Mon, 2026-06-08 at 17:23 +0200, Danilo Krummrich wrote:
> On Mon Jun 8, 2026 at 5:17 PM CEST, Philipp Stanner wrote:
> > On Mon, 2026-06-08 at 17:01 +0200, Boris Brezillon wrote:
> > > On Mon, 8 Jun 2026 16:24:37 +0200
> > > Philipp Stanner <phasta@xxxxxxxxxx> wrote:
> > >
> > > > @@ -1020,11 +1024,20 @@ EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> > > > void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
> > > > {
> > > > const struct dma_fence_ops *ops;
> > > > + unsigned long flags;
> > > >
> > > > rcu_read_lock();
> > > > ops = rcu_dereference(fence->ops);
> > > > - if (ops && ops->set_deadline && !dma_fence_is_signaled(fence))
> > > > + if (!ops || !ops->set_deadline) {
> > > > + rcu_read_unlock();
> > > > + return;
> > > > + }
> > > > +
> > > > + dma_fence_lock_irqsave(fence, flags);
> > > > + if (!dma_fence_is_signaled_locked(fence))
> > > > ops->set_deadline(fence, deadline);
> > >
> > > You can't take the fence lock around ->set_deadline(), otherwise you'll
> > > deadlock here [1] or here [2].
> > >
> > > > +
> > > > + dma_fence_unlock_irqrestore(fence, flags);
> > > > rcu_read_unlock();
> > > > }
> > >
> > >
> > > [1]https://elixir.bootlin.com/linux/v7.0.11/source/drivers/dma-buf/sw_sync.c#L182
> > > [2]https://elixir.bootlin.com/linux/v7.0.11/source/drivers/gpu/drm/msm/msm_fence.c#L139
Oh, MSM actually doesn't btw, that's a false positive. That's a
distinct spinlock on their fence context object.
But yes, before we could upstream this, we would go through all the
implementors like Danilo did, to find all the others.
P.
> >
> >
> > If we'd port these (and maybe some we have overlooked) simultaneously,
> > they could completely drop their separate locking.
> >
> > The fact that other parties were forced to take the fence lock in their
> > callbacks (and even 100% of the functions' code) actually proves that
> > this RFC is probably a good idea and callback-calls should be guarded
> > by the fence lock :]
>
> I think I looked into this recently and IIRC it indeed seems like all
> implementors of set_deadline() take the lock within their callback.