Re: [RFC PATCH] dma-buf/dma_fence: Make races for dma_fence_is_signaled() less likely

From: Philipp Stanner

Date: Mon Jun 15 2026 - 10:17:13 EST


On Mon, 2026-06-15 at 10:56 +0100, Gary Guo wrote:
> On Fri Jun 12, 2026 at 11:42 AM BST, Philipp Stanner wrote:
> > dma_fence_is_signaled() returns whether a fence has been signaled
> > already. That function contains a fast path opportunistic check which is
> > not guarded by the lock and, according to Christian, cannot be guarded
> > by the lock without causing a massive performance regression.
> >
> > This now means that dma_fence_is_signaled() can return true WHILE the
> > fence callbacks are still being executed. This is razy and has lead to
> > at least one bug solved in:
> >
> > commit c8a5d5ea3ba6 ("nouveau: fix client work fence deletion race")
> >
> > Make this race impossible, by simply setting the bit only once the
> > callbacks are actually completed.
> >
> > Signed-off-by: Philipp Stanner <phasta@xxxxxxxxxx>
> > ---
> >  drivers/dma-buf/dma-fence.c | 18 ++++++++++++++++--
> >  1 file changed, 16 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > index c7ea1e75d38a..2416cc86ce93 100644
> > --- a/drivers/dma-buf/dma-fence.c
> > +++ b/drivers/dma-buf/dma-fence.c
> > @@ -359,8 +359,19 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
> >  
> >   dma_fence_assert_held(fence);
> >  
> > - if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> > -       &fence->flags)))
> > + /*
> > + * First test the bit, so we don't signal an already signaled fence again.
> > + * The lock protects against multiple parties setting the bit. The bit
> > + * is then set at the end of the function.
> > + *
> > + * The background is that there is a fast path check in
> > + * dma_fence_is_signaled() which does not use lock protection and can
> > + * return true *while* the fence callbacks are still executing.
> > + *
> > + * This fast path check supposedly cannot be guarded by the lock because
> > + * of significant performance regressions.
> > + */
> > + if (unlikely(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)))
> >   return;
> >  
> >   trace_dma_fence_signaled(fence);
> > @@ -384,6 +395,9 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
> >   INIT_LIST_HEAD(&cur->node);
> >   cur->func(fence, cur);
> >   }
> > +
> > + // TODO: we need some barrier here, don't we?
>
> You want a release barrier here, which we don't have currently, but you can use
> `smp_mb__before_atomic`.
>
> However note that barriers must be paired, so in order for the added barrier to
> do anything, dma_fence_is_signaled lockless fast path would need to have an
> acquire barrier, which we also don't have, so you need a `smp_mb()` there after
> the flag test.

Thx for the explanations.

I tend to conclude that this solution is far too fragile and without
even further synchronization we couldn't solve that, because the fence-
>signaled is at other places evaluated locklessly

const char __rcu *dma_fence_timeline_name(struct dma_fence *fence)
{
const struct dma_fence_ops *ops;

/* RCU protection is required for safe access to returned string */
ops = rcu_dereference(fence->ops);
if (!dma_fence_test_signaled_flag(fence))
return (const char __rcu *)ops->get_driver_name(fence);
else
return (const char __rcu *)"signaled-timeline";
}
EXPORT_SYMBOL(dma_fence_timeline_name);

The signaled flag here is ordered in the reverse order.
dma_fence_signal_timestamp_locked() sets the ops pointer to NULL, so
you might dereference a NULL pointer there.


P.

>
> Best,
> Gary
>
> > + set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
> >  }
> >  EXPORT_SYMBOL(dma_fence_signal_timestamp_locked);
> >  
>