Re: [RFC PATCH] dma-buf/dma_fence: Make races for dma_fence_is_signaled() less likely
From: Philipp Stanner
Date: Mon Jun 15 2026 - 06:05:25 EST
On Mon, 2026-06-15 at 11:53 +0200, Christian König wrote:
> On 6/12/26 12:42, Philipp Stanner wrote:
> > dma_fence_is_signaled() returns whether a fence has been signaled
> > already. That function contains a fast path opportunistic check which is
> > not guarded by the lock and, according to Christian, cannot be guarded
> > by the lock without causing a massive performance regression.
> >
> > This now means that dma_fence_is_signaled() can return true WHILE the
> > fence callbacks are still being executed. This is razy and has lead to
> > at least one bug solved in:
> >
> > commit c8a5d5ea3ba6 ("nouveau: fix client work fence deletion race")
> >
> > Make this race impossible, by simply setting the bit only once the
> > callbacks are actually completed.
>
> Groundhog day, that has been suggested before and it simply doesn't work.
>
> The flag is intentional set before calling the callbacks because the state needs to be visible.
It will be visible. Just later.
>
> Just see dma_fence_default_wait() for an example why that approach doesn't work.
What's the issue? It will be set. Just later. Who is ordering with
whom?
I BTW suggest to write more code comments in the future to document all
these supposed pitfalls for those who will hack on that code base once
we have left.
P.
>
> Regards,
> Christian.
>
> >
> > Signed-off-by: Philipp Stanner <phasta@xxxxxxxxxx>
> > ---
> > drivers/dma-buf/dma-fence.c | 18 ++++++++++++++++--
> > 1 file changed, 16 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > index c7ea1e75d38a..2416cc86ce93 100644
> > --- a/drivers/dma-buf/dma-fence.c
> > +++ b/drivers/dma-buf/dma-fence.c
> > @@ -359,8 +359,19 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
> >
> > dma_fence_assert_held(fence);
> >
> > - if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> > - &fence->flags)))
> > + /*
> > + * First test the bit, so we don't signal an already signaled fence again.
> > + * The lock protects against multiple parties setting the bit. The bit
> > + * is then set at the end of the function.
> > + *
> > + * The background is that there is a fast path check in
> > + * dma_fence_is_signaled() which does not use lock protection and can
> > + * return true *while* the fence callbacks are still executing.
> > + *
> > + * This fast path check supposedly cannot be guarded by the lock because
> > + * of significant performance regressions.
> > + */
> > + if (unlikely(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)))
> > return;
> >
> > trace_dma_fence_signaled(fence);
> > @@ -384,6 +395,9 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
> > INIT_LIST_HEAD(&cur->node);
> > cur->func(fence, cur);
> > }
> > +
> > + // TODO: we need some barrier here, don't we?
> > + set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
> > }
> > EXPORT_SYMBOL(dma_fence_signal_timestamp_locked);
> >