Re: [RFC PATCH] dma-buf/dma_fence: Make races for dma_fence_is_signaled() less likely
From: Philipp Stanner
Date: Mon Jun 15 2026 - 06:36:30 EST
On Mon, 2026-06-15 at 12:09 +0200, Christian König wrote:
> On 6/15/26 12:04, Philipp Stanner wrote:
> > On Mon, 2026-06-15 at 11:53 +0200, Christian König wrote:
> > > On 6/12/26 12:42, Philipp Stanner wrote:
> > > > dma_fence_is_signaled() returns whether a fence has been signaled
> > > > already. That function contains a fast path opportunistic check which is
> > > > not guarded by the lock and, according to Christian, cannot be guarded
> > > > by the lock without causing a massive performance regression.
> > > >
> > > > This now means that dma_fence_is_signaled() can return true WHILE the
> > > > fence callbacks are still being executed. This is razy and has lead to
> > > > at least one bug solved in:
> > > >
> > > > commit c8a5d5ea3ba6 ("nouveau: fix client work fence deletion race")
> > > >
> > > > Make this race impossible, by simply setting the bit only once the
> > > > callbacks are actually completed.
> > >
> > > Groundhog day, that has been suggested before and it simply doesn't work.
> > >
> > > The flag is intentional set before calling the callbacks because the state needs to be visible.
> >
> > It will be visible. Just later.
>
> It must be visible *before* the callbacks are called. The whole idea with the callbacks is that you can install a notification of state change.
>
> > > Just see dma_fence_default_wait() for an example why that approach doesn't work.
> >
> > What's the issue? It will be set. Just later. Who is ordering with
> > whom?
>
> See the functions dma_fence_default_wait() and dma_fence_default_wait_cb().
>
> It wakes up the sleeping thread which in turn needs to observes the new state.
dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
{
struct default_wait_cb cb;
unsigned long flags;
signed long ret = timeout ? timeout : 1;
dma_fence_lock_irqsave(fence, flags); // <------- cool, a lock! ^_^
if (dma_fence_test_signaled_flag(fence))
goto out;
if (intr && signal_pending(current)) {
ret = -ERESTARTSYS;
goto out;
}
if (!timeout) {
ret = 0;
goto out;
}
cb.base.func = dma_fence_default_wait_cb;
cb.task = current;
list_add(&cb.base.node, &fence->cb_list); // <--------------- guarded by lock
while (!dma_fence_test_signaled_flag(fence) && ret > 0) { // <-------- flag-check is guarded by lock. Fully ordered. Doesn't matter where
if (intr) dma_fence_signal_timeout_locked() sets the bit.
__set_current_state(TASK_INTERRUPTIBLE);
else
__set_current_state(TASK_UNINTERRUPTIBLE);
dma_fence_unlock_irqrestore(fence, flags); // <------------ dma_fence_signal_timeout_locked() can change the flag only after here
ret = schedule_timeout(ret);
dma_fence_lock_irqsave(fence, flags);
if (ret > 0 && intr && signal_pending(current))
ret = -ERESTARTSYS;
}
P.
>
> Regards,
> Christian.
>
> > I BTW suggest to write more code comments in the future to document all
> > these supposed pitfalls for those who will hack on that code base once
> > we have left.
> >
> >
> > P.
> >
> > >
> > > Regards,
> > > Christian.
> > >
> > > >
> > > > Signed-off-by: Philipp Stanner <phasta@xxxxxxxxxx>
> > > > ---
> > > > drivers/dma-buf/dma-fence.c | 18 ++++++++++++++++--
> > > > 1 file changed, 16 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > > index c7ea1e75d38a..2416cc86ce93 100644
> > > > --- a/drivers/dma-buf/dma-fence.c
> > > > +++ b/drivers/dma-buf/dma-fence.c
> > > > @@ -359,8 +359,19 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
> > > >
> > > > dma_fence_assert_held(fence);
> > > >
> > > > - if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> > > > - &fence->flags)))
> > > > + /*
> > > > + * First test the bit, so we don't signal an already signaled fence again.
> > > > + * The lock protects against multiple parties setting the bit. The bit
> > > > + * is then set at the end of the function.
> > > > + *
> > > > + * The background is that there is a fast path check in
> > > > + * dma_fence_is_signaled() which does not use lock protection and can
> > > > + * return true *while* the fence callbacks are still executing.
> > > > + *
> > > > + * This fast path check supposedly cannot be guarded by the lock because
> > > > + * of significant performance regressions.
> > > > + */
> > > > + if (unlikely(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)))
> > > > return;
> > > >
> > > > trace_dma_fence_signaled(fence);
> > > > @@ -384,6 +395,9 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
> > > > INIT_LIST_HEAD(&cur->node);
> > > > cur->func(fence, cur);
> > > > }
> > > > +
> > > > + // TODO: we need some barrier here, don't we?
> > > > + set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
> > > > }
> > > > EXPORT_SYMBOL(dma_fence_signal_timestamp_locked);
> > > >