Re: Properly synchronize dma_fence->signaled bit (Was: Re: [RFC PATCH] dma-fence: Fix races of fence callbacks versus destructors by locking)
From: Philipp Stanner
Date: Tue Jun 16 2026 - 07:27:56 EST
On Mon, 2026-06-15 at 11:57 +0200, Christian König wrote:
> On 6/15/26 10:29, Philipp Stanner wrote:
> >
> > This fast path check in my mind certainly breaks the intended dma_fence
> > design:
> >
> > void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
> > ktime_t timestamp)
> > {
> > const struct dma_fence_ops *ops;
> > struct dma_fence_cb *cur, *tmp;
> > struct list_head cb_list;
> >
> > dma_fence_assert_held(fence);
> >
> > if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> > &fence->flags)))
> > return;
> >
> >
> > Modifying the bit is consistently done under lock protection, so
> > reading must be done, too.
> >
> > Do you remember who wanted those fast path checks? Who is spinning on
> > that lock?
>
> Simona Vetter and basically the rest of the community.
I did some git-blame and it would seem to me that this fast-path hack
was added in
e941759c74a4 fence: dma-buf cross-device synchronization (v18)
in 2014 A.D.
So it was there from the very beginning and was not added because there
was a performance bottleneck later. It is conceivable that a
performance issue was present from the get go, of course.
>
> And I can clearly say even if I don't like them that those
> optimizations are a must have.
>
> > In any case, that needs to be repaired.
>
> No, see my discussion with Simona on the mailing list. I need to dig
> that up as well, but it was around the time I added the same
> workaround to amdgpu.
>
> You are basically trying what I have been suggesting as well, but
> there is a very wide agreement that the current design is a must
> have.
I suggest at least three things:
A.
Very explicitly document all lockless mechanisms and their
justification in both code comments and commit messages.
* We need to document lockless magic *drastically* better in DRM. I
see code left and right where there is some barrier with the comment
simply being "so list_empty() works without a lock".
* The commit message needs to justify why a lock is missing, why this
is the preferred solution, why it is correct. The latter also needs
to be in a code comment.
* Note that WRITE_ONCE() is not only about volatile, but also about
"watch out, here is a lockless access!", as Linus pointed out
repeatedly.
B.
I think rejecting ideas with "we tried this, it >>didn't work<<" is not
a valid reason for refusing an idea. Point A above helps with that. If
your commit message contains measurements or links to tickets with
*real life* performance regressions (microbenchmarks are invalid), that
helps reducing discussion overhead drastically.
Now, in this particular case, I fail to see how taking the spinlock to
check that bit is evil. If it regresses someone's speed that much, it
would mean that someone is heavily punching that lock, like polling
24/7 with dma_fence_is_signaled().
Again, having that use case documented somewhere could save us all time
– especially for you, Christian, since you wouldn't be forced to have
the same discussion over and over again over the years ;-)
C.
Robustness and correctness always trump performance. They especially
trump microbenchmarks.
P.