Re: Properly synchronize dma_fence->signaled bit (Was: Re: [RFC PATCH] dma-fence: Fix races of fence callbacks versus destructors by locking)

From: Philipp Stanner

Date: Wed Jun 17 2026 - 06:29:44 EST


On Wed, 2026-06-17 at 11:46 +0200, Christian König wrote:
> On 6/16/26 13:25, Philipp Stanner wrote:
> >
> > So it was there from the very beginning and was not added because there
> > was a performance bottleneck later. It is conceivable that a
> > performance issue was present from the get go, of course.
>
> Not saying that it wasn't there from the beginning, but multiple
> people (including me) have tried to improve the situation and that
> was either immediately reverted or directly rejected.
>
> > >
> > > And I can clearly say even if I don't like them that those
> > > optimizations are a must have.
> > >
> > > > In any case, that needs to be repaired.
> > >
> > > No, see my discussion with Simona on the mailing list. I need to dig
> > > that up as well, but it was around the time I added the same
> > > workaround to amdgpu.
> > >
> > > You are basically trying what I have been suggesting as well, but
> > > there is a very wide agreement that the current design is a must
> > > have.
> >
> >
> >  * We need to document lockless magic *drastically* better in DRM. I
> >    see code left and right where there is some barrier with the comment
> >    simply being "so list_empty() works without a lock".
> >  * The commit message needs to justify why a lock is missing, why this
> >    is the preferred solution, why it is correct. The latter also needs
> >    to be in a code comment.
> >  * Note that WRITE_ONCE() is not only about volatile, but also about
> >    "watch out, here is a lockless access!", as Linus pointed out
> >    repeatedly.
>
> Completely agree. Question is who has time for that?

We / the maintainers of the respective systems need to heavily
encourage that :)

The good news btw is that many things are moving towards a good
direction in recent past, as far as I have seen

>
> > B.
> >
> > I think rejecting ideas with "we tried this, it >>didn't work<<" is not
> > a valid reason for refusing an idea. Point A above helps with that. If
> > your commit message contains measurements or links to tickets with
> > *real life* performance regressions (microbenchmarks are invalid), that
> > helps reducing discussion overhead drastically.
> >
> > Now, in this particular case, I fail to see how taking the spinlock to
> > check that bit is evil. If it regresses someone's speed that much, it
> > would mean that someone is heavily punching that lock, like polling
> > 24/7 with dma_fence_is_signaled().
>
> I think (but I'm not 100% sure) the the problem is that taking the
> spinlock introduces a write to the cache line it is in.
>
> At the moment when a fence is signaled a read is enough to check that
> state, so what happens is that the cache line for the signaled bit
> sooner or later end up in all CPU caches.
>
> When you start to use the spinlock the cache line backing that plays
> ping/pong between all the CPU cores and that is something which
> always stalls each CPU when it needs to acquire the cache line. Keep
> in mind that on a modern box you can calculate like a 4x4 matrix in
> the same time you solve a cache miss.
>
> This is especially important for the stub fence which is used by
> basically all cores at the same time whenever you need a signaled
> dummy.

Alright, that sort of sounds logical, I guess. So the argument
basically is that if we'd try to lock that, someone would immediately
report real and massive performance regressions leading to a revert.

I think last time you mentioned that memory footprint is less of a
concern for dma_fence than cache lines. Out of interest: has anyone
ever experimented with more padding to prevent spinners from shooting
down other CPUs cache lines?

Since you're the maintainer of dma-buf, what would you wish we do?

Would you be at least OK with the memory barrier approach to make the
API a bit more robust? AFAIU the barriers will not cause a cache line
invalidation.


>
> > Again, having that use case documented somewhere could save us all time
> > – especially for you, Christian, since you wouldn't be forced to have
> > the same discussion over and over again over the years ;-)
>
> Well I could also send out all the DMA-buf resilient patches/ideas I came up with over the years once more.

Maybe we could have sort of a wiki in Documentation/ with links to
relevant mail threads and some explanations of why things are the way
they are?

btw, is there a dma-buf TODO list like for DRM in general?

There are many passionate hackers who love challenges. We could
certainly add a few "Difficulty: hard" entries for a few controversial
potential reworks.


P.