Re: Properly synchronize dma_fence->signaled bit (Was: Re: [RFC PATCH] dma-fence: Fix races of fence callbacks versus destructors by locking)

From: Philipp Stanner

Date: Wed Jun 17 2026 - 09:22:28 EST

On Wed, 2026-06-17 at 15:03 +0200, Christian König wrote:
> On 6/17/26 12:16, Philipp Stanner wrote:
> > On Wed, 2026-06-17 at 11:46 +0200, Christian König wrote:
> ...
> > >
> > > > B.
> > > >
> > > > I think rejecting ideas with "we tried this, it >>didn't work<<" is not
> > > > a valid reason for refusing an idea. Point A above helps with that. If
> > > > your commit message contains measurements or links to tickets with
> > > > *real life* performance regressions (microbenchmarks are invalid), that
> > > > helps reduce discussion overhead drastically.
> > > >
> > > > Now, in this particular case, I fail to see how taking the spinlock to
> > > > check that bit is evil. If it regresses someone's speed that much, it
> > > > would mean that someone is heavily punching that lock, like polling
> > > > 24/7 with dma_fence_is_signaled().
> > >
> > > I think (but I'm not 100% sure) the problem is that taking the
> > > spinlock introduces a write to the cache line it is in.
> > >
> > > Currently, once a fence is signaled, a read is enough to check that
> > > state, so the cache line holding the signaled bit sooner or later
> > > ends up in all CPU caches.
> > >
> > > Once you start taking the spinlock, the cache line backing it plays
> > > ping-pong between all the CPU cores, which stalls each CPU whenever
> > > it needs to acquire that cache line. Keep in mind that on a modern
> > > box you can calculate something like a 4x4 matrix in the time it
> > > takes to resolve a single cache miss.
> > >
> > > This is especially important for the stub fence, which basically all
> > > cores use at the same time whenever they need a signaled dummy.
> >
> > Alright, that sort of sounds logical, I guess. So the argument
> > basically is that if we tried to lock that, someone would immediately
> > report real and massive performance regressions, leading to a revert.
> >
> > I think last time you mentioned that memory footprint is less of a
> > concern for dma_fence than cache lines. Out of interest: has anyone
> > ever experimented with more padding to prevent spinners from shooting
> > down other CPUs' cache lines?
>
> How would that work in this case? I mean as long as you have the same
> variable (spinlock) you have the same cache line no matter how you
> pad.

Ah, gotcha. I was talking more in general. Sometimes you have
situations like:

struct foo {
	spinlock_t lock;
	// place padding to fill up a cache line here?
	u8 data[];
};

struct foo *bar;

// thread A
spin_lock(&bar->lock);
do_sth(bar->data); // works on `data`
spin_unlock(&bar->lock);

// thread B
spin_lock(&bar->lock); // might invalidate the cache line the beginning of `data` lives in

That wouldn't solve the spinlock issue, but I've been wondering for a
while whether the above has been observed as a problem in practice.

>
> > Since you're the maintainer of dma-buf, what would you wish we do?
>
> Try to improve the documentation by sending out patches. I will send
> out my ideas for resilience improvements, and then we can discuss them
> on the patches.
>
> > Would you be at least OK with the memory barrier approach to make the
> > API a bit more robust? AFAIU the barriers will not cause a cache line
> > invalidation.
>
> What exactly do you mean by that? The test_bit() and set_bit() are
> already memory barriers as far as I know.

I'm talking about whether we could enforce that the bit is only set
once the callbacks have completed, so that someone who drops their
reference as soon as dma_fence_is_signaled() returns true doesn't cause
a use-after-free (UAF). You didn't answer that here:

https://lore.kernel.org/dri-devel/dca171cea556c3f3de3a86f735eeb53335cd3f49.camel@xxxxxxxxxxx/
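A minimal sketch of the ordering being asked about, under heavy simplifying assumptions: a hypothetical fence (struct sketch_fence, sketch_signal(), sketch_is_signaled() are illustrative names, not dma-fence APIs) whose callback list is protected by a lock that is omitted here. The signaling path runs all callbacks first and publishes the bit with a release store as its very last access to the fence; readers pair it with an acquire load. For reference, the kernel's atomic bitops documentation describes plain set_bit() and test_bit() as unordered (test_and_set_bit() is fully ordered), so the sketch spells out the release/acquire pairing explicitly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified fence; this is not struct dma_fence. */
struct sketch_fence;
typedef void (*sketch_cb_fn)(struct sketch_fence *f, void *data);

struct sketch_cb {
	struct sketch_cb *next;
	sketch_cb_fn fn;
	void *data;
};

struct sketch_fence {
	atomic_bool signaled;
	struct sketch_cb *cbs; /* the lock protecting this list is omitted */
};

/* Proposed ordering: run every callback first, then publish the bit
 * with release semantics as the very last access to the fence. */
static void sketch_signal(struct sketch_fence *f)
{
	struct sketch_cb *cb = f->cbs;

	assert(!atomic_load_explicit(&f->signaled, memory_order_relaxed));
	f->cbs = NULL;
	while (cb) {
		struct sketch_cb *next = cb->next; /* fn may free cb */

		cb->fn(f, cb->data);
		cb = next;
	}
	atomic_store_explicit(&f->signaled, true, memory_order_release);
}

/* Pairs with the release store above: a reader that sees true also sees
 * everything the callbacks wrote, and no callback is still running, so
 * it could drop its reference without racing the signaling path. */
static bool sketch_is_signaled(struct sketch_fence *f)
{
	return atomic_load_explicit(&f->signaled, memory_order_acquire);
}

/* Usage helper: counts how many times it was invoked. */
static void sketch_count_cb(struct sketch_fence *f, void *data)
{
	(void)f;
	++*(int *)data;
}
```

This only closes the window if nothing on the signaling side touches the fence after the release store (for example, an unlock of a lock embedded in the fence would reopen it), and it changes what callbacks observe: a callback calling sketch_is_signaled() on its own fence would see false.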

>
> > >
> > > > Again, having that use case documented somewhere could save us all time
> > > > – especially for you, Christian, since you wouldn't be forced to have
> > > > the same discussion over and over again over the years ;-)
> > >
> > > Well, I could also send out all the DMA-buf resilience patches/ideas I
> > > came up with over the years once more.
> >
> > Maybe we could have sort of a wiki in Documentation/ with links to
> > relevant mail threads and some explanations of why things are the way
> > they are?
>
> I think an AI that analyzes the mailing list and notes when ideas
> repeat would help.

Such tools should be used sparingly. Notably, they can contribute to
contributor frustration.

>
> At least for me, maintaining some kind of wiki in addition to my
> current workload wouldn't be possible at all.

A simple file with some links could be enough. But it was just an idea;
no hard feelings about it.

P.