Re: Properly synchronize dma_fence->signaled bit (Was: Re: [RFC PATCH] dma-fence: Fix races of fence callbacks versus destructors by locking)
From: Gary Guo
Date: Wed Jun 17 2026 - 09:52:13 EST
On Wed Jun 17, 2026 at 10:46 AM BST, Christian König wrote:
> On 6/16/26 13:25, Philipp Stanner wrote:
>>
>> I think rejecting ideas with "we tried this, it >>didn't work<<" is not
>> a valid reason for refusing an idea. Point A above helps with that. If
>> your commit message contains measurements or links to tickets with
>> *real life* performance regressions (microbenchmarks are invalid), that
>> helps reducing discussion overhead drastically.
>>
>> Now, in this particular case, I fail to see how taking the spinlock to
>> check that bit is evil. If it regresses someone's speed that much, it
>> would mean that someone is heavily punching that lock, like polling
>> 24/7 with dma_fence_is_signaled().
>
> I think (but I'm not 100% sure) that the problem is that taking the spinlock
> introduces a write to the cache line it is in.
>
> At the moment, when a fence is signaled a read is enough to check that state,
> so what happens is that the cache line for the signaled bit sooner or later
> ends up in all CPU caches.
>
> When you start to use the spinlock the cache line backing that plays ping/pong
> between all the CPU cores and that is something which always stalls each CPU
> when it needs to acquire the cache line. Keep in mind that on a modern box you
> can calculate like a 4x4 matrix in the same time you solve a cache miss.
>
> This is especially important for the stub fence which is used by basically all
> cores at the same time whenever you need a signaled dummy.
>
This sounds like an area where hazard pointers can help. As with RCU, the reader
side is lock-free. And for the specific case of the signaled state, which only
ever moves in one direction, the hazard pointer read is also wait-free because
it does not need to loop until the state is stable.
The reclaim side just needs to wait for all readers to exit their critical
sections, unlike RCU, where it needs to wait for a full grace period.
That said, the reclaim waiter still must not hold any locks (or other resources)
that the reader-side critical section can take. So you still have a variant of
the lock inversion problem to avoid.
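To make the idea a bit more concrete, here is a rough userspace sketch in C11
atomics, not kernel code and not a proposal for the actual dma_fence API. All
names (struct fence, hazard[], shared_fence, reader_is_signaled(),
fence_retire(), NR_READERS) are hypothetical. It assumes a fence is only
unpublished after it has been signaled, which is what lets the reader skip the
usual retry loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_READERS 4            /* hypothetical: fixed number of reader slots */

struct fence {                  /* stand-in for struct dma_fence */
	atomic_bool signaled;   /* only ever goes false -> true */
};

/* One hazard slot per reader; non-NULL means "I am using this object". */
static _Atomic(struct fence *) hazard[NR_READERS];

/* Shared location through which readers find the fence. */
static _Atomic(struct fence *) shared_fence;

/* Reader side: no lock, no loop. */
static bool reader_is_signaled(int slot)
{
	struct fence *f;
	bool s;

	assert(slot >= 0 && slot < NR_READERS);

	f = atomic_load(&shared_fence);
	if (!f)
		return true;    /* assumption: already retired, so signaled */

	atomic_store(&hazard[slot], f);         /* publish the hazard */

	/* Re-check: was the fence unpublished before our hazard was visible? */
	if (atomic_load(&shared_fence) != f) {
		atomic_store(&hazard[slot], NULL);
		return true;    /* unpublished implies signaled; no retry */
	}

	s = atomic_load(&f->signaled);          /* safe: reclaimer sees our slot */
	atomic_store(&hazard[slot], NULL);      /* leave the critical section */
	return s;
}

/* True if no reader currently has a hazard pointing at f. */
static bool no_reader_holds(struct fence *f)
{
	for (int i = 0; i < NR_READERS; i++)
		if (atomic_load(&hazard[i]) == f)
			return false;
	return true;
}

/* Reclaim side: waits only for readers that are inside right now. */
static void fence_retire(struct fence *f)
{
	atomic_store(&f->signaled, true);
	atomic_store(&shared_fence, NULL);      /* unpublish */

	/*
	 * Busy-wait for simplicity. The caller must not hold any lock that
	 * a reader could take while its hazard is set.
	 */
	while (!no_reader_holds(f))
		;

	free(f);
}
```

The reader only writes to its own hazard slot, so the fence's cache line can
stay shared between CPUs, and the reclaimer waits only for readers whose slot
still points at the fence rather than for a full grace period.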
Best,
Gary