Re: [RFC PATCH] dma-buf/dma_fence: Make races for dma_fence_is_signaled() less likely
From: Christian König
Date: Mon Jun 15 2026 - 06:10:10 EST
On 6/15/26 12:04, Philipp Stanner wrote:
> On Mon, 2026-06-15 at 11:53 +0200, Christian König wrote:
>> On 6/12/26 12:42, Philipp Stanner wrote:
>>> dma_fence_is_signaled() returns whether a fence has been signaled
>>> already. That function contains a fast path opportunistic check which is
>>> not guarded by the lock and, according to Christian, cannot be guarded
>>> by the lock without causing a massive performance regression.
>>>
>>> This now means that dma_fence_is_signaled() can return true WHILE the
>>> fence callbacks are still being executed. This is razy and has lead to
>>> at least one bug solved in:
>>>
>>> commit c8a5d5ea3ba6 ("nouveau: fix client work fence deletion race")
>>>
>>> Make this race impossible, by simply setting the bit only once the
>>> callbacks are actually completed.
>>
>> Groundhog day, that has been suggested before and it simply doesn't work.
>>
>> The flag is intentional set before calling the callbacks because the state needs to be visible.
>
> It will be visible. Just later.
It must be visible *before* the callbacks are called. The whole idea with the callbacks is that you can install a notification of state change.
>> Just see dma_fence_default_wait() for an example why that approach doesn't work.
>
> What's the issue? It will be set. Just later. Who is ordering with
> whom?
See the functions dma_fence_default_wait() and dma_fence_default_wait_cb().
It wakes up the sleeping thread which in turn needs to observes the new state.
Regards,
Christian.
> I BTW suggest to write more code comments in the future to document all
> these supposed pitfalls for those who will hack on that code base once
> we have left.
>
>
> P.
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Signed-off-by: Philipp Stanner <phasta@xxxxxxxxxx>
>>> ---
>>> drivers/dma-buf/dma-fence.c | 18 ++++++++++++++++--
>>> 1 file changed, 16 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>>> index c7ea1e75d38a..2416cc86ce93 100644
>>> --- a/drivers/dma-buf/dma-fence.c
>>> +++ b/drivers/dma-buf/dma-fence.c
>>> @@ -359,8 +359,19 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>>
>>> dma_fence_assert_held(fence);
>>>
>>> - if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
>>> - &fence->flags)))
>>> + /*
>>> + * First test the bit, so we don't signal an already signaled fence again.
>>> + * The lock protects against multiple parties setting the bit. The bit
>>> + * is then set at the end of the function.
>>> + *
>>> + * The background is that there is a fast path check in
>>> + * dma_fence_is_signaled() which does not use lock protection and can
>>> + * return true *while* the fence callbacks are still executing.
>>> + *
>>> + * This fast path check supposedly cannot be guarded by the lock because
>>> + * of significant performance regressions.
>>> + */
>>> + if (unlikely(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)))
>>> return;
>>>
>>> trace_dma_fence_signaled(fence);
>>> @@ -384,6 +395,9 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>> INIT_LIST_HEAD(&cur->node);
>>> cur->func(fence, cur);
>>> }
>>> +
>>> + // TODO: we need some barrier here, don't we?
>>> + set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
>>> }
>>> EXPORT_SYMBOL(dma_fence_signal_timestamp_locked);
>>>