Re: [PATCH 1/4] dma-buf: Check status of enable-signaling bit on debug

From: Christian König
Date: Mon Sep 05 2022 - 14:26:56 EST


On 05.09.22 at 18:39, Tvrtko Ursulin wrote:

On 05/09/2022 12:21, Christian König wrote:
On 05.09.22 at 12:56, Arvind Yadav wrote:
The core DMA-buf framework needs to enable signaling
before the fence is signaled, but it can forget to do so.
To catch this on debug kernels, check the
DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT status bit before checking
the signaled bit, to confirm that signaling has been
enabled.

You might want to put this patch at the end of the series to avoid breaking the kernel in between.


Signed-off-by: Arvind Yadav <Arvind.Yadav@xxxxxxx>
---
  include/linux/dma-fence.h | 5 +++++
  1 file changed, 5 insertions(+)

diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 775cdc0b4f24..60c0e935c0b5 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -428,6 +428,11 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
  static inline bool
  dma_fence_is_signaled(struct dma_fence *fence)
  {
+#ifdef CONFIG_DEBUG_FS

CONFIG_DEBUG_FS is certainly wrong; it is probably better to check for CONFIG_DEBUG_WW_MUTEX_SLOWPATH here.

Apart from that, this looks good to me,

What's the full story in this series? I'm afraid the cover letter does not make it clear to a casual reader like myself. Where does the difference between debug and non-debug kernels come from?

We have a bug where the drm_sync file doesn't properly enable signaling, which leads to an IGT test failure.


And how do the proposed changes relate to the following kerneldoc excerpt:

     * Since many implementations can call dma_fence_signal() even when before
     * @enable_signaling has been called there's a race window, where the
     * dma_fence_signal() might result in the final fence reference being
     * released and its memory freed. To avoid this, implementations of this
     * callback should grab their own reference using dma_fence_get(), to be
     * released when the fence is signalled (through e.g. the interrupt
     * handler).
     *
     * This callback is optional. If this callback is not present, then the
     * driver must always have signaling enabled.

Is it now an error, or should it be an impossible condition, for "is signaled" to return true _unless_ signaling has been enabled?

That's neither an error nor impossible. For debugging we just never return signaled from the dma_fence_is_signaled() function when signaling was not enabled before.
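
To illustrate the debug behaviour described above, here is a minimal user-space sketch. It is not the kernel code: the `model_*` names, the `MODEL_DEBUG_FENCE` macro (a hypothetical stand-in for whichever config option ends up guarding the check) and the simplified bit helpers are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel config option guarding the check. */
#ifndef MODEL_DEBUG_FENCE
#define MODEL_DEBUG_FENCE 1
#endif

/* Bit positions in the model's flags word (not the kernel's values). */
enum {
	MODEL_FLAG_SIGNALED_BIT,
	MODEL_FLAG_ENABLE_SIGNAL_BIT,
};

struct model_fence {
	unsigned long flags;
};

/* Non-atomic simplifications of the kernel's test_bit()/set_bit(). */
static bool model_test_bit(int bit, const unsigned long *flags)
{
	return (*flags >> bit) & 1UL;
}

static void model_set_bit(int bit, unsigned long *flags)
{
	*flags |= 1UL << bit;
}

/* Marks the fence as signaled, regardless of whether signaling was enabled. */
static void model_fence_signal(struct model_fence *fence)
{
	model_set_bit(MODEL_FLAG_SIGNALED_BIT, &fence->flags);
}

/* Models the explicit enable step a caller performs before polling. */
static void model_fence_enable_sw_signaling(struct model_fence *fence)
{
	model_set_bit(MODEL_FLAG_ENABLE_SIGNAL_BIT, &fence->flags);
}

static bool model_fence_is_signaled(const struct model_fence *fence)
{
#if MODEL_DEBUG_FENCE
	/* Debug only: never report "signaled" unless signaling was enabled. */
	if (!model_test_bit(MODEL_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
		return false;
#endif
	return model_test_bit(MODEL_FLAG_SIGNALED_BIT, &fence->flags);
}
```

With the debug check compiled in, a fence that was signaled without signaling enabled keeps reading as unsignaled until `model_fence_enable_sw_signaling()` is called, which is how a caller that forgot the enable step would show up in testing.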

I also plan to remove the return value from the enable_signaling callback. That was just not very well designed.


If the statement (in a later patch) is that signalling should always be explicitly enabled by the callers of dma_fence_add_callback, then what about the existing call to __dma_fence_enable_signaling from dma_fence_add_callback?

Oh, good point. That sounds like we have some bug in the core dma_fence code as well.

Calls to dma_fence_add_callback() and dma_fence_wait() should enable signaling implicitly and don't need an extra call for that.

Only dma_fence_is_signaled() needs this explicit enabling of signaling through dma_fence_enable_sw_signaling().
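
The split between implicit and explicit enabling can be sketched in user space as follows. This is only a model under stated assumptions: the `sketch_*` names, the single callback slot and the `-1` error return are hypothetical and do not mirror the real dma_fence implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_fence;
typedef void (*sketch_cb_fn)(struct sketch_fence *fence, void *data);

struct sketch_fence {
	bool signaled;
	bool signaling_enabled;
	sketch_cb_fn cb;	/* one callback slot keeps the model small */
	void *cb_data;
};

/* Explicit enable, the step a pure poller has to perform itself. */
static void sketch_enable_sw_signaling(struct sketch_fence *fence)
{
	fence->signaling_enabled = true;
}

/* Adding a callback enables signaling implicitly in this model. */
static int sketch_add_callback(struct sketch_fence *fence,
			       sketch_cb_fn cb, void *data)
{
	if (fence->signaled)
		return -1;	/* already signaled, nothing to wait for */

	sketch_enable_sw_signaling(fence);
	fence->cb = cb;
	fence->cb_data = data;
	return 0;
}

/* Signals the fence and runs the pending callback, if any, exactly once. */
static void sketch_signal(struct sketch_fence *fence)
{
	sketch_cb_fn cb = fence->cb;

	fence->signaled = true;
	fence->cb = NULL;
	if (cb)
		cb(fence, fence->cb_data);
}

/* Polling with the debug check: needs signaling enabled beforehand. */
static bool sketch_is_signaled_debug(const struct sketch_fence *fence)
{
	return fence->signaling_enabled && fence->signaled;
}

/* Example callback: counts how often it was invoked. */
static void sketch_count_cb(struct sketch_fence *fence, void *data)
{
	(void)fence;
	++*(int *)data;
}
```

In this model a caller that registers a callback never has to think about enabling signaling, while a caller that only polls sees the fence as unsignaled until it calls `sketch_enable_sw_signaling()` itself.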


Or, if the rules are changing, shouldn't the kerneldoc be updated as part of the series?

I think the kerneldoc is just a bit misleading. The point is that when you need to call dma_fence_enable_sw_signaling() you must hold a reference to the fence object.

But that's true for all the dma_fence_* functions. The race described in the comment is just nonsense because you need to hold that reference anyway.

Regards,
Christian.


Regards,

Tvrtko

Christian.

+    if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
+        return false;
+#endif
+
      if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
          return true;