Re: [PATCH v2 0/4] drm/tests: Fix locking issues (kind of)
From: Simona Vetter
Date: Tue Feb 04 2025 - 06:00:47 EST
On Wed, Jan 29, 2025 at 03:21:52PM +0100, Maxime Ripard wrote:
> Hi,
>
> Here's another attempt at fixing the current locking issues with the
> HDMI kunit tests.
>
> The initial issue was reported by Dave here:
> https://lore.kernel.org/all/CAPM=9tzJ4-ERDxvuwrCyUPY0=+P44orhp1kLWVGL7MCfpQjMEQ@xxxxxxxxxxxxxx/
>
> After fixing it, there was still a lockdep warning for a circular
> dependency. This series is also fixing the issue.
So this looks like it's a kthread_exit, which yes is broken. You cannot
acquire a lock in one thread and release it in another thread, that does
not work for lockdep and therefore is forbidden for mutexes.
It's kinda allowed for semaphore, but that's why semaphores cannot be
automatically checked by lockdep. So yeah we cannot use such a deferred
action, it would need to be a deferred action that's run synchronously.
-Sima
>
> There's still an issue though. When running the tests, I get:
>
> KTAP version 1
> 1..1
> KTAP version 1
> # Subtest: drm_atomic_helper_connector_hdmi_check
> # module: drm_hdmi_state_helper_test
> 1..1
>
> ====================================
> WARNING: kunit_try_catch/25 still has locks held!
> 6.13.0-rc2-00410-gbd9d16533367 #18 Tainted: G N
> ------------------------------------
> 2 locks held by kunit_try_catch/25:
> #0: fff00000021586f0 (crtc_ww_class_acquire){+.+.}-{0:0}, at: drm_kunit_helper_acquire_ctx_alloc+0x5c/0xf0
> #1: fff0000002158718 (crtc_ww_class_mutex){+.+.}-{0:0}, at: drm_kunit_helper_acquire_ctx_alloc+0x5c/0xf0
>
> stack backtrace:
> CPU: 0 UID: 0 PID: 25 Comm: kunit_try_catch Tainted: G N 6.13.0-rc2-00410-gbd9d16533367 #18
> Tainted: [N]=TEST
> Hardware name: linux,dummy-virt (DT)
> Call trace:
> show_stack+0x18/0x30 (C)
> dump_stack_lvl+0x70/0x98
> dump_stack+0x18/0x24
> debug_check_no_locks_held+0x9c/0xa4
> do_exit+0x52c/0x970
> kthread_exit+0x28/0x30
> kthread+0xdc/0xf0
> ret_from_fork+0x10/0x20
> ok 1 drm_test_check_hdmi_funcs_reject_rate
> ok 1 drm_atomic_helper_connector_hdmi_check
>
> I believe it's due to the fact that drm_kunit_helper_acquire_ctx_alloc()
> will acquire the lock directly, but will release it as a deferred kunit
> action. It's not unsafe, as the lock is eventually released, but I don't
> really know what the best course of action is here:
>
> * Forget about the idea of a context tied to the lifetime of a test
> * Make lockdep know that it's ok, and we know what to do
>
> I've tried the latter, using lockdep_pin/unpin_lock, but that didn't fix
> the issue so I must have done something wrong.
>
> Let me know what you think,
> Maxime
>
> Signed-off-by: Maxime Ripard <mripard@xxxxxxxxxx>
> ---
> Changes in v2:
> - Fix circular dependency warning
> - Link to v1: https://lore.kernel.org/r/20241031091558.2435850-1-mripard@xxxxxxxxxx
>
> ---
> Maxime Ripard (4):
> drm/tests: hdmi: Fix WW_MUTEX_SLOWPATH failures
> drm/tests: hdmi: Remove redundant assignments
> drm/tests: hdmi: Reorder DRM entities variables assignment
> drm/tests: hdmi: Fix recursive locking
>
> drivers/gpu/drm/tests/drm_hdmi_state_helper_test.c | 200 +++++++++++----------
> 1 file changed, 103 insertions(+), 97 deletions(-)
> ---
> base-commit: e2a81c0cd7de6cb063058be304b18f200c64802b
> change-id: 20250129-test-kunit-5ba3c03bffb0
>
> Best regards,
> --
> Maxime Ripard <mripard@xxxxxxxxxx>
>
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch