Re: [PATCH] drm/amdgpu: fix recursive ww_mutex acquire in amdgpu_devcoredump_format
From: Mikhail Gavrilov
Date: Tue May 19 2026 - 09:13:32 EST
On Wed, Apr 29, 2026 at 7:37 PM Mikhail Gavrilov
<mikhail.v.gavrilov@xxxxxxxxx> wrote:
>
> When dumping IB contents from a hung job, amdgpu_devcoredump_format()
> acquires the VM root PD's reservation lock via amdgpu_vm_lock_by_pasid()
> and then, for each IB referenced by the job, calls amdgpu_bo_reserve()
> on the BO that backs the IB. Both reservations are taken on
> reservation_ww_class_mutex objects but neither uses a ww_acquire_ctx,
> which trips lockdep:
>
> WARNING: possible recursive locking detected
> --------------------------------------------
> kworker/u128:0 is trying to acquire lock:
> ffff88838b16e1f0 (reservation_ww_class_mutex){+.+.}-{4:4},
> at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]
>
> but task is already holding lock:
> ffff8882f82681f0 (reservation_ww_class_mutex){+.+.}-{4:4},
> at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]
>
> Possible unsafe locking scenario:
> CPU0
> ----
> lock(reservation_ww_class_mutex);
> lock(reservation_ww_class_mutex);
>
> *** DEADLOCK ***
> May be due to missing lock nesting notation
>
> Workqueue: events_unbound amdgpu_devcoredump_deferred_work [amdgpu]
> Call Trace:
> __ww_mutex_lock.constprop.0
> ww_mutex_lock
> amdgpu_bo_reserve
> amdgpu_devcoredump_format+0x1594 [amdgpu]
> amdgpu_devcoredump_deferred_work+0xea [amdgpu]
> process_one_work
> worker_thread
> kthread
>
Friendly ping. Pierre-Eric, Christian, Alex — any thoughts on this fix?
Happy to spin a v2 with any review feedback. One thing I'm aware of:
the `Cc: stable@xxxxxxxxxxxxxxx # 7.1` tag is probably unnecessary
since the regression only landed in 7.1-rc1 and the fix will reach 7.1
final naturally via drm-fixes; I can drop it in v2 if preferred.
--
Best Regards,
Mike Gavrilov.