Possible circular locking dependency (&dev->clientlist_mutex){+.+.}-{4:4}, at: drm_client_dev_suspend+0x44/0x140 but (console_lock){+.+.}-{0:0}, at: radeon_suspend_kms+0x3e8/0x490 [radeon] (Talos II, kernel v6.13.1)

From: Erhard Furtner
Date: Sun Feb 02 2025 - 10:16:21 EST


Greetings!

My Talos II (ppc64) boots up fine on v6.13.1, but at reboot I always get this warning when the kernel is built with SLUB_DEBUG_ON=y and PROVE_LOCKING=y:

[...]
EXT4-fs (nvme0n1p2): unmounting filesystem 4913eef4-b406-4b09-ad17-549fbf0a775e.
systemd-shutdown[1]: Syncing filesystems and block devices.
systemd-shutdown[1]: Sending SIGTERM to remaining processes...
systemd-journald[931]: Received SIGTERM from PID 1 (systemd-shutdow).
systemd-shutdown[1]: Sending SIGKILL to remaining processes...
systemd-shutdown[1]: Unmounting file systems.
(sd-umount)[1711]: Unmounting '/run/credentials/systemd-vconsole-setup.service'.
(sd-umount)[1712]: Unmounting '/run/credentials/systemd-journald.service'.
(sd-remount)[1713]: Remounting '/' read-only with options 'compress=zstd:1,ssd,discard=async,space_cache=v2,subvolid=5,subvol=/'.
systemd-shutdown[1]: All filesystems unmounted.
systemd-shutdown[1]: Deactivating swaps.
systemd-shutdown[1]: All swaps deactivated.
systemd-shutdown[1]: Detaching loop devices.
systemd-shutdown[1]: All loop devices detached.
systemd-shutdown[1]: Stopping MD devices.
systemd-shutdown[1]: All MD devices stopped.
systemd-shutdown[1]: Detaching DM devices.
systemd-shutdown[1]: All DM devices detached.
systemd-shutdown[1]: All filesystems, swaps, loop devices, MD devices and DM devices detached.
systemd-shutdown[1]: Syncing filesystems and block devices.
systemd-shutdown[1]: Rebooting.
radeon 0033:01:00.0: Refused to change power state from D0 to D3hot

======================================================
WARNING: possible circular locking dependency detected
6.13.1-P9 #6 Tainted: G T
------------------------------------------------------
systemd-shutdow/1 is trying to acquire lock:
c000200015768300 (&dev->clientlist_mutex){+.+.}-{4:4}, at: drm_client_dev_suspend+0x44/0x140

but task is already holding lock:
c0000000023fb260 (console_lock){+.+.}-{0:0}, at: radeon_suspend_kms+0x3e8/0x490 [radeon]

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (console_lock){+.+.}-{0:0}:
lock_acquire+0x128/0x3d0
console_lock+0x74/0xe0
fbcon_fb_registered+0x2d4/0x2f0
do_register_framebuffer+0x1ac/0x2f0
register_framebuffer+0x40/0x70
__drm_fb_helper_initial_config_and_unlock+0x3c0/0x6e0
drm_fbdev_client_hotplug+0xb8/0x140
drm_client_register+0xa8/0x120
drm_fbdev_client_setup+0x140/0x290
drm_client_setup+0x20/0x70
radeon_pci_probe+0x218/0x270 [radeon]
local_pci_probe+0x60/0xf0
work_for_cpu_fn+0x30/0x50
process_one_work+0x29c/0x810
worker_thread+0x1fc/0x410
kthread+0x148/0x150
start_kernel_thread+0x14/0x18

-> #1 (registration_lock){+.+.}-{4:4}:
lock_acquire+0x128/0x3d0
__mutex_lock+0xe0/0x1060
register_framebuffer+0x34/0x70
__drm_fb_helper_initial_config_and_unlock+0x3c0/0x6e0
drm_fbdev_client_hotplug+0xb8/0x140
drm_client_register+0xa8/0x120
drm_fbdev_client_setup+0x140/0x290
drm_client_setup+0x20/0x70
radeon_pci_probe+0x218/0x270 [radeon]
local_pci_probe+0x60/0xf0
work_for_cpu_fn+0x30/0x50
process_one_work+0x29c/0x810
worker_thread+0x1fc/0x410
kthread+0x148/0x150
start_kernel_thread+0x14/0x18

-> #0 (&dev->clientlist_mutex){+.+.}-{4:4}:
check_prev_add+0x174/0x1240
__lock_acquire+0x17e0/0x2120
lock_acquire+0x128/0x3d0
__mutex_lock+0xe0/0x1060
drm_client_dev_suspend+0x44/0x140
radeon_suspend_kms+0x3f8/0x490 [radeon]
radeon_pci_shutdown+0x40/0xa0 [radeon]
pci_device_shutdown+0x5c/0xd0
device_shutdown+0x1fc/0x300
kernel_restart+0x5c/0xf0
__do_sys_reboot+0x130/0x2e0
system_call_exception+0x1b4/0x390
system_call_vectored_common+0xf0/0x280

other info that might help us debug this:

Chain exists of:
&dev->clientlist_mutex --> registration_lock --> console_lock

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(console_lock);
                               lock(registration_lock);
                               lock(console_lock);
  lock(&dev->clientlist_mutex);

*** DEADLOCK ***

4 locks held by systemd-shutdow/1:
#0: c0000000023f5d20 (system_transition_mutex){+.+.}-{4:4}, at: __do_sys_reboot+0xf8/0x2e0
#1: c0000000119bc1b0 (&dev->mutex){....}-{4:4}, at: device_shutdown+0x150/0x300
#2: c0000000119b91b0 (&dev->mutex){....}-{4:4}, at: device_shutdown+0x164/0x300
#3: c0000000023fb260 (console_lock){+.+.}-{0:0}, at: radeon_suspend_kms+0x3e8/0x490 [radeon]

stack backtrace:
CPU: 13 UID: 0 PID: 1 Comm: systemd-shutdow Tainted: G T 6.13.1-P9 #6
Tainted: [T]=RANDSTRUCT
Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
Call Trace:
[c0000000084ff3e0] [c0000000010991a8] dump_stack_lvl+0xbc/0x110 (unreliable)
[c0000000084ff420] [c000000000202318] print_circular_bug+0x3c8/0x470
[c0000000084ff4d0] [c000000000202594] check_noncircular+0x1d4/0x1f0
[c0000000084ff590] [c000000000203c94] check_prev_add+0x174/0x1240
[c0000000084ff650] [c000000000208460] __lock_acquire+0x17e0/0x2120
[c0000000084ff790] [c000000000208ec8] lock_acquire+0x128/0x3d0
[c0000000084ff890] [c0000000010e12e0] __mutex_lock+0xe0/0x1060
[c0000000084ff9b0] [c000000000c1f4b4] drm_client_dev_suspend+0x44/0x140
[c0000000084ffa40] [c00800000de36610] radeon_suspend_kms+0x3f8/0x490 [radeon]
[c0000000084ffb00] [c00800000de33638] radeon_pci_shutdown+0x40/0xa0 [radeon]
[c0000000084ffb30] [c000000000b0952c] pci_device_shutdown+0x5c/0xd0
[c0000000084ffb70] [c000000000c5d99c] device_shutdown+0x1fc/0x300
[c0000000084ffc00] [c0000000001a2b5c] kernel_restart+0x5c/0xf0
[c0000000084ffc70] [c0000000001a2f70] __do_sys_reboot+0x130/0x2e0
[c0000000084ffdd0] [c00000000002ea64] system_call_exception+0x1b4/0x390
[c0000000084ffe50] [c00000000000c270] system_call_vectored_common+0xf0/0x280
--- interrupt: 3000 at 0x3fffbd6ec040
NIP: 00003fffbd6ec040 LR: 00003fffbd6ec040 CTR: 0000000000000000
REGS: c0000000084ffe80 TRAP: 3000 Tainted: G T (6.13.1-P9)
MSR: 900000000280f032 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI> CR: 48002448 XER: 00000000
IRQMASK: 0
GPR00: 0000000000000058 00003ffffbed38e0 00003fffbd7f7100 fffffffffee1dead
GPR04: 0000000028121969 0000000001234567 672e000000000000 0000000000000020
GPR08: 00003ffffbed2ed5 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 00003fffbdce57e0 00000001341eadc7 00000001341eade0
GPR16: 00000001341eae47 0000000000000000 0000000000000000 0000000000000001
GPR20: 0000000000000000 00003ffffbed39e8 00003ffffbed3f48 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000001234567 00000001341ebba8
GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
NIP [00003fffbd6ec040] 0x3fffbd6ec040
LR [00003fffbd6ec040] 0x3fffbd6ec040
--- interrupt: 3000
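
To make the reported chain easier to follow, here is a minimal sketch (plain Python, not kernel code) of roughly the kind of reachability check behind this warning. The edges are taken from the "Chain exists of" line above. The names `existing` and `find_cycle` are made up for illustration and do not correspond to anything in the lockdep implementation.

```python
# Lock-order edges from the report: "A": ["B"] means B was taken while A was held.
# Illustrative model only; names here are not lockdep internals.
existing = {
    "clientlist_mutex": ["registration_lock"],
    "registration_lock": ["console_lock"],
}


def find_cycle(graph, held, wanted):
    """Return the recorded path from `wanted` to `held`, or None.

    Taking `wanted` while holding `held` adds the edge held -> wanted.
    That closes a cycle if `held` is already reachable from `wanted`.
    """
    stack = [(wanted, [wanted])]
    seen = set()
    while stack:
        node, path = stack.pop()
        if node == held:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph.get(node, []):
            stack.append((nxt, path + [nxt]))
    return None


# The shutdown path holds console_lock and then asks for clientlist_mutex.
cycle = find_cycle(existing, held="console_lock", wanted="clientlist_mutex")
print(" --> ".join(cycle + ["clientlist_mutex"]) if cycle else "no cycle")
```

In this model the shutdown path (console_lock held, clientlist_mutex wanted) closes the loop, while the opposite order, as seen at probe time, does not.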


# lspci
0000:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0001:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0002:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0003:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0003:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0004:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0004:01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0004:01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0005:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0005:01:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
0005:02:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
0030:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0031:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0031:01:00.0 Non-Volatile memory controller: Phison Electronics Corporation E8 PCIe3 x2 NVMe Controller (rev 01)
0032:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0033:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0033:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV516 [Radeon X1300/X1550 Series]
0033:01:00.1 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] RV516 [Radeon X1300/X1550 Series] (Secondary)

Kernel .config attached, full dmesg can be provided if needed.

Regards,
Erhard

Attachment: config_6131_p9
Description: Binary data