deadlock in devlink_compat_running_version and suggestions for fixing it
From: ffhgfv
Date: Wed Mar 05 2025 - 09:26:28 EST
Hello, using a modified syzkaller on the latest upstream kernel, I found a bug in the devlink subsystem titled "INFO: task hung in devlink_compat_running_version".
If you fix this issue, please add the following tags to the commit:
Reported-by: Jianzhou Zhao <xnxc22xnxc22@xxxxxx>
Reported-by: xingwei lee <xrivendell7@xxxxxxxxx>
Reported-by: Zhizhuo Tang <strforexctzzchange@xxxxxxxxxxx>
------------[ cut here ]------------
TITLE: INFO: task hung in devlink_compat_running_version
==================================================================
INFO: task systemd-udevd:15007 blocked for more than 143 seconds.
Not tainted 6.14.0-rc5-dirty #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:systemd-udevd state:D stack:20128 pid:15007 tgid:15007 ppid:5221 task_flags:0x400140 flags:0x00004000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5378 [inline]
__schedule+0xf26/0x57d0 kernel/sched/core.c:6765
__schedule_loop kernel/sched/core.c:6842 [inline]
schedule+0xe7/0x350 kernel/sched/core.c:6857
schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6914
__mutex_lock_common kernel/locking/mutex.c:662 [inline]
__mutex_lock+0x631/0xb00 kernel/locking/mutex.c:730
devlink_compat_running_version+0xd5/0x7f0 net/devlink/dev.c:1224
dev_ethtool+0x27a/0x330 net/ethtool/ioctl.c:3411
dev_ioctl+0x2d4/0x10c0 net/core/dev_ioctl.c:759
sock_do_ioctl+0x1ca/0x260 net/socket.c:1213
sock_ioctl+0x23a/0x6c0 net/socket.c:1318
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:906 [inline]
__se_sys_ioctl fs/ioctl.c:892 [inline]
__x64_sys_ioctl+0x1a4/0x210 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcb/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f33c87aa237
RSP: 002b:00007ffd0cbbafd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00005559e7171c10 RCX: 00007f33c87aa237
RDX: 00007ffd0cbbb0a0 RSI: 0000000000008946 RDI: 0000000000000007
RBP: 00007ffd0cbbb0d0 R08: 00005559e7196eb0 R09: 0000000000000000
R10: 00007f33c85ec6c0 R11: 0000000000000246 R12: 00005559e7196eb0
R13: 00005559e719e090 R14: 00007ffd0cbbb0a0 R15: 0000000000000007
</TASK>
Showing all locks held in the system:
3 locks held by systemd/1:
3 locks held by kworker/u10:0/29:
#0: ffff88801b081148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1327/0x1c30 kernel/workqueue.c:3221
#1: ffffc9000050fd20 ((linkwatch_work).work){+.+.}-{0:0}, at: process_one_work+0x8f8/0x1c30 kernel/workqueue.c:3222
#2: ffffffff8fcef528 (rtnl_mutex){+.+.}-{4:4}, at: linkwatch_event+0xf/0x70 net/core/link_watch.c:285
1 lock held by khungtaskd/35:
#0: ffffffff8dfbc1a0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:337 [inline]
#0: ffffffff8dfbc1a0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:849 [inline]
#0: ffffffff8dfbc1a0 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x7f/0x390 kernel/locking/lockdep.c:6746
2 locks held by kswapd0/98:
5 locks held by kworker/u10:3/256:
1 lock held by systemd-journal/5208:
#0: ffff888046813df0 (&vma->vm_lock->lock){++++}-{4:4}, at: vma_start_read include/linux/mm.h:717 [inline]
#0: ffff888046813df0 (&vma->vm_lock->lock){++++}-{4:4}, at: lock_vma_under_rcu+0x141/0x9a0 mm/memory.c:6378
1 lock held by systemd-udevd/5221:
#0: ffff888045dd72e8 (mapping.invalidate_lock){++++}-{4:4}, at: filemap_invalidate_lock_shared include/linux/fs.h:932 [inline]
#0: ffff888045dd72e8 (mapping.invalidate_lock){++++}-{4:4}, at: page_cache_ra_unbounded+0x173/0x790 mm/readahead.c:229
1 lock held by cron/8674:
#0: ffff88804cdf1658 (&vma->vm_lock->lock){++++}-{4:4}, at: vma_start_read include/linux/mm.h:717 [inline]
#0: ffff88804cdf1658 (&vma->vm_lock->lock){++++}-{4:4}, at: lock_vma_under_rcu+0x141/0x9a0 mm/memory.c:6378
3 locks held by sshd/9404:
1 lock held by syz-executor/9410:
#0: ffff8880258d4730 (&vma->vm_lock->lock){++++}-{4:4}, at: vma_start_read include/linux/mm.h:717 [inline]
#0: ffff8880258d4730 (&vma->vm_lock->lock){++++}-{4:4}, at: lock_vma_under_rcu+0x141/0x9a0 mm/memory.c:6378
3 locks held by kworker/u8:3/9993:
#0: ffff88804a98f148 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: process_one_work+0x1327/0x1c30 kernel/workqueue.c:3221
#1: ffffc90006dcfd20 ((work_completion)(&(&net->ipv6.addr_chk_work)->work)){+.+.}-{0:0}, at: process_one_work+0x8f8/0x1c30 kernel/workqueue.c:3222
#2: ffffffff8fcef528 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_net_lock include/linux/rtnetlink.h:129 [inline]
#2: ffffffff8fcef528 (rtnl_mutex){+.+.}-{4:4}, at: addrconf_verify_work+0x12/0x30 net/ipv6/addrconf.c:4730
2 locks held by kworker/u8:4/10878:
4 locks held by kworker/u8:5/12145:
#0: ffff88801beeb948 ((wq_completion)netns){+.+.}-{0:0}, at: process_one_work+0x1327/0x1c30 kernel/workqueue.c:3221
#1: ffffc90003217d20 (net_cleanup_work){+.+.}-{0:0}, at: process_one_work+0x8f8/0x1c30 kernel/workqueue.c:3222
#2: ffffffff8fcd9450 (pernet_ops_rwsem){++++}-{4:4}, at: cleanup_net+0xca/0xb90 net/core/net_namespace.c:606
#3: ffffffff8fcef528 (rtnl_mutex){+.+.}-{4:4}, at: wg_destruct+0x29/0x3d0 drivers/net/wireguard/device.c:246
1 lock held by systemd-udevd/15007:
#0: ffff88805c834250 (&devlink->lock_key#7){+.+.}-{4:4}, at: devlink_compat_running_version+0xd5/0x7f0 net/devlink/dev.c:1224
7 locks held by syz-executor/15828:
#0: ffff88802657c420 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x122/0x240 fs/read_write.c:731
#1: ffff88806c476088 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x27a/0x500 fs/kernfs/file.c:325
#2: ffff88801fa39d28 (kn->active#63){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x29e/0x500 fs/kernfs/file.c:326
#3: ffffffff8f29aec8 (nsim_bus_dev_list_lock){+.+.}-{4:4}, at: del_device_store+0xc9/0x4b0 drivers/net/netdevsim/bus.c:216
#4: ffff88805c8350e8 (&dev->mutex){....}-{4:4}, at: device_lock include/linux/device.h:1030 [inline]
#4: ffff88805c8350e8 (&dev->mutex){....}-{4:4}, at: __device_driver_lock drivers/base/dd.c:1095 [inline]
#4: ffff88805c8350e8 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0xa4/0x620 drivers/base/dd.c:1293
#5: ffff88805c834250 (&devlink->lock_key#7){+.+.}-{4:4}, at: nsim_drv_remove+0x4a/0x1d0 drivers/net/netdevsim/dev.c:1675
#6: ffffffff8fcef528 (rtnl_mutex){+.+.}-{4:4}, at: unregister_nexthop_notifier+0x19/0x70 net/ipv4/nexthop.c:3906
2 locks held by ifquery/18079:
#0: ffff8880766a66c8 (nlk_cb_mutex-ROUTE){+.+.}-{4:4}, at: __netlink_dump_start+0x156/0x980 net/netlink/af_netlink.c:2387
#1: ffffffff8fcef528 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
#1: ffffffff8fcef528 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_dumpit+0x199/0x200 net/core/rtnetlink.c:6780
2 locks held by syz-executor/18644:
#0: ffffffff8fcef528 (rtnl_mutex){+.+.}-{4:4}, at: tun_detach drivers/net/tun.c:698 [inline]
#0: ffffffff8fcef528 (rtnl_mutex){+.+.}-{4:4}, at: tun_chr_close+0x38/0x230 drivers/net/tun.c:3517
#1: ffffffff8dfc7638 (rcu_state.exp_mutex){+.+.}-{4:4}, at: exp_funnel_lock+0x1a4/0x3a0 kernel/rcu/tree_exp.h:334
3 locks held by syz-executor/18673:
#0: ffff8880273ecdf0 (&hdev->req_lock){+.+.}-{4:4}, at: hci_dev_do_close+0x29/0xa0 net/bluetooth/hci_core.c:480
#1: ffff8880273ec078 (&hdev->lock){+.+.}-{4:4}, at: hci_dev_close_sync+0x35e/0x11a0 net/bluetooth/hci_sync.c:5185
#2: ffff88805c5da350 (&conn->lock#2){+.+.}-{4:4}, at: l2cap_conn_del+0x80/0x750 net/bluetooth/l2cap_core.c:1761
3 locks held by syz-executor/18700:
#0: ffff88807639cdf0 (&hdev->req_lock){+.+.}-{4:4}, at: hci_dev_do_close+0x29/0xa0 net/bluetooth/hci_core.c:480
#1: ffff88807639c078 (&hdev->lock){+.+.}-{4:4}, at: hci_dev_close_sync+0x35e/0x11a0 net/bluetooth/hci_sync.c:5185
#2: ffff88807aecb350 (&conn->lock#2){+.+.}-{4:4}, at: l2cap_conn_del+0x80/0x750 net/bluetooth/l2cap_core.c:1761
2 locks held by syz-executor/19212:
#0: ffffffff8fcd9450 (pernet_ops_rwsem){++++}-{4:4}, at: copy_net_ns+0x28a/0x600 net/core/net_namespace.c:512
#1: ffffffff8fcef528 (rtnl_mutex){+.+.}-{4:4}, at: cangw_pernet_exit_batch+0x15/0xa0 net/can/gw.c:1257
2 locks held by syz-executor/19515:
#0: ffffffff8fcd9450 (pernet_ops_rwsem){++++}-{4:4}, at: copy_net_ns+0x28a/0x600 net/core/net_namespace.c:512
#1: ffffffff8fcef528 (rtnl_mutex){+.+.}-{4:4}, at: ip_tunnel_init_net+0x20f/0x780 net/ipv4/ip_tunnel.c:1159
2 locks held by syz-executor/19552:
#0: ffffffff8fcd9450 (pernet_ops_rwsem){++++}-{4:4}, at: copy_net_ns+0x28a/0x600 net/core/net_namespace.c:512
#1: ffffffff8fcef528 (rtnl_mutex){+.+.}-{4:4}, at: ip_tunnel_init_net+0x20f/0x780 net/ipv4/ip_tunnel.c:1159
2 locks held by ifquery/19620:
#0: ffff888076d836c8 (nlk_cb_mutex-ROUTE){+.+.}-{4:4}, at: netlink_dump+0x663/0xcf0 net/netlink/af_netlink.c:2254
#1: ffffffff8fcef528 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
#1: ffffffff8fcef528 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_dumpit+0x199/0x200 net/core/rtnetlink.c:6780
2 locks held by syz-executor/19627:
#0: ffff88807d654df0 (&hdev->req_lock){+.+.}-{4:4}, at: hci_dev_do_close+0x29/0xa0 net/bluetooth/hci_core.c:480
#1: ffff88807d654078 (&hdev->lock){+.+.}-{4:4}, at: hci_dev_close_sync+0x35e/0x11a0 net/bluetooth/hci_sync.c:5185
1 lock held by syz-executor/19746:
#0: ffffffff8fcef528 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_net_lock include/linux/rtnetlink.h:129 [inline]
#0: ffffffff8fcef528 (rtnl_mutex){+.+.}-{4:4}, at: inet_rtm_newaddr+0x30f/0x1570 net/ipv4/devinet.c:987
1 lock held by syz-executor/19755:
#0: ffffffff8fcef528 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_net_lock include/linux/rtnetlink.h:129 [inline]
#0: ffffffff8fcef528 (rtnl_mutex){+.+.}-{4:4}, at: inet_rtm_newaddr+0x30f/0x1570 net/ipv4/devinet.c:987
1 lock held by syz-executor/19761:
#0: ffffffff8fcef528 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_net_lock include/linux/rtnetlink.h:129 [inline]
#0: ffffffff8fcef528 (rtnl_mutex){+.+.}-{4:4}, at: inet_rtm_newaddr+0x30f/0x1570 net/ipv4/devinet.c:987
2 locks held by syz-executor/19787:
4 locks held by syz-executor/19789:
1 lock held by syz-executor/19843:
1 lock held by syz-executor/19845:
#0: ffff88804d04e4a8 (&vma->vm_lock->lock){++++}-{4:4}, at: vma_start_read include/linux/mm.h:717 [inline]
#0: ffff88804d04e4a8 (&vma->vm_lock->lock){++++}-{4:4}, at: lock_vma_under_rcu+0x141/0x9a0 mm/memory.c:6378
1 lock held by syz-executor/19847:
#0: ffff8880281d2730 (&vma->vm_lock->lock){++++}-{4:4}, at: vma_start_read include/linux/mm.h:717 [inline]
#0: ffff8880281d2730 (&vma->vm_lock->lock){++++}-{4:4}, at: lock_vma_under_rcu+0x141/0x9a0 mm/memory.c:6378
1 lock held by syz-executor/19850:
#0: ffff88804d34bb68 (&vma->vm_lock->lock){++++}-{4:4}, at: vma_start_read include/linux/mm.h:717 [inline]
#0: ffff88804d34bb68 (&vma->vm_lock->lock){++++}-{4:4}, at: lock_vma_under_rcu+0x141/0x9a0 mm/memory.c:6378
=============================================
NMI backtrace for cpu 1
CPU: 1 UID: 0 PID: 35 Comm: khungtaskd Not tainted 6.14.0-rc5-dirty #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1b0 lib/dump_stack.c:120
nmi_cpu_backtrace+0x2a0/0x350 lib/nmi_backtrace.c:113
nmi_trigger_cpumask_backtrace+0x29c/0x300 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:162 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:236 [inline]
watchdog+0xea3/0x1200 kernel/hung_task.c:399
kthread+0x3b0/0x760 kernel/kthread.c:464
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 UID: 0 PID: 19787 Comm: syz-executor Not tainted 6.14.0-rc5-dirty #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:130 [inline]
RIP: 0010:memory_is_poisoned mm/kasan/generic.c:161 [inline]
RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
RIP: 0010:kasan_check_range+0x5a/0x1a0 mm/kasan/generic.c:189
Code: b8 00 00 00 4c 8d 54 37 ff 48 89 fd 48 b8 00 00 00 00 00 fc ff df 4d 89 d1 48 c1 ed 03 49 c1 e9 03 48 01 c5 49 01 c1 48 89 e8 <49> 8d 59 01 48 89 da 48 29 ea 48 83 fa 10 0f 8e 92 00 00 00 41 89
RSP: 0018:ffffc90002a77458 EFLAGS: 00000086
RAX: fffffbfff2d943a0 RBX: 0000000000000019 RCX: ffffffff81947d6e
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffff96ca1d00
RBP: fffffbfff2d943a0 R08: 0000000000000000 R09: fffffbfff2d943a0
R10: ffffffff96ca1d07 R11: 0000000000000002 R12: 0000000000000000
R13: ffff88802b523108 R14: 0000000000000019 R15: ffff88802b522500
FS: 0000555589ac2500(0000) GS:ffff88802b800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005559aafa0ff0 CR3: 000000005ae6c000 CR4: 0000000000752ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 80000000
Call Trace:
<NMI>
</NMI>
<TASK>
instrument_atomic_read include/linux/instrumented.h:68 [inline]
_test_bit include/asm-generic/bitops/instrumented-non-atomic.h:141 [inline]
hlock_class+0x4e/0x130 kernel/locking/lockdep.c:230
check_wait_context kernel/locking/lockdep.c:4853 [inline]
__lock_acquire+0x451/0x3c80 kernel/locking/lockdep.c:5178
lock_acquire.part.0+0x11b/0x370 kernel/locking/lockdep.c:5851
rcu_lock_acquire include/linux/rcupdate.h:337 [inline]
rcu_read_lock_sched include/linux/rcupdate.h:941 [inline]
pfn_valid include/linux/mmzone.h:2067 [inline]
pfn_valid include/linux/mmzone.h:2050 [inline]
page_table_check_set+0x113/0x9f0 mm/page_table_check.c:110
__page_table_check_ptes_set+0x28e/0x450 mm/page_table_check.c:225
page_table_check_ptes_set include/linux/page_table_check.h:74 [inline]
set_ptes include/linux/pgtable.h:288 [inline]
__copy_present_ptes mm/memory.c:968 [inline]
copy_present_ptes mm/memory.c:1051 [inline]
copy_pte_range mm/memory.c:1174 [inline]
copy_pmd_range mm/memory.c:1262 [inline]
copy_pud_range mm/memory.c:1299 [inline]
copy_p4d_range mm/memory.c:1323 [inline]
copy_page_range+0x3048/0x4e30 mm/memory.c:1421
dup_mmap kernel/fork.c:748 [inline]
dup_mm kernel/fork.c:1700 [inline]
copy_mm kernel/fork.c:1752 [inline]
copy_process+0x7dea/0x8ab0 kernel/fork.c:2403
kernel_clone+0xeb/0x920 kernel/fork.c:2815
__do_sys_clone+0xcf/0x120 kernel/fork.c:2958
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcb/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5b34d9fec7
Code: 00 00 90 f3 0f 1e fa 64 48 8b 04 25 10 00 00 00 45 31 c0 31 d2 31 f6 bf 11 00 20 01 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 41 89 c0 85 c0 75 2c 64 48 8b 04 25 10 00
RSP: 002b:00007ffe3ca176c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
RAX: ffffffffffffffda RBX: 00007f5b35afd660 RCX: 00007f5b34d9fec7
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000080000000
R10: 0000555589ac27d0 R11: 0000000000000246 R12: 0000000000000001
R13: 0000000000000003 R14: 00007f5b34e4e881 R15: 0000000000000002
</TASK>
==================================================================
I used the same kernel as the syzbot upstream instance: commit 7eb172143d5508b4da468ed59ee857c6e5e01da6
kernel config: https://syzkaller.appspot.com/text?tag=KernelConfig&x=da4b04ae798b7ef6
compiler: gcc version 11.4.0
===============================================================================
Unfortunately, the modified syzkaller did not produce a working reproducer.
Below is my analysis of the bug and some suggestions for fixing it, which I hope will help:
## Root cause analysis
The problem lies in devlink_compat_running_version(), which calls into a path that needs the RTNL lock (rtnl_mutex) while holding the devlink lock (taken via devl_lock()).
Another path in the system acquires the two locks in the reverse order (first the RTNL lock, then the devlink lock), which results in a lock inversion and a deadlock. The two sequences look like this:
devlink_compat_running_version()
    devl_lock(devlink);        // holds the devlink lock
    ↓
    __devlink_compat_running_version()
        → implicitly calls an operation that requires rtnl_lock()   // waits for the RTNL lock

linkwatch_event()
    rtnl_lock();               // holds the RTNL lock
    → enters a code path involving devlink
        devl_lock(devlink);    // waits for the devlink lock
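The two sequences above form the classic AB-BA pattern that lockdep is designed to catch. As a simplified, userspace-only illustration (not lockdep's real algorithm, which tracks a full dependency graph across lock classes), this sketch records which lock class was held when another one was acquired and flags a pair once both nestings have been observed. The lock-class names and record_nesting() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical lock classes, used only for this illustration. */
enum lock_class { LOCK_RTNL, LOCK_DEVLINK, NR_LOCK_CLASSES };

static const char *const lock_names[NR_LOCK_CLASSES] = {
	[LOCK_RTNL] = "rtnl_mutex",
	[LOCK_DEVLINK] = "devlink->lock",
};

/* seen[a][b] becomes true once some path acquires b while holding a. */
static bool seen[NR_LOCK_CLASSES][NR_LOCK_CLASSES];

/*
 * Record that @next is acquired while @held is held. Returns true and
 * prints a warning if the opposite nesting was recorded earlier, i.e. the
 * two paths form an AB-BA inversion that can deadlock.
 */
static bool record_nesting(enum lock_class held, enum lock_class next)
{
	assert(held < NR_LOCK_CLASSES && next < NR_LOCK_CLASSES);

	seen[held][next] = true;
	if (seen[next][held]) {
		printf("possible deadlock: %s -> %s conflicts with %s -> %s\n",
		       lock_names[held], lock_names[next],
		       lock_names[next], lock_names[held]);
		return true;
	}
	return false;
}
```

Recording the devlink → RTNL nesting first returns false; recording RTNL → devlink afterwards returns true and prints the conflicting pair.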
### Repair suggestions
Adjust the lock acquisition order so that every code path that takes both the devlink lock and the RTNL lock acquires them in the order RTNL lock → devlink lock.
Example patch (against net/devlink/dev.c):
 void devlink_compat_running_version(struct devlink *devlink, char *buf, size_t len)
 {
 	if (!devlink->ops->info_get)
 		return;
+	rtnl_lock();	// take the RTNL lock first
 	devl_lock(devlink);
 	if (devl_is_registered(devlink))
 		__devlink_compat_running_version(devlink, buf, len);	// must not take the RTNL lock again internally (rtnl_mutex is not recursive)
 	devl_unlock(devlink);
+	rtnl_unlock();
 }
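The ordering rule can be illustrated in userspace with two pthread mutexes standing in for rtnl_mutex and the devlink lock. In this sketch (all names hypothetical), every path goes through one helper that takes the RTNL stand-in first and the devlink stand-in second, so no thread can hold the devlink lock while waiting for RTNL and the AB-BA cycle cannot form. As the suggestion says, this only helps if all paths follow the rule: in the lock dump above, syz-executor/15828 holds the devlink lock taken in nsim_drv_remove() (#5) while acquiring rtnl_mutex in unregister_nexthop_notifier() (#6), so that path would presumably need to be converted as well.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-ins for rtnl_mutex and the devlink lock. */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t fake_devlink = PTHREAD_MUTEX_INITIALIZER;
static long shared_work;

/* The single allowed order: RTNL first, then devlink. */
static void lock_rtnl_then_devlink(void)
{
	pthread_mutex_lock(&fake_rtnl);
	pthread_mutex_lock(&fake_devlink);
}

/* Release in the reverse order of acquisition. */
static void unlock_devlink_then_rtnl(void)
{
	pthread_mutex_unlock(&fake_devlink);
	pthread_mutex_unlock(&fake_rtnl);
}

/* Each thread plays one of the two contending paths. */
static void *worker(void *arg)
{
	long iterations = *(const long *)arg;

	for (long i = 0; i < iterations; i++) {
		lock_rtnl_then_devlink();
		shared_work++;	/* protected by both locks */
		unlock_devlink_then_rtnl();
	}
	return NULL;
}

/* Run two contending threads; returns the final count, or -1 on error. */
static long run_two_workers(long iterations)
{
	pthread_t a, b;

	assert(iterations >= 0);
	shared_work = 0;
	if (pthread_create(&a, NULL, worker, &iterations) != 0)
		return -1;
	if (pthread_create(&b, NULL, worker, &iterations) != 0) {
		pthread_join(a, NULL);
		return -1;
	}
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return shared_work;
}
```

run_two_workers(100000) completes and returns 200000 (build with -pthread). If one worker instead took fake_devlink before fake_rtnl, the two threads could block on each other forever, which is the situation described in the root cause analysis.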
=========================================================================
I hope it helps.
Best regards
Jianzhou Zhao
xingwei lee
Zhizhuo Tang