kernel bug found and root cause analysis

From: ffhgfv
Date: Sun Mar 09 2025 - 07:25:07 EST


Hello, I found a bug titled "INFO: task hung in ib_enum_all_roce_netdevs" with a modified syzkaller on the latest upstream kernel; it is related to the INFINIBAND subsystem.
If you fix this issue, please add the following tags to the commit: Reported-by: Jianzhou Zhao <xnxc22xnxc22@xxxxxx>, xingwei lee <xrivendell7@xxxxxxxxx>, Zhizhuo Tang <strforexctzzchange@xxxxxxxxxxx>

------------[ cut here ]-----------------------------------------
BUG: corrupted list in fix_fullness_group
==================================================================
INFO: task kworker/u8:5:12618 blocked for more than 143 seconds.
Not tainted 6.14.0-rc5-dirty #2
"echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u8:5 state:D stack:28240 pid:12618 tgid:12618 ppid:2 task_flags:0x4208060 flags:0x00004000
Workqueue: gid-cache-wq netdevice_event_work_handler
Call Trace:
<task>
context_switch kernel/sched/core.c:5378 [inline]
__schedule+0xf26/0x57d0 kernel/sched/core.c:6765
__schedule_loop kernel/sched/core.c:6842 [inline]
schedule+0xe7/0x350 kernel/sched/core.c:6857
schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6914
rwsem_down_read_slowpath+0x5b5/0xb30 kernel/locking/rwsem.c:1084
__down_read_common kernel/locking/rwsem.c:1248 [inline]
__down_read kernel/locking/rwsem.c:1261 [inline]
down_read+0x11b/0x330 kernel/locking/rwsem.c:1526
ib_enum_all_roce_netdevs+0x7a/0x140 drivers/infiniband/core/device.c:2390
netdevice_event_work_handler+0xd2/0x350 drivers/infiniband/core/roce_gid_mgmt.c:648
process_one_work+0xa09/0x1c30 kernel/workqueue.c:3250
process_scheduled_works kernel/workqueue.c:3334 [inline]
worker_thread+0x677/0xe90 kernel/workqueue.c:3415
kthread+0x3b0/0x760 kernel/kthread.c:464
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</task>
INFO: task syz.2.99:13334 blocked for more than 143 seconds.
Not tainted 6.14.0-rc5-dirty #2
"echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz.2.99 state:D stack:26288 pid:13334 tgid:13328 ppid:9456 task_flags:0x400140 flags:0x00000004
Call Trace:
<task>
context_switch kernel/sched/core.c:5378 [inline]
__schedule+0xf26/0x57d0 kernel/sched/core.c:6765
__schedule_loop kernel/sched/core.c:6842 [inline]
schedule+0xe7/0x350 kernel/sched/core.c:6857
schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6914
rwsem_down_write_slowpath+0x3e8/0x12a0 kernel/locking/rwsem.c:1176
__down_write_common kernel/locking/rwsem.c:1304 [inline]
__down_write kernel/locking/rwsem.c:1313 [inline]
down_write+0x1d7/0x200 kernel/locking/rwsem.c:1578
assign_name drivers/infiniband/core/device.c:1197 [inline]
ib_register_device+0x88/0xdd0 drivers/infiniband/core/device.c:1384
rxe_register_device+0x2be/0x380 drivers/infiniband/sw/rxe/rxe_verbs.c:1540
rxe_net_add+0x96/0x100 drivers/infiniband/sw/rxe/rxe_net.c:550
rxe_newlink+0xf0/0x1b0 drivers/infiniband/sw/rxe/rxe.c:212
nldev_newlink+0x376/0x600 drivers/infiniband/core/nldev.c:1795
rdma_nl_rcv_msg+0x383/0x6e0 drivers/infiniband/core/netlink.c:195
rdma_nl_rcv_skb.constprop.0.isra.0+0x2fc/0x440 drivers/infiniband/core/netlink.c:239
netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
netlink_unicast+0x544/0x800 net/netlink/af_netlink.c:1338
netlink_sendmsg+0x8a5/0xd80 net/netlink/af_netlink.c:1882
sock_sendmsg_nosec net/socket.c:718 [inline]
__sock_sendmsg net/socket.c:733 [inline]
____sys_sendmsg+0xab8/0xc70 net/socket.c:2573
___sys_sendmsg+0x11d/0x1c0 net/socket.c:2627
__sys_sendmsg+0x151/0x200 net/socket.c:2659
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcb/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fbd179a962d
RSP: 002b:00007fbd1885df98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fbd17bc5f80 RCX: 00007fbd179a962d
RDX: 0000000020000000 RSI: 0000000020000140 RDI: 0000000000000005
RBP: 00007fbd17a4e373 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007fbd17bc5f80 R15: 00007fbd1883e000
</task>

Showing all locks held in the system:
2 locks held by systemd/1:
#0: ffff8880250ed658 (&vma->vm_lock->lock){++++}-{4:4}, at: vma_start_read include/linux/mm.h:717 [inline]
#0: ffff8880250ed658 (&vma->vm_lock->lock){++++}-{4:4}, at: lock_vma_under_rcu+0x141/0x9a0 mm/memory.c:6378
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:3926 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:3951 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath mm/page_alloc.c:4382 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_frozen_pages_noprof+0xa43/0x21f0 mm/page_alloc.c:4752
3 locks held by kworker/u8:2/15:
#0: ffff88801beeb948 ((wq_completion)netns){+.+.}-{0:0}, at: process_one_work+0x1327/0x1c30 kernel/workqueue.c:3221
#1: ffffc9000041fd20 (net_cleanup_work){+.+.}-{0:0}, at: process_one_work+0x8f8/0x1c30 kernel/workqueue.c:3222
#2: ffffffff8fcd9450 (pernet_ops_rwsem){++++}-{4:4}, at: cleanup_net+0xca/0xb90 net/core/net_namespace.c:606
1 lock held by khungtaskd/35:
#0: ffffffff8dfbc1a0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:337 [inline]
#0: ffffffff8dfbc1a0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:849 [inline]
#0: ffffffff8dfbc1a0 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x7f/0x390 kernel/locking/lockdep.c:6746
3 locks held by kworker/u9:3/68:
#0: ffff88801b081148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1327/0x1c30 kernel/workqueue.c:3221
#1: ffffc90000d0fd20 ((work_completion)(&(&krcp->page_cache_work)->work)){+.+.}-{0:0}, at: process_one_work+0x8f8/0x1c30 kernel/workqueue.c:3222
#2: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:3926 [inline]
#2: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:3951 [inline]
#2: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath mm/page_alloc.c:4382 [inline]
#2: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_frozen_pages_noprof+0xa43/0x21f0 mm/page_alloc.c:4752
1 lock held by kswapd0/97:
#0: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xc92/0x19a0 mm/vmscan.c:7012
1 lock held by kswapd1/98:
#0: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xc92/0x19a0 mm/vmscan.c:7012
3 locks held by kworker/u10:5/727:
#0: ffff88801b081148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1327/0x1c30 kernel/workqueue.c:3221
#1: ffffc900031c7d20 ((work_completion)(&(&krcp->page_cache_work)->work)){+.+.}-{0:0}, at: process_one_work+0x8f8/0x1c30 kernel/workqueue.c:3222
#2: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:3926 [inline]
#2: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:3951 [inline]
#2: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath mm/page_alloc.c:4382 [inline]
#2: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_frozen_pages_noprof+0xa43/0x21f0 mm/page_alloc.c:4752
2 locks held by systemd-journal/5206:
#0: ffff888021599070 (&vma->vm_lock->lock){++++}-{4:4}, at: vma_start_read include/linux/mm.h:717 [inline]
#0: ffff888021599070 (&vma->vm_lock->lock){++++}-{4:4}, at: lock_vma_under_rcu+0x141/0x9a0 mm/memory.c:6378
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:3926 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:3951 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath mm/page_alloc.c:4382 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_frozen_pages_noprof+0xa43/0x21f0 mm/page_alloc.c:4752
2 locks held by systemd-udevd/5219:
2 locks held by cron/8673:
#0: ffff88804e417658 (&vma->vm_lock->lock){++++}-{4:4}, at: vma_start_read include/linux/mm.h:717 [inline]
#0: ffff88804e417658 (&vma->vm_lock->lock){++++}-{4:4}, at: lock_vma_under_rcu+0x141/0x9a0 mm/memory.c:6378
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:3926 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:3951 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath mm/page_alloc.c:4382 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_frozen_pages_noprof+0xa43/0x21f0 mm/page_alloc.c:4752
2 locks held by in:imklog/8774:
#0: ffff88804e3ad730 (&vma->vm_lock->lock){++++}-{4:4}, at: vma_start_read include/linux/mm.h:717 [inline]
#0: ffff88804e3ad730 (&vma->vm_lock->lock){++++}-{4:4}, at: lock_vma_under_rcu+0x141/0x9a0 mm/memory.c:6378
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:3926 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:3951 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath mm/page_alloc.c:4382 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_frozen_pages_noprof+0xa43/0x21f0 mm/page_alloc.c:4752
2 locks held by rs:main Q:Reg/8775:
2 locks held by sshd/9408:
#0: ffff88804e27c580 (&vma->vm_lock->lock){++++}-{4:4}, at: vma_start_read include/linux/mm.h:717 [inline]
#0: ffff88804e27c580 (&vma->vm_lock->lock){++++}-{4:4}, at: lock_vma_under_rcu+0x141/0x9a0 mm/memory.c:6378
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:3926 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:3951 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath mm/page_alloc.c:4382 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_frozen_pages_noprof+0xa43/0x21f0 mm/page_alloc.c:4752
2 locks held by syz-executor/9415:
#0: ffff88804a73a9e0 (&mm->mmap_lock){++++}-{4:4}, at: mmap_read_trylock include/linux/mmap_lock.h:209 [inline]
#0: ffff88804a73a9e0 (&mm->mmap_lock){++++}-{4:4}, at: get_mmap_lock_carefully mm/memory.c:6249 [inline]
#0: ffff88804a73a9e0 (&mm->mmap_lock){++++}-{4:4}, at: lock_mm_and_find_vma+0x35/0x6f0 mm/memory.c:6309
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:3926 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:3951 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath mm/page_alloc.c:4382 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_frozen_pages_noprof+0xa43/0x21f0 mm/page_alloc.c:4752
3 locks held by kworker/u8:3/10823:
4 locks held by systemd-udevd/12209:
3 locks held by kworker/u8:5/12618:
#0: ffff88801d3ef148 ((wq_completion)gid-cache-wq){+.+.}-{0:0}, at: process_one_work+0x1327/0x1c30 kernel/workqueue.c:3221
#1: ffffc9000800fd20 ((work_completion)(&ndev_work->work)){+.+.}-{0:0}, at: process_one_work+0x8f8/0x1c30 kernel/workqueue.c:3222
#2: ffffffff8f93b570 (devices_rwsem){++++}-{4:4}, at: ib_enum_all_roce_netdevs+0x7a/0x140 drivers/infiniband/core/device.c:2390
6 locks held by syz.6.90/13120:
#0: ffffffff9a9ede38 (&rdma_nl_types[idx].sem){.+.+}-{4:4}, at: rdma_nl_rcv_msg+0x165/0x6e0 drivers/infiniband/core/netlink.c:164
#1: ffffffff8f94f1f0 (link_ops_rwsem){++++}-{4:4}, at: nldev_newlink+0x2c6/0x600 drivers/infiniband/core/nldev.c:1785
#2: ffffffff8f93b570 (devices_rwsem){++++}-{4:4}, at: enable_device_and_get+0xfc/0x3c0 drivers/infiniband/core/device.c:1312
#3: ffffffff8f93b2f0 (rdma_nets_rwsem){++++}-{4:4}, at: add_compat_devs drivers/infiniband/core/device.c:1015 [inline]
#3: ffffffff8f93b2f0 (rdma_nets_rwsem){++++}-{4:4}, at: enable_device_and_get+0x2ae/0x3c0 drivers/infiniband/core/device.c:1328
#4: ffff88805baa8f58 (&device->compat_devs_mutex){+.+.}-{4:4}, at: add_one_compat_dev+0x10e/0x830 drivers/infiniband/core/device.c:933
#5: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:3926 [inline]
#5: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:3951 [inline]
#5: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath mm/page_alloc.c:4382 [inline]
#5: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_frozen_pages_noprof+0xa43/0x21f0 mm/page_alloc.c:4752
3 locks held by syz.2.99/13334:
#0: ffffffff9a9ede38 (&rdma_nl_types[idx].sem){.+.+}-{4:4}, at: rdma_nl_rcv_msg+0x165/0x6e0 drivers/infiniband/core/netlink.c:164
#1: ffffffff8f94f1f0 (link_ops_rwsem){++++}-{4:4}, at: nldev_newlink+0x2c6/0x600 drivers/infiniband/core/nldev.c:1785
#2: ffffffff8f93b570 (devices_rwsem){++++}-{4:4}, at: assign_name drivers/infiniband/core/device.c:1197 [inline]
#2: ffffffff8f93b570 (devices_rwsem){++++}-{4:4}, at: ib_register_device+0x88/0xdd0 drivers/infiniband/core/device.c:1384
1 lock held by syz.0.106/13395:
#0: ffffffff8dfc7500 (rcu_state.barrier_mutex){+.+.}-{4:4}, at: rcu_barrier+0x48/0x6b0 kernel/rcu/tree.c:3741
2 locks held by syz-executor/13392:
#0: ffffffff8fcd9450 (pernet_ops_rwsem){++++}-{4:4}, at: copy_net_ns+0x28a/0x600 net/core/net_namespace.c:512
#1: ffffffff8f93b570 (devices_rwsem){++++}-{4:4}, at: rdma_dev_init_net+0x233/0x510 drivers/infiniband/core/device.c:1169
1 lock held by syz.5.109/13417:
#0: ffffffff8dfc7500 (rcu_state.barrier_mutex){+.+.}-{4:4}, at: rcu_barrier+0x48/0x6b0 kernel/rcu/tree.c:3741
1 lock held by syz.7.110/13415:
#0: ffffffff8dfc7500 (rcu_state.barrier_mutex){+.+.}-{4:4}, at: rcu_barrier+0x48/0x6b0 kernel/rcu/tree.c:3741
1 lock held by syz.1.114/13472:
#0: ffffffff8dfc7500 (rcu_state.barrier_mutex){+.+.}-{4:4}, at: rcu_barrier+0x48/0x6b0 kernel/rcu/tree.c:3741
2 locks held by systemd-udevd/13482:
#0: ffff8880466223d0 (&vma->vm_lock->lock){++++}-{4:4}, at: vma_start_read include/linux/mm.h:717 [inline]
#0: ffff8880466223d0 (&vma->vm_lock->lock){++++}-{4:4}, at: lock_vma_under_rcu+0x141/0x9a0 mm/memory.c:6378
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:3926 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:3951 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath mm/page_alloc.c:4382 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_frozen_pages_noprof+0xa43/0x21f0 mm/page_alloc.c:4752
2 locks held by syz-executor/13484:
#0: ffffffff8fcd9450 (pernet_ops_rwsem){++++}-{4:4}, at: copy_net_ns+0x28a/0x600 net/core/net_namespace.c:512
#1: ffffffff8f93b570 (devices_rwsem){++++}-{4:4}, at: rdma_dev_init_net+0x233/0x510 drivers/infiniband/core/device.c:1169
2 locks held by syz-executor/13510:
#0: ffffffff8fcd9450 (pernet_ops_rwsem){++++}-{4:4}, at: copy_net_ns+0x28a/0x600 net/core/net_namespace.c:512
#1: ffffffff8f93b570 (devices_rwsem){++++}-{4:4}, at: rdma_dev_init_net+0x233/0x510 drivers/infiniband/core/device.c:1169
2 locks held by syz-executor/13527:
#0: ffffffff8fcd9450 (pernet_ops_rwsem){++++}-{4:4}, at: copy_net_ns+0x28a/0x600 net/core/net_namespace.c:512
#1: ffffffff8f93b570 (devices_rwsem){++++}-{4:4}, at: rdma_dev_init_net+0x233/0x510 drivers/infiniband/core/device.c:1169
2 locks held by syz-executor/13536:
#0: ffffffff8fcd9450 (pernet_ops_rwsem){++++}-{4:4}, at: copy_net_ns+0x28a/0x600 net/core/net_namespace.c:512
#1: ffffffff8f93b570 (devices_rwsem){++++}-{4:4}, at: rdma_dev_init_net+0x233/0x510 drivers/infiniband/core/device.c:1169
2 locks held by syz-executor/13544:
#0: ffffffff8fcd9450 (pernet_ops_rwsem){++++}-{4:4}, at: copy_net_ns+0x28a/0x600 net/core/net_namespace.c:512
#1: ffffffff8f93b570 (devices_rwsem){++++}-{4:4}, at: rdma_dev_init_net+0x233/0x510 drivers/infiniband/core/device.c:1169
2 locks held by syz-executor/13552:
#0: ffffffff8fcd9450 (pernet_ops_rwsem){++++}-{4:4}, at: copy_net_ns+0x28a/0x600 net/core/net_namespace.c:512
#1: ffffffff8f93b570 (devices_rwsem){++++}-{4:4}, at: rdma_dev_init_net+0x233/0x510 drivers/infiniband/core/device.c:1169
2 locks held by syz-executor/13562:
#0: ffffffff8fcd9450 (pernet_ops_rwsem){++++}-{4:4}, at: copy_net_ns+0x28a/0x600 net/core/net_namespace.c:512
#1: ffffffff8f93b570 (devices_rwsem){++++}-{4:4}, at: rdma_dev_init_net+0x233/0x510 drivers/infiniband/core/device.c:1169
2 locks held by syz-executor/13569:
#0: ffffffff8fcd9450 (pernet_ops_rwsem){++++}-{4:4}, at: copy_net_ns+0x28a/0x600 net/core/net_namespace.c:512
#1: ffffffff8f93b570 (devices_rwsem){++++}-{4:4}, at: rdma_dev_init_net+0x233/0x510 drivers/infiniband/core/device.c:1169
2 locks held by syz-executor/13614:
#0: ffffffff8fcd9450 (pernet_ops_rwsem){++++}-{4:4}, at: copy_net_ns+0x28a/0x600 net/core/net_namespace.c:512
#1: ffffffff8f93b570 (devices_rwsem){++++}-{4:4}, at: rdma_dev_init_net+0x233/0x510 drivers/infiniband/core/device.c:1169
2 locks held by syz-executor/13618:
#0: ffffffff8fcd9450 (pernet_ops_rwsem){++++}-{4:4}, at: copy_net_ns+0x28a/0x600 net/core/net_namespace.c:512
#1: ffffffff8f93b570 (devices_rwsem){++++}-{4:4}, at: rdma_dev_init_net+0x233/0x510 drivers/infiniband/core/device.c:1169
2 locks held by syz-executor/13631:
#0: ffffffff8fcd9450 (pernet_ops_rwsem){++++}-{4:4}, at: copy_net_ns+0x28a/0x600 net/core/net_namespace.c:512
#1: ffffffff8f93b570 (devices_rwsem){++++}-{4:4}, at: rdma_dev_init_net+0x233/0x510 drivers/infiniband/core/device.c:1169
2 locks held by syz-executor/13646:
#0: ffff88804e2a1a90 (&vma->vm_lock->lock){++++}-{4:4}, at: vma_start_read include/linux/mm.h:717 [inline]
#0: ffff88804e2a1a90 (&vma->vm_lock->lock){++++}-{4:4}, at: lock_vma_under_rcu+0x141/0x9a0 mm/memory.c:6378
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:3926 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:3951 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath mm/page_alloc.c:4382 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_frozen_pages_noprof+0xa43/0x21f0 mm/page_alloc.c:4752
2 locks held by syz-executor/13647:
#0: ffffffff8fcd9450 (pernet_ops_rwsem){++++}-{4:4}, at: copy_net_ns+0x28a/0x600 net/core/net_namespace.c:512
#1: ffffffff8f93b570 (devices_rwsem){++++}-{4:4}, at: rdma_dev_init_net+0x233/0x510 drivers/infiniband/core/device.c:1169
2 locks held by syz-executor/13648:
#0: ffffffff8fcd9450 (pernet_ops_rwsem){++++}-{4:4}, at: copy_net_ns+0x28a/0x600 net/core/net_namespace.c:512
#1: ffffffff8f93b570 (devices_rwsem){++++}-{4:4}, at: rdma_dev_init_net+0x233/0x510 drivers/infiniband/core/device.c:1169
1 lock held by syz-executor/13653:
#0: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:3926 [inline]
#0: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:3951 [inline]
#0: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath mm/page_alloc.c:4382 [inline]
#0: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_frozen_pages_noprof+0xa43/0x21f0 mm/page_alloc.c:4752
2 locks held by syz-executor/13668:
#0: ffff8880574839b8 (&vma->vm_lock->lock){++++}-{4:4}, at: vma_start_read include/linux/mm.h:717 [inline]
#0: ffff8880574839b8 (&vma->vm_lock->lock){++++}-{4:4}, at: lock_vma_under_rcu+0x141/0x9a0 mm/memory.c:6378
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:3926 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:3951 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath mm/page_alloc.c:4382 [inline]
#1: ffffffff8e150aa0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_frozen_pages_noprof+0xa43/0x21f0 mm/page_alloc.c:4752

=============================================

NMI backtrace for cpu 1
CPU: 1 UID: 0 PID: 35 Comm: khungtaskd Not tainted 6.14.0-rc5-dirty #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<task>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1b0 lib/dump_stack.c:120
nmi_cpu_backtrace+0x2a0/0x350 lib/nmi_backtrace.c:113
nmi_trigger_cpumask_backtrace+0x29c/0x300 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:162 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:236 [inline]
watchdog+0xea3/0x1200 kernel/hung_task.c:399
kthread+0x3b0/0x760 kernel/kthread.c:464
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</task>
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 UID: 0 PID: 8 Comm: kworker/0:0 Not tainted 6.14.0-rc5-dirty #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: events_freezable_pwr_efficient disk_events_workfn
RIP: 0010:mark_lock+0x15b/0xd60 kernel/locking/lockdep.c:4720
Code: cc cc cc cc 48 8d 7e 22 48 89 f8 48 c1 e8 03 0f b6 04 10 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 78 04 00 00 41 f6 45 22 03 <0f> 84 3b ff ff ff 41 be 00 02 00 00 41 bc 09 00 00 00 e9 39 ff ff
RSP: 0000:ffffc9000019eba0 EFLAGS: 00000002
RAX: 0000000000000000 RBX: 1ffff92000033d7a RCX: ffffffff81947d6e
RDX: 0000000000000002 RSI: ffff88801cae3040 RDI: ffff88801cae3062
RBP: ffffc9000019ecd8 R08: 0000000000000000 R09: fffffbfff2d943af
R10: ffffffff96ca1d7f R11: 0000000000000002 R12: 0000000000000008
R13: ffff88801cae3040 R14: 0000000000000005 R15: ffff88801cae2500
FS: 0000000000000000(0000) GS:ffff88802b800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f209d042ba0 CR3: 0000000011e32000 CR4: 0000000000752ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<nmi>
</nmi>
<task>
mark_usage kernel/locking/lockdep.c:4672 [inline]
__lock_acquire+0x9fb/0x3c80 kernel/locking/lockdep.c:5182
lock_acquire.part.0+0x11b/0x370 kernel/locking/lockdep.c:5851
rcu_lock_acquire include/linux/rcupdate.h:337 [inline]
rcu_read_lock include/linux/rcupdate.h:849 [inline]
list_lru_count_one+0x40/0x370 mm/list_lru.c:254
list_lru_shrink_count include/linux/list_lru.h:190 [inline]
super_cache_count+0x16f/0x300 fs/super.c:263
do_shrink_slab+0x81/0x11b0 mm/shrinker.c:384
shrink_slab+0x332/0x12a0 mm/shrinker.c:664
shrink_one+0x4ad/0x7c0 mm/vmscan.c:4868
shrink_many mm/vmscan.c:4929 [inline]
lru_gen_shrink_node mm/vmscan.c:5007 [inline]
shrink_node+0x2698/0x3d60 mm/vmscan.c:5978
shrink_zones mm/vmscan.c:6237 [inline]
do_try_to_free_pages+0x372/0x1990 mm/vmscan.c:6299
try_to_free_pages+0x2a4/0x6b0 mm/vmscan.c:6549
__perform_reclaim mm/page_alloc.c:3929 [inline]
__alloc_pages_direct_reclaim mm/page_alloc.c:3951 [inline]
__alloc_pages_slowpath mm/page_alloc.c:4382 [inline]
__alloc_frozen_pages_noprof+0xac3/0x21f0 mm/page_alloc.c:4752
alloc_pages_mpol+0x1f2/0x540 mm/mempolicy.c:2270
alloc_frozen_pages_noprof mm/mempolicy.c:2341 [inline]
alloc_pages_noprof+0x12d/0x390 mm/mempolicy.c:2361
bio_copy_kern block/blk-map.c:443 [inline]
blk_rq_map_kern+0x228/0x750 block/blk-map.c:718
scsi_execute_cmd+0xb53/0xe60 drivers/scsi/scsi_lib.c:316
sr_get_events drivers/scsi/sr.c:177 [inline]
sr_check_events+0x1b5/0xac0 drivers/scsi/sr.c:218
cdrom_update_events drivers/cdrom/cdrom.c:1468 [inline]
cdrom_check_events+0x65/0x110 drivers/cdrom/cdrom.c:1478
sr_block_check_events+0xc3/0x100 drivers/scsi/sr.c:565
disk_check_events+0xc4/0x420 block/disk-events.c:193
process_one_work+0xa09/0x1c30 kernel/workqueue.c:3250
process_scheduled_works kernel/workqueue.c:3334 [inline]
worker_thread+0x677/0xe90 kernel/workqueue.c:3415
kthread+0x3b0/0x760 kernel/kthread.c:464
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</task>

==================================================================
I used the same kernel as the syzbot upstream instance: 7eb172143d5508b4da468ed59ee857c6e5e01da6
kernel config: https://syzkaller.appspot.com/text?tag=KernelConfig&x=da4b04ae798b7ef6
compiler: gcc version 11.4.0
===============================================================================
Unfortunately, the modified syzkaller did not generate an effective reproducer.
The following is my analysis of the bug, together with repair suggestions; I hope it helps with fixing the issue:
## Root cause analysis
Lock contention scenario:
kworker/u8:5 holds the read lock (down_read(&devices_rwsem)) in ib_enum_all_roce_netdevs.
syz.2.99 tries to take the write lock (down_write(&devices_rwsem)) in ib_register_device.
Other processes, such as syz.6.90, also hold the read lock, so the writer starves.
Call path analysis:
When the kworker handles a network device event, it walks all RoCE devices and has to hold the read lock for the whole traversal.
When syz.2.99 registers a new device, it must modify the device list and therefore needs the write lock.
As long as the read lock is never released, the write lock can never be taken, and both tasks hang; a minimal sketch of this reader/writer interaction follows below.
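
Below is a minimal user-space sketch of that reader/writer interaction, using a POSIX rwlock as a stand-in for the kernel rwsem. The thread names (enum_worker, register_worker) and the timing are illustrative only and are not taken from the kernel sources:

/*
 * Reader holds the lock for a long time; the writer queues up behind it.
 * This mirrors the shape of the kworker/u8:5 vs. syz.2.99 stall above,
 * but with pthreads instead of devices_rwsem.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t devices_rwsem = PTHREAD_RWLOCK_INITIALIZER;

/* Plays the role of ib_enum_all_roce_netdevs(): long-held read side. */
static void *enum_worker(void *arg)
{
	(void)arg;
	pthread_rwlock_rdlock(&devices_rwsem);
	fprintf(stderr, "enum_worker: read lock held, enumerating...\n");
	sleep(5);		/* long device walk / blocked work */
	pthread_rwlock_unlock(&devices_rwsem);
	return NULL;
}

/* Plays the role of ib_register_device(): write side. */
static void *register_worker(void *arg)
{
	(void)arg;
	fprintf(stderr, "register_worker: waiting for write lock...\n");
	pthread_rwlock_wrlock(&devices_rwsem);	/* blocks until all readers drop the lock */
	fprintf(stderr, "register_worker: write lock acquired\n");
	pthread_rwlock_unlock(&devices_rwsem);
	return NULL;
}

int main(void)
{
	pthread_t r, w;

	pthread_create(&r, NULL, enum_worker, NULL);
	sleep(1);		/* let the reader win the race */
	pthread_create(&w, NULL, register_worker, NULL);
	pthread_join(r, NULL);
	pthread_join(w, NULL);
	return 0;
}

If you build this with -lpthread and run it, register_worker stays blocked until enum_worker drops the read lock, which is the same pattern the hung-task report shows for syz.2.99.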
Memory reclaim adds a further lock dependency:
A GFP_KERNEL allocation during device registration may trigger direct reclaim, which takes the fs_reclaim pseudo-lock.
If the file-system operations reached from reclaim need additional locks, a more complex lock dependency chain can form; the pattern is sketched schematically after this paragraph.
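
As a schematic only (this is not the upstream code; example_rwsem and example_alloc_under_rwsem are made-up names), the problematic pattern of allocating under the rwsem looks like this:

/*
 * A GFP_KERNEL allocation made while the rwsem is held (as syz.6.90 does
 * with devices_rwsem in the lock dump above) may enter direct reclaim,
 * which takes fs_reclaim and can block for a long time, stalling every
 * other waiter on the rwsem behind it.
 */
#include <linux/errno.h>
#include <linux/rwsem.h>
#include <linux/slab.h>

static DECLARE_RWSEM(example_rwsem);	/* stands in for devices_rwsem */

static int example_alloc_under_rwsem(void)
{
	void *buf;

	down_read(&example_rwsem);
	/*
	 * GFP_KERNEL may sleep: under memory pressure the allocation goes
	 * through __alloc_pages_slowpath() into direct reclaim (fs_reclaim),
	 * so the rwsem is held across the whole reclaim attempt.
	 */
	buf = kzalloc(4096, GFP_KERNEL);
	up_read(&example_rwsem);

	kfree(buf);
	return buf ? 0 : -ENOMEM;
}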

=========================================================================
I hope it helps.
Best regards
Jianzhou Zhao
xingwei lee
Zhizhuo Tang