Re: net: deadlock on genl_mutex

From: Dmitry Vyukov
Date: Sun Jan 29 2017 - 08:07:01 EST


On Fri, Dec 9, 2016 at 6:08 AM, Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote:
>>> Chain exists of:
>>> Possible unsafe locking scenario:
>>>
>>>        CPU0                    CPU1
>>>        ----                    ----
>>>   lock(genl_mutex);
>>>                                lock(nlk->cb_mutex);
>>>                                lock(genl_mutex);
>>>   lock(rtnl_mutex);
>>>
>>> *** DEADLOCK ***
>>
>> This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
>> Let me think about it.
>
> Never mind. Actually both reports in this thread are legitimate.
>
> I know what happened now, the lock chain is so long, 4 locks are involved
> to form a chain!!!
>
> Let me think about how to break the chain.


Cong, any success with breaking the chain?

Still happening on f0ad17712b9f71c24e2b8b9725230ef57232377f. Or is it
a different one?


[ INFO: possible circular locking dependency detected ]
4.10.0-rc3+ #4 Not tainted
-------------------------------------------------------
syz-executor9/2705 is trying to acquire lock:
(genl_mutex){+.+.+.}, at: [<ffffffff836f58fe>] genl_lock
net/netlink/genetlink.c:32 [inline]
(genl_mutex){+.+.+.}, at: [<ffffffff836f58fe>]
genl_family_rcv_msg+0xdae/0x1040 net/netlink/genetlink.c:547

but task is already holding lock:
(rtnl_mutex){+.+.+.}, at: [<ffffffff836416e7>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:70

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (rtnl_mutex){+.+.+.}:

[<ffffffff8157e729>] validate_chain kernel/locking/lockdep.c:2265 [inline]
[<ffffffff8157e729>] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
[<ffffffff815808b1>] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
[<ffffffff843f9de0>] __mutex_lock_common kernel/locking/mutex.c:639 [inline]
[<ffffffff843f9de0>] mutex_lock_nested+0x290/0x1730 kernel/locking/mutex.c:753
[<ffffffff836416e7>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70
[<ffffffff83fd5e9e>] nl80211_pre_doit+0x2fe/0x570 net/wireless/nl80211.c:11847
[<ffffffff836f52b0>] genl_family_rcv_msg+0x760/0x1040
net/netlink/genetlink.c:591
[<ffffffff836f807a>] genl_rcv_msg+0x19a/0x330 net/netlink/genetlink.c:620
[<ffffffff836f36cb>] netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
[<ffffffff836f4b38>] genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
[<ffffffff836f1f14>] netlink_unicast_kernel
net/netlink/af_netlink.c:1231 [inline]
[<ffffffff836f1f14>] netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
[<ffffffff836f2bcf>] netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
[<ffffffff83572d3a>] sock_sendmsg_nosec net/socket.c:635 [inline]
[<ffffffff83572d3a>] sock_sendmsg+0xca/0x110 net/socket.c:645
[<ffffffff8357557a>] ___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1985
[<ffffffff83578138>] __sys_sendmsg+0x138/0x300 net/socket.c:2019
[<ffffffff8357832d>] SYSC_sendmsg net/socket.c:2030 [inline]
[<ffffffff8357832d>] SyS_sendmsg+0x2d/0x50 net/socket.c:2026
[<ffffffff8440e7c1>] entry_SYSCALL_64_fastpath+0x1f/0xc2

-> #0 (genl_mutex){+.+.+.}:

[<ffffffff8157847f>] check_prev_add kernel/locking/lockdep.c:1828 [inline]
[<ffffffff8157847f>] check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1938
[<ffffffff8157e729>] validate_chain kernel/locking/lockdep.c:2265 [inline]
[<ffffffff8157e729>] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
[<ffffffff815808b1>] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
[<ffffffff843f9de0>] __mutex_lock_common kernel/locking/mutex.c:639 [inline]
[<ffffffff843f9de0>] mutex_lock_nested+0x290/0x1730 kernel/locking/mutex.c:753
[<ffffffff836f58fe>] genl_lock net/netlink/genetlink.c:32 [inline]
[<ffffffff836f58fe>] genl_family_rcv_msg+0xdae/0x1040
net/netlink/genetlink.c:547
[<ffffffff836f807a>] genl_rcv_msg+0x19a/0x330 net/netlink/genetlink.c:620
[<ffffffff836f36cb>] netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
[<ffffffff836f4b38>] genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
[<ffffffff836f1f14>] netlink_unicast_kernel
net/netlink/af_netlink.c:1231 [inline]
[<ffffffff836f1f14>] netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
[<ffffffff836f2bcf>] netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
[<ffffffff83572d3a>] sock_sendmsg_nosec net/socket.c:635 [inline]
[<ffffffff83572d3a>] sock_sendmsg+0xca/0x110 net/socket.c:645
[<ffffffff835730a6>] sock_write_iter+0x326/0x600 net/socket.c:848
[<ffffffff81a3c493>] new_sync_write fs/read_write.c:499 [inline]
[<ffffffff81a3c493>] __vfs_write+0x483/0x740 fs/read_write.c:512
[<ffffffff81a42227>] vfs_write+0x187/0x530 fs/read_write.c:560
[<ffffffff81a4675b>] SYSC_write fs/read_write.c:607 [inline]
[<ffffffff81a4675b>] SyS_write+0xfb/0x230 fs/read_write.c:599
[<ffffffff8440e7c1>] entry_SYSCALL_64_fastpath+0x1f/0xc2

other info that might help us debug this:

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(rtnl_mutex);
                               lock(genl_mutex);
                               lock(rtnl_mutex);
  lock(genl_mutex);

*** DEADLOCK ***

2 locks held by syz-executor9/2705:
#0: (cb_lock){++++++}, at: [<ffffffff836f4b29>] genl_rcv+0x19/0x40
net/netlink/genetlink.c:630
#1: (rtnl_mutex){+.+.+.}, at: [<ffffffff836416e7>]
rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70

stack backtrace:
CPU: 1 PID: 2705 Comm: syz-executor9 Not tainted 4.10.0-rc3+ #4
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1202
check_prev_add kernel/locking/lockdep.c:1828 [inline]
check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1938
validate_chain kernel/locking/lockdep.c:2265 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
__mutex_lock_common kernel/locking/mutex.c:639 [inline]
mutex_lock_nested+0x290/0x1730 kernel/locking/mutex.c:753
genl_lock net/netlink/genetlink.c:32 [inline]
genl_family_rcv_msg+0xdae/0x1040 net/netlink/genetlink.c:547
genl_rcv_msg+0x19a/0x330 net/netlink/genetlink.c:620
netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
netlink_unicast_kernel net/netlink/af_netlink.c:1231 [inline]
netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
sock_sendmsg_nosec net/socket.c:635 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:645
sock_write_iter+0x326/0x600 net/socket.c:848
new_sync_write fs/read_write.c:499 [inline]
__vfs_write+0x483/0x740 fs/read_write.c:512
vfs_write+0x187/0x530 fs/read_write.c:560
SYSC_write fs/read_write.c:607 [inline]
SyS_write+0xfb/0x230 fs/read_write.c:599
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x44f5e9
RSP: 002b:00007fdba138cb58 EFLAGS: 00000212 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000020000fdc RCX: 000000000044f5e9
RDX: 0000000000000024 RSI: 0000000020000fdc RDI: 0000000000000006
RBP: 0000000000000006 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000212 R12: 0000000000700000
R13: 0000000000000002 R14: 0000000000000010 R15: 0000000000000000