Re: [syzbot] INFO: task can't die in __lock_sock

From: Thomas Gleixner
Date: Wed Sep 22 2021 - 10:16:45 EST


On Mon, Sep 20 2021 at 08:50, syzbot wrote:
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> possible deadlock in rfcomm_sk_state_change
>
> ============================================
> WARNING: possible recursive locking detected
> 5.15.0-rc2-syzkaller #0 Not tainted
> --------------------------------------------
> syz-executor.0/9050 is trying to acquire lock:
> ffff88807ce5d120 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline]
> ffff88807ce5d120 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: rfcomm_sk_state_change+0xb4/0x390 net/bluetooth/rfcomm/sock.c:73
>
> but task is already holding lock:
> ffff88807ce5d120 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline]
> ffff88807ce5d120 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: rfcomm_sock_shutdown+0x54/0x210 net/bluetooth/rfcomm/sock.c:928

It's not only possible recursion. It's real: same lock instance, and the
stack trace shows how this happens:

lock_sock_nested+0x4e/0x140 net/core/sock.c:3183
lock_sock include/net/sock.h:1612 [inline]

Lock is already held. See below.

rfcomm_sk_state_change+0xb4/0x390 net/bluetooth/rfcomm/sock.c:73
__rfcomm_dlc_close+0x1b6/0x8a0 net/bluetooth/rfcomm/core.c:489
rfcomm_dlc_close+0x1ea/0x240 net/bluetooth/rfcomm/core.c:520
__rfcomm_sock_close+0xac/0x260 net/bluetooth/rfcomm/sock.c:220

The sock lock is held from here.

rfcomm_sock_shutdown+0xe9/0x210 net/bluetooth/rfcomm/sock.c:931
rfcomm_sock_release+0x5f/0x140 net/bluetooth/rfcomm/sock.c:951
__sock_release+0xcd/0x280 net/socket.c:649
sock_close+0x18/0x20 net/socket.c:1314
__fput+0x288/0x9f0 fs/file_table.c:280
task_work_run+0xdd/0x1a0 kernel/task_work.c:164

I assume that the lock_sock*() lockdep change was applied on top of
Linus' tree. IIRC the previous reports showed lockups because lockdep
had no chance to see the recursion due to the placement of the acquire
annotation.

Thanks,

tglx