Re: [syzbot] [net?] possible deadlock in rtnl_lock (8)

From: Eric Dumazet
Date: Tue Sep 10 2024 - 02:37:27 EST


On Tue, Sep 10, 2024 at 7:55 AM D. Wythe <alibuda@xxxxxxxxxxxxxxxxx> wrote:
>
>
>
> On 9/9/24 7:44 PM, Wenjia Zhang wrote:
> >
> >
> > On 09.09.24 10:02, Eric Dumazet wrote:
> >> On Sun, Sep 8, 2024 at 10:12 AM syzbot
> >> <syzbot+51cf7cc5f9ffc1006ef2@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >>>
> >>> syzbot has found a reproducer for the following issue on:
> >>>
> >>> HEAD commit: df54f4a16f82 Merge branch 'for-next/core' into
> >>> for-kernelci
> >>> git tree:
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git
> >>> for-kernelci
> >>> console output:
> >>> https://syzkaller.appspot.com/x/log.txt?x=12bdabc7980000
> >>> kernel config:
> >>> https://syzkaller.appspot.com/x/.config?x=dde5a5ba8d41ee9e
> >>> dashboard link:
> >>> https://syzkaller.appspot.com/bug?extid=51cf7cc5f9ffc1006ef2
> >>> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils
> >>> for Debian) 2.40
> >>> userspace arch: arm64
> >>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1798589f980000
> >>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=10a30e00580000
> >>>
> >>> Downloadable assets:
> >>> disk image:
> >>> https://storage.googleapis.com/syzbot-assets/aa2eb06e0aea/disk-df54f4a1.raw.xz
> >>> vmlinux:
> >>> https://storage.googleapis.com/syzbot-assets/14728733d385/vmlinux-df54f4a1.xz
> >>> kernel image:
> >>> https://storage.googleapis.com/syzbot-assets/99816271407d/Image-df54f4a1.gz.xz
> >>>
> >>> IMPORTANT: if you fix the issue, please add the following tag to the
> >>> commit:
> >>> Reported-by: syzbot+51cf7cc5f9ffc1006ef2@xxxxxxxxxxxxxxxxxxxxxxxxx
> >>>
> >>> ======================================================
> >>> WARNING: possible circular locking dependency detected
> >>> 6.11.0-rc5-syzkaller-gdf54f4a16f82 #0 Not tainted
> >>> ------------------------------------------------------
> >>> syz-executor272/6388 is trying to acquire lock:
> >>> ffff8000923b6ce8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x20/0x2c
> >>> net/core/rtnetlink.c:79
> >>>
> >>> but task is already holding lock:
> >>> ffff0000dc408a50 (&smc->clcsock_release_lock){+.+.}-{3:3}, at:
> >>> smc_setsockopt+0x178/0x10fc net/smc/af_smc.c:3064
> >>>
> >>> which lock already depends on the new lock.
> >>>
>
> I have been aware of this report for a while, but I doubt the deadlock can
> actually happen. If I understand correctly, the following deadlock is reported here:
>
> #2
> lock_sock_smc
> {
> clcsock_release_lock --- deadlock
> {
>
> }
> }
>
> #1
> rtnl_mutex
> {
> lock_sock_smc
> {
>
> }
> }
>
> #0
> clcsock_release_lock
> {
> rtnl_mutex --deadlock
> {
>
> }
> }
>
> This is of course a deadlock, but #1 is suspicious.
>
> How would this happen to an SMC sock?
>
> #1 ->
> lock_sock_nested+0x38/0xe8 net/core/sock.c:3543
> lock_sock include/net/sock.h:1607 [inline]
> sockopt_lock_sock net/core/sock.c:1061 [inline]
> sockopt_lock_sock+0x58/0x74 net/core/sock.c:1052
> do_ip_setsockopt+0xe0/0x2358 net/ipv4/ip_sockglue.c:1078
> ip_setsockopt+0x34/0x9c net/ipv4/ip_sockglue.c:1417
> raw_setsockopt+0x7c/0x2e0 net/ipv4/raw.c:845
> sock_common_setsockopt+0x70/0xe0 net/core/sock.c:3735
> do_sock_setsockopt+0x17c/0x354 net/socket.c:2324
>
> As a comparison, the correct calling chain should be:
>
> sock_common_setsockopt+0x70/0xe0 net/core/sock.c:3735
> smc_setsockopt+0x150/0xcec net/smc/af_smc.c:3072
> do_sock_setsockopt+0x17c/0x354 net/socket.c:2324
>
>
> That is to say, any SOL_IP option set on an smc_sock goes through
> smc_setsockopt, which tries to take clcsock_release_lock first.
>
> Anyway, if anyone can explain #1, we can then see how to solve this problem;
> otherwise I think the problem does not exist. (Just my opinion.)

Then SMC lacks some lockdep annotations.

Please take a look at sock_lock_init_class_and_name() callers.