Re: [syzbot] [net?] WARNING: bad unlock balance in do_setlink

From: Eric Dumazet
Date: Tue Apr 08 2025 - 16:41:57 EST


On Tue, Apr 8, 2025 at 10:16 PM Aleksandr Nogikh <nogikh@xxxxxxxxxx> wrote:
>
> On Tue, Apr 8, 2025 at 1:33 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> >
> > On Tue, Apr 8, 2025 at 12:44 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> > >
> > > On Tue, 8 Apr 2025 at 10:11, Aleksandr Nogikh <nogikh@xxxxxxxxxx> wrote:
> > > >
> > > > On Mon, Apr 7, 2025 at 6:13 PM 'Kuniyuki Iwashima' via syzkaller-bugs
> > > > <syzkaller-bugs@xxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > From: Stanislav Fomichev <stfomichev@xxxxxxxxx>
> > > > > Date: Mon, 7 Apr 2025 07:19:54 -0700
> > > > > > On 04/07, syzbot wrote:
> > > > > > > Hello,
> > > > > > >
> > > > > > > syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> > > > > > > unregister_netdevice: waiting for DEV to become free
> > > > > > >
> > > > > > > unregister_netdevice: waiting for batadv0 to become free. Usage count = 3
> > > > > >
> > > > > > So it does fix the lock unbalance issue, but now there is a hang?
> > > > >
> > > > > I think this is an orthogonal issue.
> > > > >
> > > > > I saw this in another report as well.
> > > > > https://lore.kernel.org/netdev/67f208ea.050a0220.0a13.025b.GAE@xxxxxxxxxx/
> > > > >
> > > > > syzbot may want to find a better way to filter this kind of noise.
> > > > >
> > > >
> > > > Syzbot has long treated this message as a problem worth reporting
> > > > (Cc'd Dmitry, who may remember the context):
> > > > https://github.com/google/syzkaller/commit/7a67784ca8bdc3b26cce2f0ec9a40d2dd9ec9396
> > > >
> > > > Since v6.15-rc1, we do observe it happen at least 10x more often than
> > > > before, both during fuzzing and while processing #syz test commands:
> > > > https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
> > >
> > > IIUC this error means a leaked reference count on a device, and the
> > > device and everything it references leaked forever + a kernel thread
> > > looping forever. This does not look like noise.
> > >
> > > Eric should know more. Eric fixed a bunch of these bugs and added a
> > > ref count tracker to devices to provide better diagnostics. For some
> > > reason I don't see the reftracker output in the console output, but
> > > CONFIG_NET_DEV_REFCNT_TRACKER=y is enabled in the config.
> >
> > I think that Kuniyuki's patch was fixing the original syzbot report.
> >
> > After fixing this trivial bug, another bug showed up,
> > and this second bug triggered "syzbot may want to find a better way to
> > filter this kind of noise." comment.
>
> FWIW I've just bisected the recent spike in "unregister_netdevice:
> waiting for batadv0 to become free" and git bisect pointed to:
>
> 00b35530811f2aa3d7ceec2dbada80861c7632a8
> Author: Eric Dumazet <edumazet@xxxxxxxxxx>
> Date: Thu Feb 6 14:04:22 2025 +0000
>
> batman-adv: adopt netdev_hold() / netdev_put()
>
> Add a device tracker to struct batadv_hard_iface to help
> debugging of network device refcount imbalances.
>
>
> Eric, could you please have a look?
>

My original patch was:
https://lore.kernel.org/netdev/CANn89i+ySFS5C24guM9E9UsPWfQBL69-OoRDbOGfih9vLGxDJg@xxxxxxxxxxxxxx/T/

I think it was correct.

Then Sven added code to it, instead of sending a separate patch.

I guess a fix would be:

diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c
index f145f96626531053bbf8f58a31f28f625a9d80f9..7cd4bdcee43935b9e5fb7d1696430909b7af67b4 100644
--- a/net/batman-adv/hard-interface.c
+++ b/net/batman-adv/hard-interface.c
@@ -725,7 +725,6 @@ int batadv_hardif_enable_interface(struct batadv_hard_iface *hard_iface,

 	kref_get(&hard_iface->refcount);

-	dev_hold(mesh_iface);
 	netdev_hold(mesh_iface, &hard_iface->meshif_dev_tracker, GFP_ATOMIC);
 	hard_iface->mesh_iface = mesh_iface;
 	bat_priv = netdev_priv(hard_iface->mesh_iface);
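
For illustration only, here is a toy userspace model of why taking both
dev_hold() and netdev_hold() on the same device would leave a reference
behind. It is not kernel code: the toy_* names are hypothetical, the
tracker bookkeeping is omitted, and it assumes the disable path drops
exactly one reference with netdev_put(), which is not shown in the diff
above.

```c
#include <assert.h>

/* Toy stand-in for struct net_device: only the reference count is modeled. */
struct toy_netdev {
	int refcnt;
};

/* Models dev_hold(): takes one untracked reference. */
static void toy_dev_hold(struct toy_netdev *dev)
{
	dev->refcnt++;
}

/* Models netdev_hold(): takes one reference (tracker bookkeeping omitted). */
static void toy_netdev_hold(struct toy_netdev *dev)
{
	dev->refcnt++;
}

/* Models netdev_put(): drops exactly one reference. */
static void toy_netdev_put(struct toy_netdev *dev)
{
	assert(dev->refcnt > 0);
	dev->refcnt--;
}

/*
 * Runs one enable/disable cycle and returns the references still held
 * afterwards. Assumption: disable calls the put helper exactly once.
 */
int toy_leaked_refs(int with_extra_dev_hold)
{
	struct toy_netdev mesh_iface = { .refcnt = 0 };

	/* Enable path: the extra hold is the line the diff removes. */
	if (with_extra_dev_hold)
		toy_dev_hold(&mesh_iface);
	toy_netdev_hold(&mesh_iface);

	/* Disable path (assumed): a single put. */
	toy_netdev_put(&mesh_iface);

	return mesh_iface.refcnt;
}
```

In this model, toy_leaked_refs(1) leaves one reference outstanding per
enable/disable cycle, while toy_leaked_refs(0) returns to zero. A
leftover reference of this kind is what would keep unregister_netdevice
waiting for the device to become free.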