Re: [PATCH net] bonding: Fix warning in default_device_exit_batch()

From: Shigeru Yoshida
Date: Mon Mar 13 2023 - 22:18:29 EST


Hi Nik,

On Mon, Mar 13, 2023 at 12:52:44PM +0200, Nikolay Aleksandrov wrote:
> On 13/03/2023 11:35, Shigeru Yoshida wrote:
> > Hi Nik,
> >
> > On Sun, Mar 12, 2023 at 10:58:18PM +0200, Nikolay Aleksandrov wrote:
> >> On 12/03/2023 17:21, Shigeru Yoshida wrote:
> >>> syzbot reported a warning in default_device_exit_batch() like below [1]:
> >>>
> >>> WARNING: CPU: 1 PID: 56 at net/core/dev.c:10867 unregister_netdevice_many_notify+0x14cf/0x19f0 net/core/dev.c:10867
> >>> ...
> >>> Call Trace:
> >>> <TASK>
> >>> unregister_netdevice_many net/core/dev.c:10897 [inline]
> >>> default_device_exit_batch+0x451/0x5b0 net/core/dev.c:11350
> >>> ops_exit_list+0x125/0x170 net/core/net_namespace.c:174
> >>> cleanup_net+0x4ee/0xb10 net/core/net_namespace.c:613
> >>> process_one_work+0x9bf/0x1820 kernel/workqueue.c:2390
> >>> worker_thread+0x669/0x1090 kernel/workqueue.c:2537
> >>> kthread+0x2e8/0x3a0 kernel/kthread.c:376
> >>> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
> >>> </TASK>
> >>>
> >>> For bond devices which also have a master device, the IFF_SLAVE flag is
> >>> cleared at the err_undo_flags label in bond_enslave() if it is not
> >>> ARPHRD_ETHER type. In this case, __bond_release_one() is not called
> >>> when bond_netdev_event() receives a NETDEV_UNREGISTER event. This
> >>> causes the above warning.
> >>>
> >>> This patch fixes this issue by setting the IFF_SLAVE flag at the
> >>> err_undo_flags label in bond_enslave() if the bond device has a master
> >>> device.
> >>>
> >>
> >> The proper way is to check if the bond device had the IFF_SLAVE flag before the
> >> ether_setup() call which clears it, and restore it after.
> >>
> >>> Fixes: 7d5cd2ce5292 ("bonding: correctly handle bonding type change on enslave failure")
> >>> Cc: Nikolay Aleksandrov <nikolay@xxxxxxxxxxxxxxxxxxx>
> >>> Link: https://syzkaller.appspot.com/bug?id=391c7b1f6522182899efba27d891f1743e8eb3ef [1]
> >>> Reported-by: syzbot+9dfc3f3348729cc82277@xxxxxxxxxxxxxxxxxxxxxxxxx
> >>> Signed-off-by: Shigeru Yoshida <syoshida@xxxxxxxxxx>
> >>> ---
> >>> drivers/net/bonding/bond_main.c | 2 ++
> >>> include/net/bonding.h | 5 +++++
> >>> 2 files changed, 7 insertions(+)
> >>>
> >>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> >>> index 00646aa315c3..1a8b59e1468d 100644
> >>> --- a/drivers/net/bonding/bond_main.c
> >>> +++ b/drivers/net/bonding/bond_main.c
> >>> @@ -2291,6 +2291,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
> >>> dev_close(bond_dev);
> >>> ether_setup(bond_dev);
> >>> bond_dev->flags |= IFF_MASTER;
> >>> + if (bond_has_master(bond))
> >>> + bond_dev->flags |= IFF_SLAVE;
> >>> bond_dev->priv_flags &= ~IFF_TX_SKB_SHARING;
> >>> }
> >>> }
> >>> diff --git a/include/net/bonding.h b/include/net/bonding.h
> >>> index ea36ab7f9e72..ed0b49501fad 100644
> >>> --- a/include/net/bonding.h
> >>> +++ b/include/net/bonding.h
> >>> @@ -57,6 +57,11 @@
> >>>
> >>> #define bond_has_slaves(bond) !list_empty(bond_slave_list(bond))
> >>>
> >>> +/* master list primitives */
> >>> +#define bond_master_list(bond) (&(bond)->dev->adj_list.upper)
> >>> +
> >>> +#define bond_has_master(bond) !list_empty(bond_master_list(bond))
> >>> +
> >>
> >> This is not the proper way to check for a master device.
> >>
> >>> /* IMPORTANT: bond_first/last_slave can return NULL in case of an empty list */
> >>> #define bond_first_slave(bond) \
> >>> (bond_has_slaves(bond) ? \
> >>
> >> The device flags are wrong because of ether_setup() which clears IFF_SLAVE, we should
> >> just check if it was present before and restore it after the ether_setup() call.
> >
> > Thank you so much for your comment! I understand your point, and
> > agree that your approach should resolve the issue.
> >
> > BTW, do you mean there is a case where a device has the IFF_SLAVE flag but
> > the upper list is empty? I thought a device with the IFF_SLAVE flag has a
> > master device in the upper list (that is why I took the above approach).
> >
>
> Hi Shigeru,
> No, that's not what I meant. It's the opposite actually, you may have an upper list
> but you don't have a "master" device or slave flag set. Yes, you can say that if
> a device has IFF_SLAVE set, then it must have a master upper device but that's not
> what you're checking for, you've reversed that logic to check for an upper device instead
> and assume there's an IFF_SLAVE flag set (which may not be true).
> For an upper device to be considered a "master" device, it must have the master bool set to
> true in its netdev_adjacent structure. We already have helpers to check for master devices
> and to retrieve them, e.g. check netdev_master_upper_dev_get* in net/core/dev.c
>
> The most robust way to fix it is to check if the flag was there prior to the ether_setup() call
> and restore it after, also to leave a nice comment about all of this. :)

Thanks for the kind explanation. I now understand why my fix is not a
sufficient check for a master device. And, yes, the most robust and
simple way to fix the issue is to check the flag before it is cleared.
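
To check my understanding of the master-device point, here is a small
userspace model, not kernel code. Every model_* name is a hypothetical
stand-in I made up for illustration; the only thing taken from your
explanation is that an upper entry counts as a master only when its
"master" bool is true. In the kernel the existing
netdev_master_upper_dev_get* helpers should be used instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of a device's upper adjacency list. */
struct model_adjacent {
	const char *name;             /* name of the upper device */
	bool master;                  /* true only for a "master" upper device */
	struct model_adjacent *next;
};

/* Hypothetical stand-in for a device and its upper list. */
struct model_dev {
	struct model_adjacent *upper;
};

/* What my patch effectively tested: is there any upper device at all? */
static bool model_has_upper(const struct model_dev *dev)
{
	return dev->upper != NULL;
}

/* What a master check needs: an upper entry whose master bool is set. */
static const struct model_adjacent *
model_master_upper(const struct model_dev *dev)
{
	const struct model_adjacent *adj;

	for (adj = dev->upper; adj; adj = adj->next)
		if (adj->master)
			return adj;
	return NULL;
}

/* A device with one non-master upper: the two checks disagree. */
static void model_selfcheck(void)
{
	struct model_adjacent upper = { "upper0", false, NULL };
	struct model_dev dev = { &upper };

	assert(model_has_upper(&dev));
	assert(model_master_upper(&dev) == NULL);
}
```

So a non-empty upper list alone says nothing about a master device or
about IFF_SLAVE, which is the case my macro got wrong.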

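The save-and-restore approach could look roughly like the userspace
sketch below. This is only a model under my own assumptions, not the
actual patch: model_netdev, model_ether_setup() and
model_undo_type_change() are hypothetical names, and the model only
reproduces the one relevant effect of ether_setup(), namely that it
overwrites the flags word and so drops IFF_SLAVE.

```c
#include <assert.h>

/* Flag bits for the model only (chosen to mirror the IFF_* values). */
#define MODEL_IFF_BROADCAST 0x0002u
#define MODEL_IFF_MASTER    0x0400u
#define MODEL_IFF_SLAVE     0x0800u
#define MODEL_IFF_MULTICAST 0x1000u

/* Hypothetical stand-in for struct net_device, reduced to its flags. */
struct model_netdev {
	unsigned int flags;
};

/* Models the side effect under discussion: flags are overwritten. */
static void model_ether_setup(struct model_netdev *dev)
{
	dev->flags = MODEL_IFF_BROADCAST | MODEL_IFF_MULTICAST;
}

/*
 * Models the undo path: remember whether the slave flag was set before
 * the setup call clears it, then put it back together with the master
 * flag afterwards.
 */
static void model_undo_type_change(struct model_netdev *bond_dev)
{
	unsigned int slave_flag = bond_dev->flags & MODEL_IFF_SLAVE;

	model_ether_setup(bond_dev);
	bond_dev->flags |= MODEL_IFF_MASTER | slave_flag;
}

/* A bond that was itself enslaved keeps its slave flag across the undo. */
static void model_selfcheck(void)
{
	struct model_netdev bond = { MODEL_IFF_MASTER | MODEL_IFF_SLAVE };

	model_undo_type_change(&bond);
	assert(bond.flags & MODEL_IFF_SLAVE);
}
```

This needs no master lookup at all: a bond that did not have the flag
before the call simply does not get it afterwards.
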
Thank you,
Shigeru

>
> > Thanks,
> > Shigeru
> >
>
> Cheers,
> Nik
>