Re: unregister_netdevice: waiting for DEV to become free (2)

From: Jouni Högander
Date: Thu Dec 05 2019 - 05:00:11 EST


Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> writes:

> [ 61.584734] Code: bd b1 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 8b b1 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> [ 61.590407] RSP: 002b:00007f25d540ec88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 61.592488] RAX: ffffffffffffffda RBX: 000000000071bf00 RCX: 000000000045a729
> [ 61.594552] RDX: 0000000020000040 RSI: 00000000400454d9 RDI: 0000000000000003
> [ 61.596829] RBP: 00007f25d540eca0 R08: 0000000000000000 R09: 0000000000000000
> [ 61.598540] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f25d540f6d4
> [ 61.600278] R13: 00000000004ac5a5 R14: 00000000006ee8a0 R15: 0000000000000005
> [ 61.655323] kobject_add_internal failed for tx-1 (error: -12 parent: queues)
> [ 71.760970] unregister_netdevice: waiting for vet to become free. Usage count = -1
> [ 82.028434] unregister_netdevice: waiting for vet to become free. Usage count = -1
> [ 92.140031] unregister_netdevice: waiting for vet to become free. Usage count = -1
> ----------
>
> The worrisome part is that tun_attach() calls tun_set_real_num_queues() at the end of tun_attach()
> but tun_set_real_num_queues() does not handle a netif_set_real_num_tx_queues() failure.
> That is, tun_attach() is returning success even if netdev_queue_update_kobjects() from
> netif_set_real_num_tx_queues() failed.
>
> static void tun_set_real_num_queues(struct tun_struct *tun)
> {
> 	netif_set_real_num_tx_queues(tun->dev, tun->numqueues);
> 	netif_set_real_num_rx_queues(tun->dev, tun->numqueues);
> }
>
> And I guess that ignoring that failure causes the clean-up function to drop a refcount
> which was not taken by the initialization function. Applying the diff below seems to avoid
> this problem. Please check.
>
> ----------
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index ae3bcb1540ec..562d06c274aa 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -1459,14 +1459,14 @@ static int netdev_queue_add_kobject(struct net_device *dev, int index)
> struct kobject *kobj = &queue->kobj;
> int error = 0;
>
> + dev_hold(queue->dev);
> +
> kobj->kset = dev->queues_kset;
> error = kobject_init_and_add(kobj, &netdev_queue_ktype, NULL,
> "tx-%u", index);
> if (error)
> goto err;
>
> - dev_hold(queue->dev);
> -
> #ifdef CONFIG_BQL
> error = sysfs_create_group(kobj, &dql_group);
> if (error)

Now, after reproducing the issue, I think this is actually the proper fix.
It is not related to the missing error handling in
tun_set_real_num_queues(), as I commented earlier. Can you prepare a
patch for this?
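For illustration, here is a minimal user-space sketch of the refcount imbalance, assuming (as in net-sysfs.c of this era) that a failed kobject_init_and_add() is followed by kobject_put(), whose release hook, netdev_queue_release(), calls dev_put(queue->dev). All fake_* types and functions are hypothetical stand-ins, not the kernel API; the sketch only models the ordering of dev_hold() relative to the fallible call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: stands in for a net_device and its refcount. */
struct fake_net_device {
	int refcnt;
};

/* Hypothetical model of a netdev_queue with a kobject release hook. */
struct fake_queue {
	struct fake_net_device *dev;
	void (*release)(struct fake_queue *q);
};

static void fake_dev_hold(struct fake_net_device *dev) { dev->refcnt++; }
static void fake_dev_put(struct fake_net_device *dev)  { dev->refcnt--; }

/* Assumed to mirror netdev_queue_release(): always drops a device ref. */
static void fake_queue_release(struct fake_queue *q)
{
	fake_dev_put(q->dev);
}

/* Models kobject_put() dropping the last reference: runs release. */
static void fake_kobject_put(struct fake_queue *q)
{
	q->release(q);
}

/*
 * Models netdev_queue_add_kobject(). 'add_fails' simulates
 * kobject_init_and_add() failing with -ENOMEM; 'hold_first' selects the
 * patched order (dev_hold before the call that can fail).
 */
static int fake_queue_add(struct fake_queue *q, bool add_fails, bool hold_first)
{
	if (hold_first)
		fake_dev_hold(q->dev);

	if (add_fails) {
		fake_kobject_put(q);	/* error path: release drops a dev ref */
		return -ENOMEM;
	}

	if (!hold_first)
		fake_dev_hold(q->dev);	/* original order: hold only on success */
	return 0;
}

/* Device refcount after a failed add, starting from zero. */
static int refcnt_after_failed_add(bool hold_first)
{
	struct fake_net_device dev = { .refcnt = 0 };
	struct fake_queue q = { .dev = &dev, .release = fake_queue_release };

	int err = fake_queue_add(&q, true, hold_first);
	assert(err == -ENOMEM);
	return dev.refcnt;
}

/* Device refcount after a successful add followed by removal. */
static int refcnt_after_add_and_remove(bool hold_first)
{
	struct fake_net_device dev = { .refcnt = 0 };
	struct fake_queue q = { .dev = &dev, .release = fake_queue_release };

	int err = fake_queue_add(&q, false, hold_first);
	assert(err == 0);
	fake_kobject_put(&q);		/* teardown drops the kobject ref */
	return dev.refcnt;
}
```

In this model, refcnt_after_failed_add(false) leaves the count at -1, matching the "Usage count = -1" lines in the log, while refcnt_after_failed_add(true) leaves it at 0. Both orderings stay balanced on the success path, which is why the bug only shows up when the kobject add fails.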

BR,

Jouni Högander