Re: [PATCH v2] net: make unregister netdev warning timeout configurable

From: Dmitry Vyukov
Date: Thu Mar 25 2021 - 03:40:39 EST


On Wed, Mar 24, 2021 at 10:40 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
> On Tue, Mar 23, 2021 at 7:49 AM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> >
> > netdev_wait_allrefs() issues a warning if refcount does not drop to 0
> > after 10 seconds. While a 10-second wait generally should not happen
> > under a normal workload in a normal environment, the warning seems to
> > fire falsely very often during fuzzing and/or in qemu emulation
> > (~10x slower). At the very least, it's not possible to tell whether
> > it's really a false positive or not. Automated testing generally bumps
> > all timeouts to very high values to avoid flaky failures.
> > Add net.core.netdev_unregister_timeout_secs sysctl to make
> > the timeout configurable for automated testing systems.
> > Lowering the timeout may also be useful for e.g. manual bisection.
> > The default value matches the current behavior.
> >
> > Signed-off-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> > Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=211877
> > Cc: netdev@xxxxxxxxxxxxxxx
> > Cc: linux-kernel@xxxxxxxxxxxxxxx
> >
> > ---
> > Changes since v1:
> > - use sysctl instead of a config
> > ---
>
> >  	},
> > +	{
> > +		.procname	= "netdev_unregister_timeout_secs",
> > +		.data		= &netdev_unregister_timeout_secs,
> > +		.maxlen		= sizeof(unsigned int),
> > +		.mode		= 0644,
> > +		.proc_handler	= proc_dointvec_minmax,
> > +		.extra1		= SYSCTL_ZERO,
> > +		.extra2		= &int_3600,
> > +	},
> >  	{ }
> >  };
> >
>
> If we allow the sysctl to be 0, then we risk a flood of pr_emerg()
> (one per jiffy ?)

My reasoning was that it's up to the user. Some spammy console output
for rare events is probably not the worst way root can misconfigure
the kernel :)
A zero timeout also lets one check (more or less) whether we are
reaching unregister_netdevice with a non-zero refcount, which may be
useful for debugging.
But I don't mind changing the minimum to 1 (or 5) if you prefer. On
syzbot we only want to increase the timeout.
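
To put a rough number on the flood concern, here is a small userspace
model of the warning logic. It is only a sketch: count_warnings() and its
parameters are hypothetical names, and it assumes the wait loop polls the
refcount at a fixed interval (250 ms is used below as an assumption, not
something taken from the patch). Under that assumption the warning rate
is bounded by the poll interval rather than by the jiffy.

```c
#include <assert.h>

/*
 * Userspace model of the warning logic, for illustration only.
 * count_warnings() and its parameters are hypothetical names, and the
 * fixed poll interval is an assumption, not kernel behaviour I verified.
 */
unsigned int count_warnings(unsigned int timeout_secs,
			    unsigned int poll_ms,
			    unsigned int stuck_ms)
{
	unsigned int now_ms = 0;          /* simulated clock */
	unsigned int warning_time_ms = 0; /* time of the last warning */
	unsigned int warnings = 0;

	assert(poll_ms > 0); /* a zero interval would never advance the clock */

	while (now_ms < stuck_ms) {
		now_ms += poll_ms; /* models the sleep between refcount checks */

		/* Warn once more than timeout_secs passed since the last warning. */
		if (now_ms - warning_time_ms > timeout_secs * 1000u) {
			warnings++;
			warning_time_ms = now_ms;
		}
	}
	return warnings;
}
```

For a device stuck for 60 seconds with a 250 ms poll, this model gives
240 warnings with a timeout of 0, 48 with a timeout of 1, and 5 with the
default of 10.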

> If you really want the zero value, you need to change pr_emerg() to
> pr_emerg_ratelimited()
>
> Also, please base your patch on net-next, to avoid future merge conflicts
> with my prior patch add2d7363107 "net: set initial device refcount to 1".
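
For reference, the effect of a ratelimited print can be modelled in
userspace as an interval/burst limiter. This is a sketch, not the
kernel's implementation: struct ratelimit_model, ratelimit_allow() and
count_printed() are hypothetical names. As far as I recall, the
*_ratelimited printk helpers default to a burst of 10 messages per
5-second interval, so those values are used in the example below, but
treat them as an assumption.

```c
#include <assert.h>

/* Hypothetical model of an interval/burst rate limiter; not kernel code. */
struct ratelimit_model {
	unsigned int interval_ms; /* window length */
	unsigned int burst;       /* messages allowed per window */
	unsigned int begin_ms;    /* start of the current window */
	unsigned int printed;     /* messages let through in this window */
	int started;              /* set once the first window has begun */
};

/* Returns 1 if a message at now_ms may be printed, 0 if it is dropped. */
int ratelimit_allow(struct ratelimit_model *rl, unsigned int now_ms)
{
	if (!rl->started) {
		rl->begin_ms = now_ms;
		rl->started = 1;
	}
	/* Open a new window once the current one has expired. */
	if (now_ms - rl->begin_ms > rl->interval_ms) {
		rl->begin_ms = now_ms;
		rl->printed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return 1;
	}
	return 0;
}

/* Counts messages that get through when one is attempted every period_ms. */
unsigned int count_printed(unsigned int period_ms, unsigned int total_ms,
			   unsigned int interval_ms, unsigned int burst)
{
	struct ratelimit_model rl = { interval_ms, burst, 0, 0, 0 };
	unsigned int now_ms, printed = 0;

	assert(period_ms > 0);
	for (now_ms = period_ms; now_ms <= total_ms; now_ms += period_ms)
		printed += (unsigned int)ratelimit_allow(&rl, now_ms);
	return printed;
}
```

With one attempt every 250 ms for 60 seconds (240 attempts), a 5-second
interval and a burst of 10, count_printed(250, 60000, 5000, 10) lets 119
messages through in this model, so a zero timeout would still be noisy,
just bounded.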