Re: [EXTERNAL] Re: [PATCH net v2 1/4] octeon_ep: fix race conditions in ndo_get_stats64

From: Larysa Zaremba
Date: Wed Dec 18 2024 - 09:50:24 EST


On Wed, Dec 18, 2024 at 03:21:12PM +0100, Eric Dumazet wrote:
> On Wed, Dec 18, 2024 at 2:25 PM Larysa Zaremba <larysa.zaremba@xxxxxxxxx> wrote:
>
> >
> > It is hard to know without testing (but testing should not be hard). I think the
> > phrase "Statistics must persist across routine operations like bringing the
> > interface down and up." [0] implies that bringing the interface down may not
> > necessarily prevent stats calls.
>
> Please don't add workarounds to individual drivers.
>
> I think the core networking stack should handle the possible races.
>
> Most dev_get_stats() callers are correctly testing dev_isalive() or
> are protected by RTNL.
>
> There are a few nested cases that are not properly handled; the
> following patch should take care of them.
>

I was under the impression that a call to .ndo_stop() does not move the
device out of NETREG_REGISTERED. Such a link would be required for your patch
alone to solve the original problem (though the patch is a good change in
general). Could you please explain how the two are related?

>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 2593019ad5b1614f3b8c037afb4ba4fa740c7d51..768afc2a18d343d051e7a1b631124910af9922d2 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -5342,6 +5342,12 @@ static inline const char *netdev_reg_state(const struct net_device *dev)
>  	return " (unknown)";
>  }
>
> +/* Caller holds RTNL or RCU */
> +static inline int dev_isalive(const struct net_device *dev)
> +{
> +	return READ_ONCE(dev->reg_state) <= NETREG_REGISTERED;
> +}
> +
>  #define MODULE_ALIAS_NETDEV(device) \
>  	MODULE_ALIAS("netdev-" device)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index c7f3dea3e0eb9eb05865e49dd7a8535afb974149..f11f305f3136f208fcb285c7b314914aef20dfad 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -11044,8 +11044,13 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev,
>  	const struct net_device_ops *ops = dev->netdev_ops;
>  	const struct net_device_core_stats __percpu *p;
>
> +	memset(storage, 0, sizeof(*storage));
> +	rcu_read_lock();
> +
> +	if (unlikely(!dev_isalive(dev)))
> +		goto unlock;
> +
>  	if (ops->ndo_get_stats64) {
> -		memset(storage, 0, sizeof(*storage));
>  		ops->ndo_get_stats64(dev, storage);
>  	} else if (ops->ndo_get_stats) {
>  		netdev_stats_to_stats64(storage, ops->ndo_get_stats(dev));
> @@ -11071,6 +11076,8 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev,
>  			storage->rx_otherhost_dropped += READ_ONCE(core_stats->rx_otherhost_dropped);
>  		}
>  	}
> +unlock:
> +	rcu_read_unlock();
>  	return storage;
>  }
>  EXPORT_SYMBOL(dev_get_stats);
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index 2d9afc6e2161efa51ffa62813ec10c8f43944bce..3f4851d67015c959dd531c571c46fc2ac18beb65 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -36,12 +36,6 @@ static const char fmt_uint[] = "%u\n";
>  static const char fmt_ulong[] = "%lu\n";
>  static const char fmt_u64[] = "%llu\n";
>
> -/* Caller holds RTNL or RCU */
> -static inline int dev_isalive(const struct net_device *dev)
> -{
> -	return READ_ONCE(dev->reg_state) <= NETREG_REGISTERED;
> -}
> -
>  /* use same locking rules as GIF* ioctl's */
>  static ssize_t netdev_show(const struct device *dev,
>  			   struct device_attribute *attr, char *buf,