Re: [EXTERNAL] Re: [PATCH net v4 1/4] octeon_ep: fix race conditions in ndo_get_stats64

From: Jakub Kicinski
Date: Mon Jan 06 2025 - 15:57:28 EST


On Mon, 6 Jan 2025 05:57:09 +0000 Shinas Rasheed wrote:
> > >         struct octep_device *oct = netdev_priv(netdev);
> > >         int q;
> > >
> > > -       if (netif_running(netdev))
> > > -               octep_ctrl_net_get_if_stats(oct,
> > > -                                           OCTEP_CTRL_NET_INVALID_VFID,
> > > -                                           &oct->iface_rx_stats,
> > > -                                           &oct->iface_tx_stats);
> > > -
> > >         tx_packets = 0;
> > >         tx_bytes = 0;
> > >         rx_packets = 0;
> > >         rx_bytes = 0;
> > > +
> > > +       if (!netif_running(netdev))
> > > +               return;
> >
> > So we'll provide no stats when the device is down? That's not correct.
> > The driver should save the stats from the freed queues (somewhere in
> > the oct structure). Also please mention how this is synchronized
> > against netif_running() changing its state, device may get closed while
> > we're running..
>
> I ACK the 'save stats from freed queues and emit out stats when device is down'.
>
> About the synchronization, the reason I changed to a simple netif_running() check was to avoid
> locks (as per previous patch version comments). Please do correct me if I'm wrong, but isn't the case
> you mentioned protected by the rtnl_lock held by the netdev stack when it calls the ndo_op?

I don't see rtnl_lock being taken in the procfs path.
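
For illustration only, a minimal userspace sketch of the "save the stats
from the freed queues" idea. Every name here (demo_dev, demo_q_stats,
demo_open, demo_stop, demo_get_stats) is hypothetical and not taken from
octeon_ep. Locking is deliberately left out: a real driver would still have
to synchronize the stats reader against the device being closed, since the
reader cannot assume rtnl_lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_QUEUES 8	/* hypothetical limit, for the sketch only */

struct demo_q_stats {
	uint64_t packets;
	uint64_t bytes;
};

/* Hypothetical stand-in for the driver's private device structure. */
struct demo_dev {
	bool running;
	int num_queues;
	struct demo_q_stats txq[DEMO_MAX_QUEUES];	/* live per-queue counters */
	struct demo_q_stats rxq[DEMO_MAX_QUEUES];
	struct demo_q_stats tx_saved;	/* totals from queues already freed */
	struct demo_q_stats rx_saved;
};

/* Bring the device up with fresh (zeroed) queues. */
void demo_open(struct demo_dev *dev, int nq)
{
	assert(nq >= 0 && nq <= DEMO_MAX_QUEUES);
	memset(dev->txq, 0, sizeof(dev->txq));
	memset(dev->rxq, 0, sizeof(dev->rxq));
	dev->num_queues = nq;
	dev->running = true;
}

/* Fold the per-queue counters into the saved totals before "freeing". */
void demo_stop(struct demo_dev *dev)
{
	for (int q = 0; q < dev->num_queues; q++) {
		dev->tx_saved.packets += dev->txq[q].packets;
		dev->tx_saved.bytes += dev->txq[q].bytes;
		dev->rx_saved.packets += dev->rxq[q].packets;
		dev->rx_saved.bytes += dev->rxq[q].bytes;
	}
	dev->num_queues = 0;
	dev->running = false;
}

/* Report saved totals plus live queues; works whether up or down. */
void demo_get_stats(const struct demo_dev *dev,
		    struct demo_q_stats *tx, struct demo_q_stats *rx)
{
	*tx = dev->tx_saved;
	*rx = dev->rx_saved;
	for (int q = 0; q < dev->num_queues; q++) {
		tx->packets += dev->txq[q].packets;
		tx->bytes += dev->txq[q].bytes;
		rx->packets += dev->rxq[q].packets;
		rx->bytes += dev->rxq[q].bytes;
	}
}
```

With this shape the counters never go backwards across a down/up cycle:
demo_get_stats() returns the same totals right after demo_stop() as right
before it, and keeps counting from there once the device is reopened.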

FWIW I posted a test for the problem you're fixing in octeon,
since it's relatively common among drivers:
https://lore.kernel.org/20250105011525.1718380-1-kuba@xxxxxxxxxx
see also:
https://github.com/linux-netdev/nipa/wiki/Running-driver-tests