Re: [EXTERNAL] Re: [PATCH net v2 1/4] octeon_ep: fix race conditions in ndo_get_stats64

From: Larysa Zaremba
Date: Tue Dec 17 2024 - 12:07:24 EST


On Mon, Dec 16, 2024 at 06:28:13PM +0000, Shinas Rasheed wrote:
> Hi Larysa,
>
> > On Mon, Dec 16, 2024 at 03:30:12PM +0100, Larysa Zaremba wrote:
> > > On Sun, Dec 15, 2024 at 11:58:39PM -0800, Shinas Rasheed wrote:
> > > > ndo_get_stats64() can race with ndo_stop(), which frees input and
> > > > output queue resources. Call synchronize_net() to avoid such races.
> > > >
> > > > Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops")
> > > > Signed-off-by: Shinas Rasheed <srasheed@xxxxxxxxxxx>
> > > > ---
> > > > V2:
> > > > - Changed sync mechanism to fix race conditions from using an atomic
> > > > set_bit ops to a much simpler synchronize_net()
> > > >
> > > > V1: https://lore.kernel.org/all/20241203072130.2316913-2-srasheed@marvell.com/
> > > >
> > > > drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 +
> > > > 1 file changed, 1 insertion(+)
> > > >
> > > > diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > > index 549436efc204..941bbaaa67b5 100644
> > > > --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > > +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > > @@ -757,6 +757,7 @@ static int octep_stop(struct net_device *netdev)
> > > > {
> > > > struct octep_device *oct = netdev_priv(netdev);
> > > >
> > > > + synchronize_net();
> > >
> > > You should have elaborated on the fact that this synchronize_net() is for
> > > __LINK_STATE_START flag in the commit message, this is not obvious. Also, is
> > > octep_get_stats64() called from RCU-safe context?
> > >
> >
> > Now I see that in case !netif_running(), you do not bail out of
> > octep_get_stats64() fully (or at all after the second patch). So, could you
> > explain how you are utilizing RCU here?
> >
>
> The understanding is that octep_get_stats64() (and in turn .ndo_get_stats64())
> is called from RCU-safe contexts, and that the netdev op is never called after
> ndo_stop().

Looking at net/core/net-sysfs.c now, there is indeed an RCU read lock around
the call, but there are many more callers, and veth_get_stats64(), for
example, explicitly calls rcu_read_lock() itself.

Also, even with an RCU-protected section, I am not sure that anything prevents
octep_get_stats64() from being called after synchronize_net() finishes. Again,
the callers seem too diverse to say definitively that we can rely on built-in
flags to keep this from happening :/

>
> Thanks for the comments