RE: [EXT] Re: [PATCH net-next v5 1/8] octeon_ep_vf: Add driver framework and device initialization

From: Shinas Rasheed
Date: Tue Feb 06 2024 - 02:43:00 EST


Hi,

> -----Original Message-----
> From: Jakub Kicinski <kuba@xxxxxxxxxx>
> Sent: Tuesday, February 6, 2024 5:15 AM
> To: Shinas Rasheed <srasheed@xxxxxxxxxxx>
> Cc: netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Haseeb Gani
> <hgani@xxxxxxxxxxx>; Vimlesh Kumar <vimleshk@xxxxxxxxxxx>; Sathesh B
> Edara <sedara@xxxxxxxxxxx>; egallen@xxxxxxxxxx; mschmidt@xxxxxxxxxx;
> pabeni@xxxxxxxxxx; horms@xxxxxxxxxx; wizhao@xxxxxxxxxx;
> kheib@xxxxxxxxxx; konguyen@xxxxxxxxxx; David S. Miller
> <davem@xxxxxxxxxxxxx>; Eric Dumazet <edumazet@xxxxxxxxxx>; Jonathan
> Corbet <corbet@xxxxxxx>; Veerasenareddy Burru <vburru@xxxxxxxxxxx>;
> Satananda Burla <sburla@xxxxxxxxxxx>; Shannon Nelson
> <shannon.nelson@xxxxxxx>; Tony Nguyen <anthony.l.nguyen@xxxxxxxxx>;
> Joshua Hay <joshua.a.hay@xxxxxxxxx>; Rahul Rameshbabu
> <rrameshbabu@xxxxxxxxxx>; Brett Creeley <brett.creeley@xxxxxxx>;
> Andrew Lunn <andrew@xxxxxxx>; Jacob Keller <jacob.e.keller@xxxxxxxxx>
> Subject: Re: [EXT] Re: [PATCH net-next v5 1/8] octeon_ep_vf: Add driver
> framework and device initialization
>
> > > > +static void octep_vf_tx_timeout(struct net_device *netdev, unsigned int txqueue)
> > > > +{
> > > > + struct octep_vf_device *oct = netdev_priv(netdev);
> > > > +
> > > > + queue_work(octep_vf_wq, &oct->tx_timeout_task);
> > > > +}
> > >
> > > I don't see you canceling this work. What if someone unregistered
> > > the device before it runs? You gotta netdev_hold() a reference.
> >
> > We do cancel_work_sync in octep_vf_remove function.
>
> But the device is still registered, so the timeout can happen after you
> cancel but before you unregister.

The work function, octep_vf_tx_timeout_task, takes rtnl_lock, which should protect against
a concurrent unregister_netdev in such cases (code snippet for quick reference):

static void octep_vf_tx_timeout_task(struct work_struct *work)
{
	struct octep_vf_device *oct = container_of(work, struct octep_vf_device,
						   tx_timeout_task);
	struct net_device *netdev = oct->netdev;

	rtnl_lock();
	if (netif_running(netdev)) {
		octep_vf_stop(netdev);
		octep_vf_open(netdev);
	}
	rtnl_unlock();
}

Does this take care of it? Please let me know if my reasoning is wrong. Thanks!
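
For comparison, the reference-holding alternative suggested in the review can be modeled as below. This is only a minimal userspace sketch in C, not kernel code: every model_* name is a hypothetical stand-in (for the netdev refcount, queue_work and the work's pending bit), and the reset is reduced to a counter. It shows the timeout handler taking a reference before queueing the work and the work function dropping it, so the device cannot go away between the two.

```c
#include <stdbool.h>

/* Userspace stand-ins for illustration only; none of these are kernel APIs. */
struct model_netdev {
	int refcnt;		/* models the netdev reference count */
	bool registered;	/* cleared by the modeled unregister */
	bool running;		/* models netif_running() */
	int resets;		/* counts stop/open cycles done by the task */
};

struct model_priv {
	struct model_netdev *netdev;
	bool work_pending;	/* models the pending bit of tx_timeout_task */
};

static void model_hold(struct model_netdev *dev)
{
	dev->refcnt++;
}

static void model_put(struct model_netdev *dev)
{
	dev->refcnt--;
}

/* Models queue_work(): returns false if the work was already pending. */
static bool model_queue_work(struct model_priv *priv)
{
	if (priv->work_pending)
		return false;
	priv->work_pending = true;
	return true;
}

/* Timeout handler: take a reference before queueing the work. */
static void model_tx_timeout(struct model_priv *priv)
{
	model_hold(priv->netdev);
	if (!model_queue_work(priv))
		model_put(priv->netdev);	/* already queued, drop the extra ref */
}

/* Work function: reset only if still registered and running, then release. */
static void model_tx_timeout_task(struct model_priv *priv)
{
	struct model_netdev *dev = priv->netdev;

	priv->work_pending = false;
	if (dev->registered && dev->running)
		dev->resets++;	/* stands in for octep_vf_stop() + octep_vf_open() */
	model_put(dev);
}
```

In this model, a timeout that fires just before the device is unregistered still leaves the reference count elevated until the work function has run, and the work function then skips the reset because the device is no longer registered.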

> > > > +static int __init octep_vf_init_module(void)
> > > > +{
> > > > + int ret;
> > > > +
> > > > + pr_info("%s: Loading %s ...\n", OCTEP_VF_DRV_NAME, OCTEP_VF_DRV_STRING);
> > > > +
> > > > + /* work queue for all deferred tasks */
> > > > + octep_vf_wq = create_singlethread_workqueue(OCTEP_VF_DRV_NAME);
> > >
> > > Is there a reason this wq has to be single threaded and different than
> > > system queue? All you schedule on it in this series is the reset task.
> >
> > We also schedule the control mailbox task on this workqueue. The
> > workqueue was created with the intention that there could be other
> > driver specific tasks to add in the future. It has been single
> > threaded for now, but we might optimize implementation in the future,
> > although for now as far as to service our control plane this has been
> > enough.
>
> I haven't spotted the mailbox task in this series, if it's not here,
> let's switch to system wq, and only add your own when needed.

Sorry, my bad. As far as I can tell, the only task the VF driver currently queues on this
workqueue is the tx timeout task. So yes, we can switch to the system workqueue for now and
add a dedicated one later if such a requirement emerges. If my previous comment is okay with
you, I'll include this change as well in the next version of the patch.

Thanks for the review!