Re: [PATCH net-next v1] dev: remove netdev_lock() and netdev_lock_ops() in register_netdevice().

From: Stanislav Fomichev
Date: Sat Mar 08 2025 - 18:51:49 EST


On 03/08, Stanislav Fomichev wrote:
> On 03/08, Jakub Kicinski wrote:
> > On Sat, 8 Mar 2025 13:18:13 -0800 Jakub Kicinski wrote:
> > > On Sun, 9 Mar 2025 05:37:18 +0900 Kohei Enju wrote:
> > > > Both netdev_lock() and netdev_lock_ops() are called before
> > > > list_netdevice() in register_netdevice().
> > > > No other context can access the struct net_device, so we don't need these
> > > > locks in this context.
> > >
> > > Doesn't sysfs get registered earlier?
> > > I'm afraid not being able to take the lock from the registration
> > > path ties our hands too much. Maybe we need to make a more serious
> > > attempt at letting the caller take the lock?
> >
> > Looking closer at the report - we are violating the contract that only
> > drivers which opted in get their ops called under the instance lock.
> > iavf had a similar problem but it had to opt in. WiFi doesn't.
> >
> > Maybe we can bring the address semaphore back?
> > We just need to take it before the ops lock in do_setlink.
> > A bit ugly but would work?
>
> I remember I was having another lockdep circular report with the addr
> sema, but maybe moving it before the ops lock will fix it, not sure.
>
> But coming back to "No other context can access the struct net_device,
> so we don't need these locks in this context.". What if we move
> netdev_set_addr_lockdep_class() call down a bit? Right before list_netdevice
> happens. Will it help with the lockdep?

Hmm, netdev_set_addr_lockdep_class() does not touch the instance lock :-(
But the same idea should apply: call lockdep_set_novalidate_class() on the
instance lock early and undo it right before list_netdevice()...
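
To make the idea concrete, here is a toy userspace model, not kernel code.
Every toy_* name and the ordering rule are made up for illustration; the
only point is the shape: opt the lock out of validation while the device
is still private, then restore the real class before it becomes visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a lockdep class; NOT the kernel's lock_class_key. */
struct toy_class {
	const char *name;
	bool validate;	/* false: the toy checker skips this lock */
};

static struct toy_class toy_novalidate = { "novalidate", false };
static struct toy_class toy_instance = { "instance_lock", true };

struct toy_lock {
	struct toy_class *cls;
	bool held;
};

struct toy_netdev {
	struct toy_lock lock;
	bool listed;	/* true once other contexts can find the device */
};

static bool addr_lock_held;	/* pretend some "addr" lock is held */
static int toy_reports;		/* ordering complaints raised so far */

static void toy_lock_acquire(struct toy_lock *l)
{
	/* Made-up rule: a validated lock must not nest inside the addr lock. */
	if (l->cls->validate && addr_lock_held)
		toy_reports++;
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->held = false;
}

/* Hypothetical registration path; returns the total report count. */
static int toy_register(struct toy_netdev *dev)
{
	assert(!dev->listed);	/* still private to the caller */

	dev->lock.cls = &toy_novalidate;	/* early: opt out of checking */

	addr_lock_held = true;	/* ordering that would otherwise be reported */
	toy_lock_acquire(&dev->lock);
	toy_lock_release(&dev->lock);
	addr_lock_held = false;

	dev->lock.cls = &toy_instance;	/* undo before publishing */
	dev->listed = true;		/* stands in for list_netdevice() */
	return toy_reports;
}
```

In this model, toy_register() on a fresh device adds no reports, while
taking the same lock under the "addr" lock after registration is checked
and reported again. Whether the real lockdep class can be swapped back
this cleanly is exactly the open question.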