Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device

From: Jakub Kicinski
Date: Wed Oct 27 2021 - 10:18:23 EST


On Wed, 27 Oct 2021 08:56:45 +0300 Leon Romanovsky wrote:
> On Tue, Oct 26, 2021 at 12:56:02PM -0700, Jakub Kicinski wrote:
> > On Tue, 26 Oct 2021 22:30:23 +0300 Leon Romanovsky wrote:
> > > No problem, I'll send a revert now, but what is your take on the direction?
> >
> > I haven't put in the time to understand the detail so I was hoping not
> > to pass judgment on the direction. My likely unfounded feeling is that
> > reshuffling ordering is not going to fix what is fundamentally a
> > locking issue. Driver has internal locks it needs to hold both inside
> > devlink callbacks and when registering devlink objects. We would solve
> > a lot of the problems if those were one single lock instead of two.
> > At least that's my recollection from the times I was actually writing
> > driver code...
>
> Exactly, and this is what reshuffling of registrations does. It allows us
> to actually reduce number of locks to bare minimum, so at least creation
> and deletion of devlink objects will be locks free.

That's not what I meant. I meant devlink should call in to take
driver's lock or more likely driver should use the devlink instance
mutex instead of creating its own. Most of the devlink helpers (with
minor exceptions like alloc) should just assert that devlink instance
lock is already held by the driver when called.

> Latest changes already solved devlink reload issues for mlx5 eth side
> and it is deadlock and lockdep free now. We still have deadlocks with
> our IB part, where we obligated to hold pernet lock during registering
> to net notifiers, but it is different discussion.
>
> > > IMHO, the mlxsw layering should be fixed. All this recursive devlink re-entry
> > > looks horrible and adds unneeded complexity.
> >
> > If you're asking about mlxsw or bnxt in particular I wouldn't say what
> > they do is wrong until we can point out bugs.
>
> I'm talking about mlxsw and pointed to the reentry to devlink over and over.

To me "pointing to re-entry" read like breaking the new model you have
in mind, not actual bug/race/deadlock etc. If that's not the case the
explanation flew over my head :)