Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device

From: Leon Romanovsky
Date: Mon Oct 25 2021 - 01:35:07 EST


On Sun, Oct 24, 2021 at 01:48:25PM +0300, Ido Schimmel wrote:
> On Sun, Oct 24, 2021 at 12:54:52PM +0300, Leon Romanovsky wrote:
> > On Sun, Oct 24, 2021 at 12:05:12PM +0300, Ido Schimmel wrote:
> > > On Sun, Oct 24, 2021 at 11:42:11AM +0300, Leon Romanovsky wrote:
> > > > From: Leon Romanovsky <leonro@xxxxxxxxxx>
> > > >
> > > > Align netdevsim to be like all other physical devices that register and
> > > > unregister devlink traps during their probe and removal respectively.
> > >
> > > No, this is incorrect. Out of the three drivers that support both reload
> > > and traps, both netdevsim and mlxsw unregister the traps during reload.
> > > Here is another report from syzkaller about mlxsw [1].
> >
> > Sorry, I overlooked it.
> >
> > >
> > > Please revert both 22849b5ea595 ("devlink: Remove not-executed trap
> > > policer notifications") and 8bbeed485823 ("devlink: Remove not-executed
> > > trap group notifications").
> >
> > However, before we rush and revert commit, can you please explain why
> > current behavior to reregister traps on reload is correct?
> >
> > I think that you are not changing traps during reload, so traps before
> > reload will be the same as after reload, am I right?
>
> During reload we tear down the entire driver and load it again. As part
> of the reload_down() operation we tear down the various objects from
> both devlink and the device (e.g., shared buffer, ports, traps, etc.).
> As part of the reload_up() operation we issue a device reset and
> register everything back.

This is an implementation which is arguably questionable and pinpoints
problem with devlink reload. It mixes different SW layers into one big
mess which I tried to untangle.

The devlink "feature" that driver reregisters itself again during execution
of other user-visible devlink command can't be right design.

>
> While the list of objects doesn't change, their properties (e.g., shared
> buffer size, trap action, policer rate) do change back to the default
> after reload and we cannot go back on that as it's a user-visible
> change.

I don't propose to go back, just prefer to see fixed mlxsw that
shouldn't touch already created and registered objects from net/core/devlink.c.

All reset-to-default should be performed internally to the driver
without any need to devlink_*_register() again, so we will be able to
clean rest devlink notifications.

So at least for the netdevsim, this change looks like the correct one,
while mlxsw should be fixed next.

Thanks