Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device

From: Ido Schimmel
Date: Tue Oct 26 2021 - 10:09:59 EST


On Tue, Oct 26, 2021 at 10:18:12AM +0300, Leon Romanovsky wrote:
> On Tue, Oct 26, 2021 at 09:51:13AM +0300, Ido Schimmel wrote:
>
> <...>
>
> > >
> > > Can you please explain why is it so important to touch devlink SW
> > > objects, reallocate them again and again on every reload in mlxsw?
> >
> > Because that's how reload was defined and implemented. A complete
> > reload. We are not changing the semantics 4 years later.
>
> Please put your emotions aside and explain me technically why are you
> must to do it?

Already did. The current semantics are "devlink-reload provides
mechanism to reinit driver entities, applying devlink-params and
devlink-resources new values. It also provides mechanism to activate
firmware."

And this is exactly what netdevsim and mlxsw are doing. Driver entities
are re-initialized. Your patch breaks that as entities are not
re-initialized, which results in user space breakage. You simply cannot
introduce such regressions.

>
> The proposed semantics was broken for last 4 years, it can even seen as
> dead on arrival,

Again with the bombastic statements. It was "dead on arrival" like the
notifications were "impossible"?

> because it never worked for us in real production.

Who is "us"? mlx5 that apparently decided to do its own thing?

We are using reload in mlxsw on a daily basis and users are using it to
re-partition ASIC resources and activate firmware. There are tests over
netdevsim implementation that anyone can run for testing purposes. We
also made sure to integrate it into syzkaller:

https://github.com/google/syzkaller/commit/5b49e1f605a770e8f8fcdcbd1a8ff85591fc0c8e
https://github.com/google/syzkaller/commit/04ca72cd45348daab9d896bbec8ea4c2d13455ac
https://github.com/google/syzkaller/commit/6930bbef3b671ae21f74007f9e59efb9b236b93f
https://github.com/google/syzkaller/commit/d45a4d69d83f40579e74fb561e1583db1be0e294
https://github.com/google/syzkaller/commit/510951950dc0ee69cfdaf746061d3dbe31b49fd8

Which is why the regressions you introduced were discovered so quickly.

>
> So I'm fixing bugs without relation to when they were introduced.

We all do

>
> For example, this fix from Jiri [1] for basic design flow was merged almost
> two years later after devlink reload was introduced [2], or this patch from
> Parav [3] that fixed an issue introduced year before [4].

What is your point? That code has bugs?

By now I have spent more time arguing with you than you spent testing
your patches and it's clear this discussion is not going anywhere.

Are you going to send a revert or I will? This is the fourth time I'm
asking you.