Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
From: Leon Romanovsky
Date: Wed Oct 27 2021 - 05:43:59 EST
On Wed, Oct 27, 2021 at 01:46:44AM -0700, Edwin Peer wrote:
> On Tue, Oct 26, 2021 at 11:43 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
>
> > In our case, the eth driver is part of mlx5_core module, so at the
> > device creation phase that module is already loaded and driver/core
> > will try to autoprobe it.
>
> > However, the last step is not always performed and controlled by the
> > userspace. Users can disable driver autoprobe and bind manually. This
> > is pretty standard practice in the SR-IOV or VFIO modes.
>
> While you say the netdev will not necessarily be bound, that still
> sounds like the netdev will indeed be presented to user space before
> devlink_register() when it is auto-probed?
And it is ok, there is no binding between netdev and devlink. They are
independent. For example, IB cards have devlink through mlx5_core, but
without netdev at all.
>
> > This is why devlink has monitor mode where you can see devlink device
> > addition and removal. It is user space job to check that device is
> > ready.
>
> Isn't it more a question of what existing user space _does_ rather
> than what user space _should_ do?
No, because races of kernel vs. user space always existed and every time,
the answer was the same - it is user space job to avoid race. Kernel
needs to protect itself and we successfully did it here.
Latest example of such programming model can be seen in VFIO too:
https://lore.kernel.org/kvm/20211026090605.91646-1-yishaih@xxxxxxxxxx/T/#m89ef326da9405dcefe482bd5a6a54aff7e0b90bc
"Yes. If the userspace races ioctls they get a deserved mess.
This race exists no matter what we do, as soon as the unlock happens a
racing reset ioctl could run in during the system call exit path.
The purpose of the locking is to protect the kernel from hostile
userspace, not to allow userspace to execute concurrent ioctl's in a
sensible way."
>
> > > This isn't about kernel API. This is precisely about existing user
> > > space that expects devlink to work immediately after the netdev
> > > appears.
> >
> > Can you please share open source project that has such assumption?
>
> I'm no python expert, but it looks like
> https://github.com/openstack-charmers/mlnx-switchdev-mode/ might.
No, it is not.
> We've certainly had implicit user space assumptions trip over
> registration order before, hence the change we made in January last
> year to move devlink registration earlier. Granted, upon deeper
> analysis, that specific case pertained to phys port name via sysfs,
> which technically only needs port attrs via ndo_get_devlink_port, not
> devlink_register(). That said, I'm certainly not confident that there
> are no other existing users that might expect to be able to invoke
> devlink in ifup scripts.
It is not what your change did, you moved devlink ops registration logic
to be before register_netdev(), this is continue to be in upstream code.
We are registering ops in devlink_alloc() and that function stays to be
at the same place.
>
> > > What do you suggest instead?
> >
> > Fix your test respect devlink notifications and don't ignore them.
>
> That's not very helpful. The test case does what the user in the field
> did to break it. We can't assume users have always been using our APIs
> the way we intended.
There is no API here.
Thanks
>
> Regards,
> Edwin Peer