Re: [PATCH net-next] netdevsim: Add max_vfs to bus_dev

From: Jakub Kicinski
Date: Wed Dec 11 2019 - 18:50:05 EST


On Wed, 11 Dec 2019 23:25:09 +0000, Yuval Avnery wrote:
> > -----Original Message-----
> > From: Jakub Kicinski <jakub.kicinski@xxxxxxxxxxxxx>
> > Sent: Wednesday, December 11, 2019 2:24 PM
> > To: Yuval Avnery <yuvalav@xxxxxxxxxxxx>
> > Cc: Jiri Pirko <jiri@xxxxxxxxxxxx>; davem@xxxxxxxxxxxxx;
> > netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> > Subject: Re: [PATCH net-next] netdevsim: Add max_vfs to bus_dev
> >
> > On Wed, 11 Dec 2019 19:57:34 +0000, Yuval Avnery wrote:
> > > > -----Original Message-----
> > > > From: Jakub Kicinski <jakub.kicinski@xxxxxxxxxxxxx>
> > > > Sent: Wednesday, December 11, 2019 11:16 AM
> > > > To: Yuval Avnery <yuvalav@xxxxxxxxxxxx>
> > > > Cc: Jiri Pirko <jiri@xxxxxxxxxxxx>; davem@xxxxxxxxxxxxx;
> > > > netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> > > > Subject: Re: [PATCH net-next] netdevsim: Add max_vfs to bus_dev
> > > >
> > > > On Wed, 11 Dec 2019 18:19:56 +0000, Yuval Avnery wrote:
> > > > > > On Wed, 11 Dec 2019 04:58:53 +0200, Yuval Avnery wrote:
> > > > > > > Currently there is no limit to the number of VFs netdevsim can enable.
> > > > > > > In a real systems this value exist and used by driver.
> > > > > > > Fore example, Some features might need to consider this value
> > > > > > > when allocating memory.
> > > > > >
> > > > > > Thanks for the patch!
> > > > > >
> > > > > > Can you shed a little bit more light on where it pops up? Just
> > > > > > for my
> > > > curiosity?
> > > > >
> > > > > Yes, like we described in the subdev threads.
> > > > > User should be able to configure some attributes before the VF was
> > > > enabled.
> > > > > So all those (persistent) VF attributes should be available for
> > > > > query and configuration before VF was enabled.
> > > > > The driver can allocate an array according to max_vfs to hold all
> > > > > that data, like we do here in" vfconfigs".
> > > >
> > > > I was after more practical reasoning, are you writing some tests for
> > > > subdev stuff that will depend on this change? :)
> > >
> > > Yes we are writing tests for subdev with this.
> >
> > Okay, please post v2 together with the tests. We don't accept netdevsim
> > features without tests any more.
>
> I think the only test I can currently write is the enable SR-IOV max_vfs enforcement.
> Because subdev is not in yet.
> Will that be good enough?

It'd be good to test some netdev API rather than just the enforcement
itself which is entirely in netdevsim, I think.

So max_vfs enforcement plus checking that ip link lists the correct
number of entries (and perhaps the entries are in reset state after
enable) would do IMO.

> > > This is the way mlx5 works.. is that practical enough?
> > >
> > > > > > > Signed-off-by: Yuval Avnery <yuvalav@xxxxxxxxxxxx>
> > > > > > > Acked-by: Jiri Pirko <jiri@xxxxxxxxxxxx>
> > > > > > >
> > > > > > > diff --git a/drivers/net/netdevsim/bus.c
> > > > > > > b/drivers/net/netdevsim/bus.c index 6aeed0c600f8..f1a0171080cb
> > > > > > > 100644
> > > > > > > --- a/drivers/net/netdevsim/bus.c
> > > > > > > +++ b/drivers/net/netdevsim/bus.c
> > > > > > > @@ -26,9 +26,9 @@ static struct nsim_bus_dev
> > > > > > > *to_nsim_bus_dev(struct device *dev) static int
> > > > > > > nsim_bus_dev_vfs_enable(struct nsim_bus_dev
> > > > > > *nsim_bus_dev,
> > > > > > > unsigned int num_vfs)
> > > > > > > {
> > > > > > > - nsim_bus_dev->vfconfigs = kcalloc(num_vfs,
> > > > > > > - sizeof(struct
> > nsim_vf_config),
> > > > > > > - GFP_KERNEL);
> > > > > >
> > > > > > You're changing the semantics of the enable/disable as well now.
> > > > > > The old values used to be wiped when SR-IOV is disabled, now
> > > > > > they will be retained across disable/enable pair.
> > > > > >
> > > > > > I think it'd be better if that wasn't the case. Users may expect
> > > > > > a system to be in the same state after they enable SR-IOV,
> > > > > > regardless if someone else used SR-IOV since last reboot.
> > > > >
> > > > > Right,
> > > > > But some values should retain across enable/disable, for example
> > > > > MAC
> > > > address which is persistent.
> > > > > So maybe we need to retain some values, while resetting others on
> > > > disable?
> > > > > Would that work?
> > > >
> > > > Mmm. That is a good question. For all practical purposes SR-IOV used
> > > > to be local to the host that enables it until Smart/middle box NICs
> > emerged.
> > > >
> > > > Perhaps the best way forward would be to reset the config that was
> > > > set via legacy APIs and keep only the MACs provisioned via persistent
> > devlink API?
> > > >
> > > > So for now we'd memset, and once devlink API lands reset selectively?
> > >
> > > Legacy is also persistent.
> > > Currently when you set mac address with "ip link vf set mac" it is persistent
> > (at least in mlx5).
> >
> > "Currently in mlx5" - maybe, but this is netdevsim. Currently it clears the
> > config on re-enable which I believe to be preferable as explained before.
> >
> > > But ip link only exposes enabled VFS, so driver on VF has to reload to
> > acquire this MAC.
> > > With devlink subdev it will be possible to set the MAC before VF was
> > enabled.
> >
> > Yup, sure. As I said, once subdev is implemented, we will treat the addresses
> > set by it differently. Those are inherently persistent or rather their life time is
> > independent of just the SR-IOV host.
>
> Ok, got it.
> I am just wondering how this works when you have "ip link" and devlink setting the MAC independently.
> Will they show the same MAC?
> Or ip link will show the non-persistent MAC And devlink the persistent?

My knee jerk reaction is that we should populate the values to those set
via devlink upon SR-IOV enable, but then if user overwrites those values
that's their problem.

Sort of mirror how VF MAC addrs work, just a level deeper. The VF
defaults to the MAC addr provided by the PF after reset, but it can
change it to something else (things may stop working because spoof
check etc. will drop all its frames, but nothing stops the VF in legacy
HW from writing its MAC addr register).

IOW the devlink addr is the default/provisioned addr, not necessarily
the addr the PF has set _now_.

Other options I guess are (a) reject the changes of the address from
the PF once devlink has set a value; (b) provide some device->control
CPU notifier which can ack/reject a request from the PF to change
devlink's value..?

You guys posted the devlink patches a while ago, what was your
implementation doing?