Re: [PATCH net-next] netdevsim: Add max_vfs to bus_dev
From: Jakub Kicinski
Date: Thu Dec 12 2019 - 20:54:25 EST
On Thu, 12 Dec 2019 20:44:31 +0000, Yuval Avnery wrote:
> > -----Original Message-----
> > From: Jakub Kicinski <jakub.kicinski@xxxxxxxxxxxxx>
> > Sent: Thursday, December 12, 2019 10:25 AM
> > To: Yuval Avnery <yuvalav@xxxxxxxxxxxx>
> > Cc: Jiri Pirko <jiri@xxxxxxxxxxxx>; davem@xxxxxxxxxxxxx;
> > netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Andy Gospodarek
> > <andy@xxxxxxxxxxxxx>
> > Subject: Re: [PATCH net-next] netdevsim: Add max_vfs to bus_dev
> > On Thu, 12 Dec 2019 05:11:12 +0000, Yuval Avnery wrote:
> > > > > > Okay, please post v2 together with the tests. We don't accept
> > > > > > netdevsim features without tests any more.
> > > > >
> > > > > > I think the only test I can currently write is for the max_vfs
> > > > > > enforcement on SR-IOV enable, because subdev is not in yet.
> > > > > Will that be good enough?
> > > >
> > > > It'd be good to test some netdev API rather than just the
> > > > enforcement itself which is entirely in netdevsim, I think.
> > > >
> > > > So max_vfs enforcement plus checking that ip link lists the correct
> > > > number of entries (and perhaps the entries are in reset state after
> > > > enable) would do IMO.
> > >
> > > Ok, but this is possible regardless of my patch (to enable vfs).
> > I was being lenient :) Your patch is only really needed when the devlink API
> > lands, since devlink will display all max VFs, not just the enabled ones.
> > > > My knee jerk reaction is that we should populate the values to those
> > > > set via devlink upon SR-IOV enable, but then if user overwrites
> > > > those values that's their problem.
> > > >
> > > > Sort of mirror how VF MAC addrs work, just a level deeper. The VF
> > > > defaults to the MAC addr provided by the PF after reset, but it can
> > > > change it to something else (things may stop working because spoof
> > > > check etc. will drop all its frames, but nothing stops the VF in
> > > > legacy HW from writing its MAC addr register).
> > > >
> > > > IOW the devlink addr is the default/provisioned addr, not
> > > > necessarily the addr the PF has set _now_.
> > > >
> > > > Other options I guess are (a) reject the changes of the address from
> > > > the PF once devlink has set a value; (b) provide some
> > > > > device->control CPU notifier which can ack/reject a request from the
> > > > > PF to change devlink's value..?
> > > >
> > > > You guys posted the devlink patches a while ago, what was your
> > > > implementation doing?
> > >
> > > devlink simply calls the driver with set or get.
> > > > It is up to the vendor driver/HW whether to make this address persistent or not.
> > > The address is not saved in the devlink layer.
> > It'd be preferable for the behaviour of the kernel API to not be vendor
> > specific. That defeats the purpose of having an operating system as a HW
> > abstraction layer. SR-IOV devices of today are so FW heavy we can make
> > them behave in whatever way we decide makes most sense.
> > > > The MAC address in mlx5 is stored in the HW and persistent (until PF
> > > > reset), whether it is set by devlink or ip link.
> > Okay, let's see if I understand. The devlink and ip link interfaces basically do
> > the same thing but one reaches from control CPU and the other one from
> > the SR-IOV host? And on SR-IOV host reset the addresses go back to 00:00..
> > i.e. any?
> This will work only in non-SmartNIC mode, when the e-switch manager is on the host;
> the MAC will be accessible through devlink and legacy tools.
> For SmartNIC, only devlink from the embedded OS will work. ip link from the host
> will not work.
I see, is this a more fine-grained capability or all or nothing for
SR-IOV control? I'd think that if the SmartNIC's eswitch just
encapsulates all the frames into an L4 tunnel it shouldn't care about L2
> > What happens if the SR-IOV host changes the MAC? Is it used by HW or is the
> > MAC provisioned by the control CPU used for things like spoof check?
> The host shouldn't have the privileges to do it.
> If it does, then the MAC is under the host's ownership (like in non-SmartNIC mode).
I see, so the MAC is fixed from bare metal host's PoV? And it has to be
set through some high level cloud API (for live migration etc)?
Do existing software stacks like libvirt handle not being able to set
the MAC happily?
> > Does the control CPU get a notification for SR-IOV host reset? In that case
> > the control CPU driver could restore the MAC addr.
> Yes, but this is irrelevant here, the MAC is already stored in HW/FW.
> The MAC will reset only when the E-switch manager (on the control CPU) resets.
> > > So from what I understand, we have the freedom to choose how netdevsim
> > > behaves in this case, which means non-persistent is ok.
> > To be clear - by persistent I meant that it survives the SR-IOV host's resets,
> > not necessarily written to NVRAM of any sort.
> Yes, this is my view as well.
> For non-SmartNIC it will survive VF disable/enable.
> The MAC is not stored in NVRAM; it will disappear once the driver on the control CPU resets.
> > I'd like to see netdevsim also serve as sort of a reference model for device
> > behaviour. Vendors who are not first to implement a feature always
> > complain that there is no documentation on how things should work.
> Yes, this is a good idea.
> But it seems we are always held back by legacy tools with no well-defined behavior.
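
To make the semantics discussed in this thread concrete, here is a minimal
Python sketch of a possible reference model. It is not netdevsim code and not
any driver's actual behaviour. All names (PfModel, VfModel, devlink_set_mac,
host_set_mac, host_reset, control_cpu_reset, ip_link_show) are hypothetical.
It assumes that the devlink-provisioned MAC is only a default applied on
SR-IOV enable, that the host may overwrite it, that it survives an SR-IOV
host reset, and that it is lost when the control CPU driver resets.

```python
from dataclasses import dataclass

ZERO_MAC = "00:00:00:00:00:00"  # unset, i.e. "any"


@dataclass
class VfModel:
    # Default provisioned from the control CPU (the devlink side).
    provisioned_mac: str = ZERO_MAC
    # Address currently in effect, as the SR-IOV host would see it.
    current_mac: str = ZERO_MAC


class PfModel:
    """Hypothetical model of one PF; an assumption, not netdevsim code."""

    def __init__(self, max_vfs: int) -> None:
        self.max_vfs = max_vfs
        self.num_vfs = 0
        self.vfs = [VfModel() for _ in range(max_vfs)]

    def devlink_set_mac(self, vf: int, mac: str) -> None:
        # Provisioning covers all max_vfs slots, enabled or not.
        self.vfs[vf].provisioned_mac = mac

    def sriov_enable(self, num_vfs: int) -> None:
        # max_vfs enforcement: refuse to enable more VFs than the limit.
        if num_vfs > self.max_vfs:
            raise ValueError("num_vfs exceeds max_vfs")
        self.num_vfs = num_vfs
        # Enabled VFs start in reset state: current = provisioned default.
        for vf in self.vfs[:num_vfs]:
            vf.current_mac = vf.provisioned_mac

    def host_set_mac(self, vf: int, mac: str) -> None:
        # The host may overwrite the default of an enabled VF.
        if vf >= self.num_vfs:
            raise IndexError("VF is not enabled")
        self.vfs[vf].current_mac = mac

    def host_reset(self) -> None:
        # SR-IOV host reset: VFs go away, provisioned defaults survive.
        self.num_vfs = 0

    def control_cpu_reset(self) -> None:
        # Control CPU driver reset: provisioned values are lost too.
        self.num_vfs = 0
        for vf in self.vfs:
            vf.provisioned_mac = ZERO_MAC
            vf.current_mac = ZERO_MAC

    def ip_link_show(self) -> list:
        # Only enabled VFs are listed, mirroring what ip link would show.
        return [vf.current_mac for vf in self.vfs[: self.num_vfs]]
```

A test along the lines suggested above would enable N VFs, check that N
entries are listed with their provisioned defaults, and check that enabling
more than max_vfs is rejected.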