RE: [PATCH net-next] netdevsim: Add max_vfs to bus_dev

From: Yuval Avnery
Date: Thu Dec 12 2019 - 22:21:09 EST




> -----Original Message-----
> From: Jakub Kicinski <jakub.kicinski@xxxxxxxxxxxxx>
> Sent: Thursday, December 12, 2019 5:54 PM
> To: Yuval Avnery <yuvalav@xxxxxxxxxxxx>
> Cc: Jiri Pirko <jiri@xxxxxxxxxxxx>; davem@xxxxxxxxxxxxx;
> netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Andy Gospodarek
> <andy@xxxxxxxxxxxxx>
> Subject: Re: [PATCH net-next] netdevsim: Add max_vfs to bus_dev
>
> On Thu, 12 Dec 2019 20:44:31 +0000, Yuval Avnery wrote:
> > > -----Original Message-----
> > > From: Jakub Kicinski <jakub.kicinski@xxxxxxxxxxxxx>
> > > Sent: Thursday, December 12, 2019 10:25 AM
> > > To: Yuval Avnery <yuvalav@xxxxxxxxxxxx>
> > > Cc: Jiri Pirko <jiri@xxxxxxxxxxxx>; davem@xxxxxxxxxxxxx;
> > > netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Andy
> > > Gospodarek <andy@xxxxxxxxxxxxx>
> > > Subject: Re: [PATCH net-next] netdevsim: Add max_vfs to bus_dev
> > >
> > > On Thu, 12 Dec 2019 05:11:12 +0000, Yuval Avnery wrote:
> > > > > > > Okay, please post v2 together with the tests. We don't
> > > > > > > accept netdevsim features without tests any more.
> > > > > >
> > > > > > I think the only test I can currently write is the enable
> > > > > > SR-IOV max_vfs enforcement. Because subdev is not in yet.
> > > > > > Will that be good enough?
> > > > >
> > > > > It'd be good to test some netdev API rather than just the
> > > > > enforcement itself which is entirely in netdevsim, I think.
> > > > >
> > > > > So max_vfs enforcement plus checking that ip link lists the
> > > > > correct number of entries (and perhaps the entries are in reset
> > > > > state after
> > > > > enable) would do IMO.
> > > >
> > > > Ok, but this is possible regardless of my patch (to enable vfs).
> > >
> > > I was being lenient :) Your patch is only really needed when the
> > > devlink API lands, since devlink will display all max VFs not enabled.
> > >
> > > > > My knee jerk reaction is that we should populate the values to
> > > > > those set via devlink upon SR-IOV enable, but then if user
> > > > > overwrites those values that's their problem.
> > > > >
> > > > > Sort of mirror how VF MAC addrs work, just a level deeper. The
> > > > > VF defaults to the MAC addr provided by the PF after reset, but
> > > > > it can change it to something else (things may stop working
> > > > > because spoof check etc. will drop all its frames, but nothing
> > > > > stops the VF in legacy HW from writing its MAC addr register).
> > > > >
> > > > > IOW the devlink addr is the default/provisioned addr, not
> > > > > necessarily the addr the PF has set _now_.
> > > > >
> > > > > Other options I guess are (a) reject the changes of the address
> > > > > from the PF once devlink has set a value; (b) provide some
> > > > > device->control CPU notifier which can ack/reject a request from
> > > > > device->the PF
> > > to change devlink's value..?
> > > > >
> > > > > You guys posted the devlink patches a while ago, what was your
> > > > > implementation doing?
> > > >
> > > > devlink simply calls the driver with set or get.
> > > > It is up to the vendor driver/HW if to make this address persistent or
> not.
> > > > The address is not saved in the devlink layer.
> > >
> > > It'd be preferable for the behaviour of the kernel API to not be
> > > vendor specific. That defeats the purpose of having an operating
> > > system as a HW abstraction layer. SR-IOV devices of today are so FW
> > > heavy we can make them behave whatever way we choose makes most
> sense.
> > >
> > > > The MAC address in mlx5 is stored in the HW and persistent (until
> > > > PF
> > > > reset) , whether it is set by devlink or ip link.
> > >
> > > Okay, let's see if I understand. The devlink and ip link interfaces
> > > basically do the same thing but one reaches from control CPU and the
> > > other one from the SR-IOV host? And on SR-IOV host reset the addresses
> go back to 00:00..
> > > i.e. any?
> >
> > No,
> > This will work only in non-SmartNic mode, when e-switch manager is on
> > the host, MAC will be accessible through devlink and legacy tools..
> > For smartnic, only devlink from the embedded OS will work. Ip link from
> the host will not work.
>
> I see, is this a more fine grained capability or all or nothing for SR-IOV control?
> I'd think that if the SmartNIC's eswitch just encapsulates all the frames into a
> L4 tunnel it shouldn't care about L2 addresses.

People keep saying that, but there are customers who wants this capability :)

>
> > > What happens if the SR-IOV host changes the MAC? Is it used by HW or
> > > is the MAC provisioned by the control CPU used for things like spoof
> check?
> >
> > Host shouldn't have privileges to do it.
> > If it does, then it's under the host ownership (like in non-smartnic mode).
>
> I see so the MAC is fixed from bare metal host's PoV? And it has to be set

Yes

> through some high level cloud API (for live migration etc)?
> Do existing software stacks like libvirt handle not being able to set the MAC
> happily?

I am not sure what you mean.
What we are talking about here is the E-switch manager setting a MAC to another VF.
When the VF driver loads it will query this MAC from the NIC. This is the way
It works today with "ip link set _vf_ mac"

Or in other words we are replacing "ip link set _vf_ mac" and not "ip link set address"
So that it can work from the SmartNic embedded system.
There is nothing really new here, ip link will not work from a SmartNic,
this is why need devlink subdev.

Hope that answers you question.

>
> > > Does the control CPU get a notification for SR-IOV host reset? In
> > > that case the control CPU driver could restore the MAC addr.
> >
> > Yes, but this is irrelevant here, the MAC is already stored in HW/FW.
> > The MAC will reset only when the E-switch manager (on the control CPU)
> reset.
> >
> > > > So from what I understand, we have the freedom to choose how
> > > > netdevsim behave in this case, which means non-persistent is ok.
> > >
> > > To be clear - by persistent I meant that it survives the SR-IOV
> > > host's resets, not necessarily written to NVRAM of any sort.
> >
> > Yes, this is my view as well.
> > For non-smartnic it will survive VF disable/enable.
> > MAC is not stored on NVRAM, it will disappear once the driver on the
> control CPU resets.
> >
> > > I'd like to see netdevsim to also serve as sort of a reference model
> > > for device behaviour. Vendors who are not first to implement a
> > > feature always complain that there is no documentation on how things
> should work.
> >
> > Yes, this is a good idea.
> > But it seems we are always held back by legacy tools with no well-defined
> behavior.