Re: [PATCH 3/3] hv_netvsc: Implement VF matching based on serial numbers
From: Stephen Hemminger
Date: Fri Dec 09 2016 - 13:20:43 EST
On Fri, 9 Dec 2016 08:31:22 +0100
Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> On Fri, Dec 09, 2016 at 12:05:53AM +0000, KY Srinivasan wrote:
> >
> >
> > > -----Original Message-----
> > > From: Greg KH [mailto:gregkh@xxxxxxxxxxxxxxxxxxx]
> > > Sent: Thursday, December 8, 2016 7:56 AM
> > > To: KY Srinivasan <kys@xxxxxxxxxxxxx>
> > > Cc: linux-kernel@xxxxxxxxxxxxxxx; devel@xxxxxxxxxxxxxxxxxxxxxx;
> > > olaf@xxxxxxxxx; apw@xxxxxxxxxxxxx; vkuznets@xxxxxxxxxx;
> > > jasowang@xxxxxxxxxx; leann.ogasawara@xxxxxxxxxxxxx;
> > > bjorn.helgaas@xxxxxxxxx; Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> > > Subject: Re: [PATCH 3/3] hv_netvsc: Implement VF matching based on serial
> > > numbers
> > >
> > > On Thu, Dec 08, 2016 at 12:33:43AM -0800, kys@xxxxxxxxxxxxxxxxxxxxxx
> > > wrote:
> > > > From: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> > > >
> > > > We currently use MAC address to match VF and synthetic NICs. Hyper-V
> > > > provides a serial number to both devices for this purpose. This patch
> > > > implements the matching based on VF serial numbers. This is the way
> > > > specified by the protocol and more reliable.
> > > >
> > > > Signed-off-by: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> > > > Signed-off-by: K. Y. Srinivasan <kys@xxxxxxxxxxxxx>
> > > > ---
> > > > drivers/net/hyperv/netvsc_drv.c | 55
> > > ++++++++++++++++++++++++++++++++++++---
> > > > 1 files changed, 51 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/net/hyperv/netvsc_drv.c
> > > b/drivers/net/hyperv/netvsc_drv.c
> > > > index 9522763..c5778cf 100644
> > > > --- a/drivers/net/hyperv/netvsc_drv.c
> > > > +++ b/drivers/net/hyperv/netvsc_drv.c
> > > > @@ -1165,9 +1165,10 @@ static void netvsc_free_netdev(struct
> > > net_device *netdev)
> > > > free_netdev(netdev);
> > > > }
> > > >
> > > > -static struct net_device *get_netvsc_bymac(const u8 *mac)
> > > > +static struct net_device *get_netvsc_byvfser(u32 vfser)
> > > > {
> > > > struct net_device *dev;
> > > > + struct net_device_context *ndev_ctx;
> > > >
> > > > ASSERT_RTNL();
> > > >
> > > > @@ -1175,7 +1176,8 @@ static void netvsc_free_netdev(struct net_device
> > > *netdev)
> > > > if (dev->netdev_ops != &device_ops)
> > > > continue; /* not a netvsc device */
> > > >
> > > > - if (ether_addr_equal(mac, dev->perm_addr))
> > > > + ndev_ctx = netdev_priv(dev);
> > > > + if (ndev_ctx->vf_serial == vfser)
> > > > return dev;
> > > > }
> > > >
> > > > @@ -1205,21 +1207,66 @@ static void netvsc_free_netdev(struct
> > > net_device *netdev)
> > > > return NULL;
> > > > }
> > > >
> > > > +static u32 netvsc_get_vfser(struct net_device *vf_netdev)
> > > > +{
> > > > + struct device *dev;
> > > > + struct hv_device *hdev;
> > > > + struct hv_pcibus_device *hbus = NULL;
> > > > + struct list_head *iter;
> > > > + struct hv_pci_dev *hpdev;
> > > > + unsigned long flags;
> > > > + u32 vfser = 0;
> > > > + u32 count = 0;
> > > > +
> > > > + for (dev = &vf_netdev->dev; dev; dev = dev->parent) {
> > >
> > > You are going to walk the whole device tree backwards? That's crazy.
> > > And foolish. And racy and broken (what happens if the tree changes
> > > while you do this?) Where is the lock being grabbed while this happens?
> > > What about reference counts? Do you see other drivers ever doing this
> > > (if you do, point them out and I'll go yell at them too...)
> >
> > Greg,
> >
> > We are registering for netdev events. Coming into this function, the caller
> > guarantees that the list of netdevs does not change - we assert this on entry:
> > ASSERT_RTNL(). We are only walking up the device tree for the netdevs whose
> > state change is being notified to us - the device tree being walked here is limited to
> > netdevs under question.
>
> But a netdev is a child of some type of "real" device, and you are now
> walking the tree of all devices up to the "root" parent device, which
> means you will hit PCI bridges, USB controllers, and all sorts of fun
> things if you are a child of those types of devices.
>
> And can't you tell if the netdev for this event, really is "your"
> netdev? Or are you getting called this for "all" netdevs? Sorry, I
> don't know this api, any pointers to it would be appreciated.
>
> > We have a reference to the device and we know the device is not going away. Is it not
> > safe to dereference the parent pointer - after all the child has taken a reference on
> > the parent as part of device_add() call.
>
> It might be, and might not be. There's a reason you don't see this
> pattern anywhere in the kernel because of this...
>
> > > > + if (!dev_is_vmbus(dev))
> > > > + continue;
> > >
> > > Ick.
> > >
> > > Why isn't your parent pointer a vmbus device all the time? How could
> > > you get burried down in the device hierarchy when you are the driver for
> > > a specific bus type in the first place? How could this function ever be
> > > called for a device that is NOT of this type?
> >
> > We get notified when state changes on any of the netdev devices in the system.
> > Not all netdevs in the system belong to vmbus. Consider for instance the
> > emulated NIC that can be configured. This is an emulated PCI NIC. We are only
> > interested in netdevs that correspond to the VF instance that we are interested in.
>
> Can you "know" this is your netdev by some other way than having to walk
> the device tree? Name? local device type? Something else? This seems
> like an odd api in that everyone would have to do gyrations like this in
> order to determine if the netdev is "theirs" or not...
The scenario is SR-IOV on Hyper-V. In the case of VF device, the host hands the
guest OS a PCI device for the virtual function device. The VF device is placed
on a special synthetic PCI bus (ie not part of the other buses on the system).
The VF device also has a synthetic network interface (netvsc) which lives
on VMBUS. This code is about managing the interaction between the two.
The association between VF and synthetic NIC is done in response to the
VF network device being registered. Initial version was based on MAC address
which is the same. Later refinement used permanent MAC address to
avoid bugs if MAC address changed. This version is to use serial number
instead which is safer than MAC address.
The code to walk up/down maybe not be needed to find serial number.
Perhaps a more direct single set of conditions is possible?
Something like:
In pci-hyperv.c
u32 hv_pcifront_get_serial(struct pci_bus *bus, unsigned int devfn)
{
struct hv_pcibus_device *hbus
= container_of(bus->sysdata,
struct hv_pcibus_device, sysdata);
struct hf_pci_dev *hpdev;
u32 serial;
hpdev = get_pcichild_wslot(hbus, devfn_to_wslot(pdev->devfn));
if (!hpdev)
return 0;
serial = hpdev->devs.ser;
put_pcichild(hpdev, hv_pcidev_ref_by_slot);
return serial;
}
In netvsc_drv.c
static u32 netvsc_get_vfser(struct net_device *vf_netdev)
{
struct device *dev = vf_netdev->dev.parent;
struct pci_dev *pdev;
u32 wslot;
if (!dev || !dev_is_pci(dev))
return 0;
pdev = container_of(dev, struct pci_device, dev);
return hv_pcifront_get_serial(pdev->bus, pdev->devfn);
}
P.S: it would be good to be able to get win_slot out through sysfs as
well for systemd/udev.