RE: [PATCH 3/3] hv_netvsc: Implement VF matching based on serial numbers
From: Haiyang Zhang
Date: Fri Dec 09 2016 - 17:51:05 EST
> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@xxxxxxxxxxxxxxxxxx]
> Sent: Friday, December 9, 2016 5:05 PM
> To: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> Cc: Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>; KY Srinivasan
> <kys@xxxxxxxxxxxxx>; olaf@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> bjorn.helgaas@xxxxxxxxx; apw@xxxxxxxxxxxxx; devel@xxxxxxxxxxxxxxxxxxxxxx;
> leann.ogasawara@xxxxxxxxxxxxx; jasowang@xxxxxxxxxx
> Subject: Re: [PATCH 3/3] hv_netvsc: Implement VF matching based on
> serial numbers
>
> On Fri, 9 Dec 2016 21:53:49 +0000
> Haiyang Zhang <haiyangz@xxxxxxxxxxxxx> wrote:
>
> > > -----Original Message-----
> > > From: Stephen Hemminger [mailto:stephen@xxxxxxxxxxxxxxxxxx]
> > > Sent: Friday, December 9, 2016 4:45 PM
> > > To: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> > > Cc: Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>; KY Srinivasan
> > > <kys@xxxxxxxxxxxxx>; olaf@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > > bjorn.helgaas@xxxxxxxxx; apw@xxxxxxxxxxxxx;
> devel@xxxxxxxxxxxxxxxxxxxxxx;
> > > leann.ogasawara@xxxxxxxxxxxxx; jasowang@xxxxxxxxxx
> > > Subject: Re: [PATCH 3/3] hv_netvsc: Implement VF matching based on
> > > serial numbers
> > >
> > > On Fri, 9 Dec 2016 21:31:25 +0000
> > > Haiyang Zhang <haiyangz@xxxxxxxxxxxxx> wrote:
> > >
> > > > > -----Original Message-----
> > > > > From: Stephen Hemminger [mailto:stephen@xxxxxxxxxxxxxxxxxx]
> > > > > Sent: Friday, December 9, 2016 3:30 PM
> > > > > To: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> > > > > Cc: Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>; KY Srinivasan
> > > > > <kys@xxxxxxxxxxxxx>; olaf@xxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx;
> > > > > bjorn.helgaas@xxxxxxxxx; apw@xxxxxxxxxxxxx;
> > > devel@xxxxxxxxxxxxxxxxxxxxxx;
> > > > > leann.ogasawara@xxxxxxxxxxxxx; jasowang@xxxxxxxxxx
> > > > > Subject: Re: [PATCH 3/3] hv_netvsc: Implement VF matching based
> on
> > > > > serial numbers
> > > > >
> > > > > On Fri, 9 Dec 2016 20:09:49 +0000
> > > > > Haiyang Zhang <haiyangz@xxxxxxxxxxxxx> wrote:
> > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Stephen Hemminger [mailto:stephen@xxxxxxxxxxxxxxxxxx]
> > > > > > > Sent: Friday, December 9, 2016 1:21 PM
> > > > > > > To: Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>
> > > > > > > Cc: KY Srinivasan <kys@xxxxxxxxxxxxx>; olaf@xxxxxxxxx;
> Haiyang
> > > Zhang
> > > > > > > <haiyangz@xxxxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
> > > > > > > bjorn.helgaas@xxxxxxxxx; apw@xxxxxxxxxxxxx;
> > > > > devel@xxxxxxxxxxxxxxxxxxxxxx;
> > > > > > > leann.ogasawara@xxxxxxxxxxxxx; jasowang@xxxxxxxxxx
> > > > > > > Subject: Re: [PATCH 3/3] hv_netvsc: Implement VF matching
> based
> > > on
> > > > > > > serial numbers
> > > > > > >
> > > > > > > On Fri, 9 Dec 2016 08:31:22 +0100
> > > > > > > Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > > On Fri, Dec 09, 2016 at 12:05:53AM +0000, KY Srinivasan
> wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: Greg KH [mailto:gregkh@xxxxxxxxxxxxxxxxxxx]
> > > > > > > > > > Sent: Thursday, December 8, 2016 7:56 AM
> > > > > > > > > > To: KY Srinivasan <kys@xxxxxxxxxxxxx>
> > > > > > > > > > Cc: linux-kernel@xxxxxxxxxxxxxxx;
> > > devel@xxxxxxxxxxxxxxxxxxxxxx;
> > > > > > > > > > olaf@xxxxxxxxx; apw@xxxxxxxxxxxxx; vkuznets@xxxxxxxxxx;
> > > > > > > > > > jasowang@xxxxxxxxxx; leann.ogasawara@xxxxxxxxxxxxx;
> > > > > > > > > > bjorn.helgaas@xxxxxxxxx; Haiyang Zhang
> > > <haiyangz@xxxxxxxxxxxxx>
> > > > > > > > > > Subject: Re: [PATCH 3/3] hv_netvsc: Implement VF
> matching
> > > > > based on
> > > > > > > serial
> > > > > > > > > > numbers
> > > > > > > > > >
> > > > > > > > > > On Thu, Dec 08, 2016 at 12:33:43AM -0800,
> > > > > > > kys@xxxxxxxxxxxxxxxxxxxxxx
> > > > > > > > > > wrote:
> > > > > > > > > > > From: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> > > > > > > > > > >
> > > > > > > > > > > We currently use MAC address to match VF and
> synthetic
> > > NICs.
> > > > > > > Hyper-V
> > > > > > > > > > > provides a serial number to both devices for this
> > > purpose.
> > > > > This
> > > > > > > patch
> > > > > > > > > > > implements the matching based on VF serial numbers.
> This
> > > is
> > > > > the
> > > > > > > way
> > > > > > > > > > > specified by the protocol and more reliable.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> > > > > > > > > > > Signed-off-by: K. Y. Srinivasan <kys@xxxxxxxxxxxxx>
> > > > > > > > > > > ---
> > > > > > > > > > > drivers/net/hyperv/netvsc_drv.c | 55
> > > > > > > > > > ++++++++++++++++++++++++++++++++++++---
> > > > > > > > > > > 1 files changed, 51 insertions(+), 4 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/drivers/net/hyperv/netvsc_drv.c
> > > > > > > > > > b/drivers/net/hyperv/netvsc_drv.c
> > > > > > > > > > > index 9522763..c5778cf 100644
> > > > > > > > > > > --- a/drivers/net/hyperv/netvsc_drv.c
> > > > > > > > > > > +++ b/drivers/net/hyperv/netvsc_drv.c
> > > > > > > > > > > @@ -1165,9 +1165,10 @@ static void
> > > netvsc_free_netdev(struct
> > > > > > > > > > net_device *netdev)
> > > > > > > > > > > free_netdev(netdev);
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > -static struct net_device *get_netvsc_bymac(const u8
> > > *mac)
> > > > > > > > > > > +static struct net_device *get_netvsc_byvfser(u32
> vfser)
> > > > > > > > > > > {
> > > > > > > > > > > struct net_device *dev;
> > > > > > > > > > > + struct net_device_context *ndev_ctx;
> > > > > > > > > > >
> > > > > > > > > > > ASSERT_RTNL();
> > > > > > > > > > >
> > > > > > > > > > > @@ -1175,7 +1176,8 @@ static void
> > > netvsc_free_netdev(struct
> > > > > > > net_device
> > > > > > > > > > *netdev)
> > > > > > > > > > > if (dev->netdev_ops != &device_ops)
> > > > > > > > > > > continue; /* not a netvsc device */
> > > > > > > > > > >
> > > > > > > > > > > - if (ether_addr_equal(mac, dev->perm_addr))
> > > > > > > > > > > + ndev_ctx = netdev_priv(dev);
> > > > > > > > > > > + if (ndev_ctx->vf_serial == vfser)
> > > > > > > > > > > return dev;
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > @@ -1205,21 +1207,66 @@ static void
> > > > > netvsc_free_netdev(struct
> > > > > > > > > > net_device *netdev)
> > > > > > > > > > > return NULL;
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > +static u32 netvsc_get_vfser(struct net_device
> > > *vf_netdev)
> > > > > > > > > > > +{
> > > > > > > > > > > + struct device *dev;
> > > > > > > > > > > + struct hv_device *hdev;
> > > > > > > > > > > + struct hv_pcibus_device *hbus = NULL;
> > > > > > > > > > > + struct list_head *iter;
> > > > > > > > > > > + struct hv_pci_dev *hpdev;
> > > > > > > > > > > + unsigned long flags;
> > > > > > > > > > > + u32 vfser = 0;
> > > > > > > > > > > + u32 count = 0;
> > > > > > > > > > > +
> > > > > > > > > > > + for (dev = &vf_netdev->dev; dev; dev = dev->parent)
> > > {
> > > > > > > > > >
> > > > > > > > > > You are going to walk the whole device tree backwards?
> > > That's
> > > > > > > crazy.
> > > > > > > > > > And foolish. And racy and broken (what happens if the
> > > tree
> > > > > > > changes
> > > > > > > > > > while you do this?) Where is the lock being grabbed
> while
> > > > > this
> > > > > > > happens?
> > > > > > > > > > What about reference counts? Do you see other drivers
> > > ever
> > > > > doing
> > > > > > > this
> > > > > > > > > > (if you do, point them out and I'll go yell at them
> too...)
> > > > > > > > >
> > > > > > > > > Greg,
> > > > > > > > >
> > > > > > > > > We are registering for netdev events. Coming into this
> > > function,
> > > > > the
> > > > > > > caller
> > > > > > > > > guarantees that the list of netdevs does not change - we
> > > assert
> > > > > this
> > > > > > > on entry:
> > > > > > > > > ASSERT_RTNL(). We are only walking up the device tree
> for
> > > the
> > > > > > > netdevs whose
> > > > > > > > > state change is being notified to us - the device tree
> being
> > > > > walked
> > > > > > > here is limited to
> > > > > > > > > netdevs under question.
> > > > > > > >
> > > > > > > > But a netdev is a child of some type of "real" device, and
> you
> > > are
> > > > > now
> > > > > > > > walking the tree of all devices up to the "root" parent
> device,
> > > > > which
> > > > > > > > means you will hit PCI bridges, USB controllers, and all
> sorts
> > > of
> > > > > fun
> > > > > > > > things if you are a child of those types of devices.
> > > > > > > >
> > > > > > > > And can't you tell if the netdev for this event, really is
> > > "your"
> > > > > > > > netdev? Or are you getting called this for "all" netdevs?
> > > Sorry,
> > > > > I
> > > > > > > > don't know this api, any pointers to it would be
> appreciated.
> > > > > > > >
> > > > > > > > > We have a reference to the device and we know the device
> is
> > > not
> > > > > > > going away. Is it not
> > > > > > > > > safe to dereference the parent pointer - after all the
> child
> > > has
> > > > > > > taken a reference on
> > > > > > > > > the parent as part of device_add() call.
> > > > > > > >
> > > > > > > > It might be, and might not be. There's a reason you don't
> see
> > > > > this
> > > > > > > > pattern anywhere in the kernel because of this...
> > > > > > > >
> > > > > > > > > > > + if (!dev_is_vmbus(dev))
> > > > > > > > > > > + continue;
> > > > > > > > > >
> > > > > > > > > > Ick.
> > > > > > > > > >
> > > > > > > > > > Why isn't your parent pointer a vmbus device all the
> time?
> > > > > How
> > > > > > > could
> > > > > > > > > > you get burried down in the device hierarchy when you
> are
> > > the
> > > > > > > driver for
> > > > > > > > > > a specific bus type in the first place? How could
> this
> > > > > function
> > > > > > > ever be
> > > > > > > > > > called for a device that is NOT of this type?
> > > > > > > > >
> > > > > > > > > We get notified when state changes on any of the netdev
> > > devices
> > > > > in
> > > > > > > the system.
> > > > > > > > > Not all netdevs in the system belong to vmbus. Consider
> for
> > > > > instance
> > > > > > > the
> > > > > > > > > emulated NIC that can be configured. This is an emulated
> PCI
> > > NIC.
> > > > > We
> > > > > > > are only
> > > > > > > > > interested in netdevs that correspond to the VF instance
> > > that we
> > > > > are
> > > > > > > interested in.
> > > > > > > >
> > > > > > > > Can you "know" this is your netdev by some other way than
> > > having
> > > > > to
> > > > > > > walk
> > > > > > > > the device tree? Name? local device type? Something
> else?
> > > This
> > > > > > > seems
> > > > > > > > like an odd api in that everyone would have to do
> gyrations
> > > like
> > > > > this
> > > > > > > in
> > > > > > > > order to determine if the netdev is "theirs" or not...
> > > > > > >
> > > > > > > The scenario is SR-IOV on Hyper-V. In the case of VF device,
> the
> > > > > host
> > > > > > > hands the
> > > > > > > guest OS a PCI device for the virtual function device. The
> VF
> > > device
> > > > > is
> > > > > > > placed
> > > > > > > on a special synthetic PCI bus (ie not part of the other
> buses
> > > on
> > > > > the
> > > > > > > system).
> > > > > > > The VF device also has a synthetic network interface (netvsc)
> > > which
> > > > > > > lives
> > > > > > > on VMBUS. This code is about managing the interaction
> between
> > > the
> > > > > two.
> > > > > > >
> > > > > > > The association between VF and synthetic NIC is done in
> response
> > > to
> > > > > the
> > > > > > > VF network device being registered. Initial version was
> based on
> > > MAC
> > > > > > > address
> > > > > > > which is the same. Later refinement used permanent MAC
> address
> > > to
> > > > > > > avoid bugs if MAC address changed. This version is to use
> > > serial
> > > > > number
> > > > > > > instead which is safer than MAC address.
> > > > > > >
> > > > > > > The code to walk up/down maybe not be needed to find serial
> > > number.
> > > > > > > Perhaps a more direct single set of conditions is possible?
> > > > > > >
> > > > > > > Something like:
> > > > > > >
> > > > > > > In pci-hyperv.c
> > > > > > >
> > > > > > > u32 hv_pcifront_get_serial(struct pci_bus *bus, unsigned int
> > > devfn)
> > > > > > > {
> > > > > > > struct hv_pcibus_device *hbus
> > > > > > > = container_of(bus->sysdata,
> > > > > > > struct hv_pcibus_device, sysdata);
> > > > > > > struct hf_pci_dev *hpdev;
> > > > > > > u32 serial;
> > > > > > >
> > > > > > > hpdev = get_pcichild_wslot(hbus, devfn_to_wslot(pdev-
> >devfn));
> > > > > > > if (!hpdev)
> > > > > > > return 0;
> > > > > > >
> > > > > > > serial = hpdev->devs.ser;
> > > > > > > put_pcichild(hpdev, hv_pcidev_ref_by_slot);
> > > > > > > return serial;
> > > > > > > }
> > > > > > >
> > > > > > > In netvsc_drv.c
> > > > > > >
> > > > > > > static u32 netvsc_get_vfser(struct net_device *vf_netdev)
> > > > > > > {
> > > > > > > struct device *dev = vf_netdev->dev.parent;
> > > > > > > struct pci_dev *pdev;
> > > > > > > u32 wslot;
> > > > > > >
> > > > > > > if (!dev || !dev_is_pci(dev))
> > > > > > > return 0;
> > > > > > >
> > > > > > > pdev = container_of(dev, struct pci_device, dev);
> > > > > > >
> > > > > > > return hv_pcifront_get_serial(pdev->bus, pdev->devfn);
> > > > > > > }
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > P.S: it would be good to be able to get win_slot out through
> > > sysfs
> > > > > as
> > > > > > > well for systemd/udev.
> > > > > >
> > > > > > Stephen,
> > > > > >
> > > > > > Thanks for suggestion. Actually, in my earlier implementation
> of
> > > this
> > > > > > feature (VF serial based matching), I thought about export a
> > > function
> > > > > > from vPCI driver, then calling it from netvsc. So I don't need
> to
> > > > > > move structs between headers... But, it creates a dependency
> of
> > > netvsc
> > > > > > on vPCI driver's symbol. So, even if on a VM without SRIOV, we
> > > have to
> > > > > > load vPCI driver, which we don't want.
> > > > > >
> > > > > > Also, hv_vpci device is 3 parent layers above the vf_netdevice:
> > > > > > Here is the VF drv hierarchy --
> > > > > > Should we assume it's always 3 parents above vf_netdevice, or
> > > search
> > > > > for it?
> > > > > >
> > > > > > [ 368.185259] HZINFO:NETDEV_REGISTER:
> > > > > > [ 368.185261] HZINFO: dev:ffff88007c10d518, bus:
> (null),
> > > > > busName:(null), drvName:(null)
> > > > > > [ 368.185262] HZINFO: dev:ffff88007c10c0a0,
> bus:ffffffff81ce4b60,
> > > > > busName:pci, drvName:ixgbevf
> > > > > > [ 368.185263] HZINFO: dev:ffff8800355c0000, bus:
> (null),
> > > > > busName:(null), drvName:(null)
> > > > > > [ 368.185264] HZINFO: dev:ffff8800355c5428,
> bus:ffffffffa0008160,
> > > > > busName:vmbus, drvName:hv_pci
> > > > > > [ 368.185264] HZINFO: dev:ffff88007c49e268,
> bus:ffffffff81ce9800,
> > > > > busName:acpi, drvName:vmbus
> > > > > > [ 368.185265] HZINFO: dev:ffff88007c48ea68,
> bus:ffffffff81ce9800,
> > > > > busName:acpi, drvName:(null)
> > > > > > [ 368.185266] HZINFO: dev:ffff88007c48aa68,
> bus:ffffffff81ce9800,
> > > > > busName:acpi, drvName:(null)
> > > > > > [ 368.185266] HZINFO: dev:ffff88007c48a268,
> bus:ffffffff81ce9800,
> > > > > busName:acpi, drvName:(null)
> > > > > > [ 368.185267] HZINFO: dev:ffff88007c489a68,
> bus:ffffffff81ce9800,
> > > > > busName:acpi, drvName:(null)
> > > > > >
> > > > > > Thanks,
> > > > > > - Haiyang
> > > > >
> > > > > Since this is a synthetic bus, the topology should not change
> unless
> > > > > host side
> > > > > software changes. The vf_netdev device has to be PCI device, so
> that
> > > is
> > > > > going to
> > > > > be certain. After that there maybe intermediate up to hv_pci.
> The
> > > code
> > > > > in hyperv-pci
> > > > > already has similar stuff (ie for read_config).
> > > >
> > > > Other netdevice, like emulated NIC can also trigger this
> notification.
> > > > They are not vPCI.
> > > >
> > > > Thanks,
> > > > - Haiyang
> > >
> > > Emulated NIC is already excluded in start of netvc notifier handler.
> > >
> > > static int netvsc_netdev_event(struct notifier_block *this,
> > > unsigned long event, void *ptr)
> > > {
> > > struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);
> > >
> > > /* Skip our own events */
> > > if (event_dev->netdev_ops == &device_ops)
> > > return NOTIFY_DONE;
> > >
> >
> > Emulated device is not based on netvsc. It's the native Linux
> (dec100M?)
> > Driver. So this line doesn't exclude it. And how about other NIC type
> > may be added in the future?
>
> Sorry, forgot about that haven't used emulated device in years.
> The emulated device should appear to be on a PCI bus, but the serial
> would not match??
It's not a vmbus device, not a hv_pci device either. Hv_PCI is a subset
of vmbus devices. So emulated NIC won't have hv_pci serial number.
In my patch, the following code ensure, we only try to get serial number
after confirming it's vmbus and hv_pci device:
+ if (!dev_is_vmbus(dev))
+ continue;
+
+ hdev = device_to_hv_device(dev);
+ if (hdev->device_id != HV_PCIE)
+ continue;
Thanks,
- Haiyang