Re: [PATCH 3/3] hv_netvsc: Implement VF matching based on serial numbers

From: Stephen Hemminger
Date: Wed Dec 14 2016 - 18:51:50 EST


On Wed, 14 Dec 2016 15:27:58 -0800
Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:

> On Wed, Dec 14, 2016 at 11:18:59PM +0000, Haiyang Zhang wrote:
> >
> >
> > > -----Original Message-----
> > > From: Greg KH [mailto:gregkh@xxxxxxxxxxxxxxxxxxx]
> > > Sent: Saturday, December 10, 2016 7:21 AM
> > > To: Stephen Hemminger <stephen@xxxxxxxxxxxxxxxxxx>
> > > Cc: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>; olaf@xxxxxxxxx;
> > > jasowang@xxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > > bjorn.helgaas@xxxxxxxxx; apw@xxxxxxxxxxxxx; devel@xxxxxxxxxxxxxxxxxxxxxx;
> > > leann.ogasawara@xxxxxxxxxxxxx
> > > Subject: Re: [PATCH 3/3] hv_netvsc: Implement VF matching based on
> > > serial numbers
> > >
> > > On Fri, Dec 09, 2016 at 04:21:48PM -0800, Stephen Hemminger wrote:
> > > > On Fri, 9 Dec 2016 22:35:05 +0000
> > > > Haiyang Zhang <haiyangz@xxxxxxxxxxxxx> wrote:
> > > >
> > > > > > > >
> > > > > > > > Emulated NIC is already excluded in start of netvc notifier
> > > handler.
> > > > > > > >
> > > > > > > > static int netvsc_netdev_event(struct notifier_block *this,
> > > > > > > > unsigned long event, void *ptr)
> > > > > > > > {
> > > > > > > > struct net_device *event_dev =
> > > netdev_notifier_info_to_dev(ptr);
> > > > > > > >
> > > > > > > > /* Skip our own events */
> > > > > > > > if (event_dev->netdev_ops == &device_ops)
> > > > > > > > return NOTIFY_DONE;
> > > > > > > >
> > > > > > >
> > > > > > > Emulated device is not based on netvsc. It's the native Linux
> > > > > > (dec100M?)
> > > > > > > Driver. So this line doesn't exclude it. And how about other NIC
> > > type
> > > > > > > may be added in the future?
> > > > > >
> > > > > > Sorry, forgot about that haven't used emulated device in years.
> > > > > > The emulated device should appear to be on a PCI bus, but the
> > > serial
> > > > > > would not match??
> > > > >
> > > > > It's not a vmbus device, not a hv_pci device either. Hv_PCI is a
> > > subset
> > > > > of vmbus devices. So emulated NIC won't have hv_pci serial number.
> > > > >
> > > > > In my patch, the following code ensure, we only try to get serial
> > > number
> > > > > after confirming it's vmbus and hv_pci device:
> > > > >
> > > > > + if (!dev_is_vmbus(dev))
> > > > > + continue;
> > > > > +
> > > > > + hdev = device_to_hv_device(dev);
> > > > > + if (hdev->device_id != HV_PCIE)
> > > > > + continue;
> > > >
> > > > Ok, the walk back up the device tree is logically ok, but I don't
> > > > know enough about PCI device tree to be assured that it is safe.
> > > > Also, you could short circuit away most of the unwanted devices
> > > > by making sure the vf_netdev->dev.parent is a PCI device.
> > >
> > > Ugh, this seems really really messy. Can't we just have the
> > > netdev_event interface pass back a pointer to something that we "know"
> > > what it is? This walking the device tree is a mess, and not good.
> > >
> > > I'd even argue that dev_is_pci() needs to be removed from the tree too,
> > > as it shouldn't be needed either. We did a lot of work on the driver
> > > model to prevent the need for having to declare the "type" of 'struct
> > > device' at all, and by doing this type of thing it goes against the
> > > basic design of the model.
> > >
> > > Yes, it makes things a bit "tougher" in places, but you don't do crazy
> > > things like walk device trees to try to find random devices and then
> > > think it's safe to actually use them :(
> > >
> >
> > We register a notifier_block with:
> > register_netdevice_notifier(struct notifier_block *nb)
> >
> > The "struct notifier_block" basically contains a callback function:
> > struct notifier_block {
> > notifier_fn_t notifier_call;
> > struct notifier_block __rcu *next;
> > int priority;
> > };
> >
> > It doesn't specify which device we want, so all net devices can trigger
> > this event. Seems we can't have this notifier return VF device only.
>
> Ok, I dug in the kernel and it looks like people check the netdev_ops
> structure to see if it matches up with their function pointers to "know"
> if this is their device or not. Why not do that here? Or compare the
> "string" of the driver name? Or any other such trick that the drivers
> that call register_netdevice_notifier do?
>
> All of which are more sane than walking the device tree...
>
> And why am I having to do network driver development, ick ick ick :)
>
> Come on, 'git grep' is your friend. Even better yet, use a good tool
> like 'vgrep' which makes git grep work really really well.

Normally, that would work but in this case we have one driver (netvsc)
which is managing another driver which is unaware of Hyper-V or netvsc
drivers existence. The callback is happening in netvsc driver and it
needs to say "hey I know that SR-IOV device, it is associated with my
network device". This problem is how to know that N is associated with
V? The V device has to be a network device, that is easy. But then it
also has to be a PCI device, not to bad. But then the netvsc code
is matching based on hyper-V only PCI bus metadata (the serial #).

The Microsoft developers made the rational decision not to go modifying
all the possible SR-IOV network devices from Intel and Mellanox to add
the functionality there. That would have been much worse.

Maybe, rather than trying to do the management in the kernel it
could have been done better in user space. Unfortunately, this would
only move the problem. The PCI-hyperv host driver could expose serial
value through sysfs (with some pain). But the problem would be how
to make a new API to join the two V and N device. Doing a private
ioctl is worse than the notifier.