Re: [PATCH 3/3] hv_netvsc: Implement VF matching based on serial numbers

From: Greg KH
Date: Fri Dec 16 2016 - 11:47:27 EST


On Wed, Dec 14, 2016 at 03:51:34PM -0800, Stephen Hemminger wrote:
> On Wed, 14 Dec 2016 15:27:58 -0800
> Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Wed, Dec 14, 2016 at 11:18:59PM +0000, Haiyang Zhang wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Greg KH [mailto:gregkh@xxxxxxxxxxxxxxxxxxx]
> > > > Sent: Saturday, December 10, 2016 7:21 AM
> > > > To: Stephen Hemminger <stephen@xxxxxxxxxxxxxxxxxx>
> > > > Cc: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>; olaf@xxxxxxxxx;
> > > > jasowang@xxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > > > bjorn.helgaas@xxxxxxxxx; apw@xxxxxxxxxxxxx; devel@xxxxxxxxxxxxxxxxxxxxxx;
> > > > leann.ogasawara@xxxxxxxxxxxxx
> > > > Subject: Re: [PATCH 3/3] hv_netvsc: Implement VF matching based on
> > > > serial numbers
> > > >
> > > > On Fri, Dec 09, 2016 at 04:21:48PM -0800, Stephen Hemminger wrote:
> > > > > On Fri, 9 Dec 2016 22:35:05 +0000
> > > > > Haiyang Zhang <haiyangz@xxxxxxxxxxxxx> wrote:
> > > > >
> > > > > > > > >
> > > > > > > > > Emulated NIC is already excluded in start of netvc notifier
> > > > handler.
> > > > > > > > >
> > > > > > > > > static int netvsc_netdev_event(struct notifier_block *this,
> > > > > > > > > unsigned long event, void *ptr)
> > > > > > > > > {
> > > > > > > > > struct net_device *event_dev =
> > > > netdev_notifier_info_to_dev(ptr);
> > > > > > > > >
> > > > > > > > > /* Skip our own events */
> > > > > > > > > if (event_dev->netdev_ops == &device_ops)
> > > > > > > > > return NOTIFY_DONE;
> > > > > > > > >
> > > > > > > >
> > > > > > > > Emulated device is not based on netvsc. It's the native Linux
> > > > > > > (dec100M?)
> > > > > > > > Driver. So this line doesn't exclude it. And how about other NIC
> > > > type
> > > > > > > > may be added in the future?
> > > > > > >
> > > > > > > Sorry, forgot about that haven't used emulated device in years.
> > > > > > > The emulated device should appear to be on a PCI bus, but the
> > > > serial
> > > > > > > would not match??
> > > > > >
> > > > > > It's not a vmbus device, not a hv_pci device either. Hv_PCI is a
> > > > subset
> > > > > > of vmbus devices. So emulated NIC won't have hv_pci serial number.
> > > > > >
> > > > > > In my patch, the following code ensure, we only try to get serial
> > > > number
> > > > > > after confirming it's vmbus and hv_pci device:
> > > > > >
> > > > > > + if (!dev_is_vmbus(dev))
> > > > > > + continue;
> > > > > > +
> > > > > > + hdev = device_to_hv_device(dev);
> > > > > > + if (hdev->device_id != HV_PCIE)
> > > > > > + continue;
> > > > >
> > > > > Ok, the walk back up the device tree is logically ok, but I don't
> > > > > know enough about PCI device tree to be assured that it is safe.
> > > > > Also, you could short circuit away most of the unwanted devices
> > > > > by making sure the vf_netdev->dev.parent is a PCI device.
> > > >
> > > > Ugh, this seems really really messy. Can't we just have the
> > > > netdev_event interface pass back a pointer to something that we "know"
> > > > what it is? This walking the device tree is a mess, and not good.
> > > >
> > > > I'd even argue that dev_is_pci() needs to be removed from the tree too,
> > > > as it shouldn't be needed either. We did a lot of work on the driver
> > > > model to prevent the need for having to declare the "type" of 'struct
> > > > device' at all, and by doing this type of thing it goes against the
> > > > basic design of the model.
> > > >
> > > > Yes, it makes things a bit "tougher" in places, but you don't do crazy
> > > > things like walk device trees to try to find random devices and then
> > > > think it's safe to actually use them :(
> > > >
> > >
> > > We register a notifier_block with:
> > > register_netdevice_notifier(struct notifier_block *nb)
> > >
> > > The "struct notifier_block" basically contains a callback function:
> > > struct notifier_block {
> > > notifier_fn_t notifier_call;
> > > struct notifier_block __rcu *next;
> > > int priority;
> > > };
> > >
> > > It doesn't specify which device we want, so all net devices can trigger
> > > this event. Seems we can't have this notifier return VF device only.
> >
> > Ok, I dug in the kernel and it looks like people check the netdev_ops
> > structure to see if it matches up with their function pointers to "know"
> > if this is their device or not. Why not do that here? Or compare the
> > "string" of the driver name? Or any other such trick that the drivers
> > that call register_netdevice_notifier do?
> >
> > All of which are more sane than walking the device tree...
> >
> > And why am I having to do network driver development, ick ick ick :)
> >
> > Come on, 'git grep' is your friend. Even better yet, use a good tool
> > like 'vgrep' which makes git grep work really really well.
>
> Normally, that would work but in this case we have one driver (netvsc)
> which is managing another driver which is unaware of Hyper-V or netvsc
> drivers existence.

That's the root problem here :)

> The callback is happening in netvsc driver and it
> needs to say "hey I know that SR-IOV device, it is associated with my
> network device". This problem is how to know that N is associated with
> V? The V device has to be a network device, that is easy. But then it
> also has to be a PCI device, not to bad.

I'd argue that it is just as bad, as it shouldn't be poking around with
a random 'struct device' like that, but I'll let it slide...

> But then the netvsc code
> is matching based on hyper-V only PCI bus metadata (the serial #).

Which is a mess.

Again, walking the device tree like this is racy, broken, and shouldn't
be done anywhere. You are crossing bus boundries and lots of bad things
could happen if a device was removed at the same time. Just don't do
that.

> The Microsoft developers made the rational decision not to go modifying
> all the possible SR-IOV network devices from Intel and Mellanox to add
> the functionality there. That would have been much worse.

Why is that worse? How many lines of code would that be?

> Maybe, rather than trying to do the management in the kernel it
> could have been done better in user space. Unfortunately, this would
> only move the problem. The PCI-hyperv host driver could expose serial
> value through sysfs (with some pain). But the problem would be how
> to make a new API to join the two V and N device. Doing a private
> ioctl is worse than the notifier.

I still don't really understand the relationship between V and N, but it
feels like it is very tenous and sketchy and you should work to make it
more explicit. We have the source to these drivers, do it correctly,
or, do something in the network bus layer to be able to properly
represent this heirachy so that it is "obvious" what is going on.

thanks,

greg k-h