Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

From: Jason Gunthorpe
Date: Mon Mar 22 2021 - 12:44:56 EST


On Mon, Mar 22, 2021 at 04:11:25PM +0100, Christoph Hellwig wrote:
> On Fri, Mar 19, 2021 at 05:07:49PM -0300, Jason Gunthorpe wrote:
> > The way the driver core works is to first match against the already
> > loaded driver list, then trigger an event for module loading and when
> > new drivers are registered they bind to unbound devices.
> >
> > So, the trouble is the event through userspace because the kernel
> > can't just go on to use vfio_pci until it knows userspace has failed
> > to satisfy the load request.
> >
> > One answer is to have userspace udev have the "hook" here and when a
> > vfio flavour mod alias is requested on a PCI device it swaps in
> > vfio_pci if it can't find an alternative.
> >
> > The dream would be a system with no vfio modules loaded could do some
> >
> > echo "vfio" > /sys/bus/pci/xxx/driver_flavour
> >
> > And a module would be loaded and a struct vfio_device is created for
> > that device. Very easy for the user.
>
> Maybe I did not communicate my suggestion last week very well. My
> idea is that there are no different pci_drivers vs vfio or not,
> but different personalities of the same driver.

This isn't quite the scenario that needs solving. Let's go back to
Max's V1 posting:

The mlx5_vfio_pci.c pci_driver matches this:

+ { PCI_DEVICE_SUB(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1042,
+ PCI_VENDOR_ID_MELLANOX, PCI_ANY_ID) }, /* Virtio SNAP controllers */

This overlaps with the match table in
drivers/virtio/virtio_pci_common.c:

{ PCI_DEVICE(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },

So, if we do as you propose we have to add something Mellanox-specific
to virtio_pci_common, which seems to me to just be repeating this whole
problem, except in more drivers.
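
To make the overlap concrete, here is a minimal userspace sketch of the
wildcard ID comparison. It is a model under stated assumptions, not the
kernel code: the struct and function names are made up for illustration,
class/class_mask matching is left out, and the subsystem device ID 0x0001
is a hypothetical value. Only the two table entries and the vendor IDs
come from the tables quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in the kernel's PCI headers. */
#define PCI_ANY_ID                     (~0u)
#define PCI_VENDOR_ID_REDHAT_QUMRANET  0x1af4
#define PCI_VENDOR_ID_MELLANOX         0x15b3

/* Hypothetical stand-in for the ID fields of struct pci_device_id. */
struct id_entry {
	uint32_t vendor, device, subvendor, subdevice;
};

/* Hypothetical stand-in for the ID fields read from a device. */
struct device_ids {
	uint32_t vendor, device, subvendor, subdevice;
};

/* A table field matches if it is a wildcard or equals the device's value. */
static bool field_ok(uint32_t want, uint32_t have)
{
	return want == PCI_ANY_ID || want == have;
}

/* Simplified match: all four ID fields must match (class is ignored here). */
static bool id_matches(const struct id_entry *id, const struct device_ids *dev)
{
	return field_ok(id->vendor, dev->vendor) &&
	       field_ok(id->device, dev->device) &&
	       field_ok(id->subvendor, dev->subvendor) &&
	       field_ok(id->subdevice, dev->subdevice);
}

/* virtio_pci_common.c: PCI_DEVICE() leaves the subsystem IDs as wildcards. */
static const struct id_entry virtio_pci_entry = {
	PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID
};

/* mlx5_vfio_pci.c: PCI_DEVICE_SUB() also pins the subsystem vendor. */
static const struct id_entry mlx5_vfio_entry = {
	PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1042, PCI_VENDOR_ID_MELLANOX, PCI_ANY_ID
};

/* A Virtio SNAP controller; subsystem device 0x0001 is hypothetical. */
static const struct device_ids snap_dev = {
	PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1042, PCI_VENDOR_ID_MELLANOX, 0x0001
};

/* Same device ID without the Mellanox subsystem vendor (hypothetical). */
static const struct device_ids plain_virtio_dev = {
	PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1042, PCI_VENDOR_ID_REDHAT_QUMRANET, 0x0001
};
```

With these definitions id_matches() is true for snap_dev against both
entries, while plain_virtio_dev only matches virtio_pci_entry. The ID
tables alone cannot pick between the two drivers for the SNAP device.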

The general thing that is happening is people are adding VM
migration capability to existing standard PCI interfaces like VFIO,
NVMe, etc.

At least in this mlx5 situation the PF driver provides the HW access
to do the migration and the vfio mlx5 driver provides all the protocol
and machinery specific to the PCI standard being migrated. They are
all a little different.

But you could imagine some other implementation where the VF might
have an extra non-standard BAR that is the migration control.

This is why I like having a full standalone pci_driver: everyone
implementing this can provide the vfio_device that is appropriate for
the HW.

Jason