[RFC PATCH v3 0/9] Introduce vendor ops in vfio-pci

From: Yan Zhao
Date: Tue Feb 11 2020 - 05:19:08 EST


When the vfio-pci module is used to pass through devices, its default
implementations are what is wanted most of the time, but vendors
sometimes need some customization.
For example, a vendor may want to add a vendor device region or to
intercept writes to a BAR region.
This patch set therefore introduces a way for vendors to handle only the
regions they are interested in and to call the default vfio-pci ops to
handle the rest.

It goes like this:
(1) macros are provided to let vendor drivers register/unregister
vfio_pci_vendor_driver_ops to vfio_pci in their module_init() and
module_exit().
vfio_pci_vendor_driver_ops contains callbacks probe() and remove() and a
pointer to vfio_device_ops.

(2) vendor drivers define their module aliases as
"vfio-pci:$vendor_id-$device_id".
E.g. A vendor module for VF devices of Intel(R) Ethernet Controller XL710
family can define its module alias as MODULE_ALIAS("vfio-pci:8086-154c").

(3) when module vfio_pci is bound to a device, it calls modprobe in
user space for modules with alias "vfio-pci:$vendor_id-$device_id", which
triggers vendor drivers that are not yet loaded to register their
vfio_pci_vendor_driver_ops to vfio_pci.
It then searches the list of registered ops and calls probe() to test
whether a vendor driver supports this physical device.
A successful probe() makes vfio_pci use the vfio_device_ops provided by
the vendor driver as the ops of the vfio device, so vfio_pci_ops are no
longer called for this device directly. Instead, they are exported so that
vendor drivers can call them as the default implementation.
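
Steps (1)-(3) can be modelled in user space roughly as below. This is only
a sketch of the dispatch idea, not code from the patches: the struct
layouts, the helper names (register_vendor_ops, format_alias, select_ops)
and the zero-padded lowercase hex alias format are assumptions made for
illustration. Only the names vfio_pci_vendor_driver_ops, vfio_device_ops,
probe()/remove() and the alias pattern come from the description above.

```c
/* User-space model only; layouts and helper names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stands in for vfio_device_ops. */
struct device_ops {
	const char *name;
};

/* Stands in for vfio_pci_vendor_driver_ops: probe(), remove() and a
 * pointer to the device ops, as described in step (1). */
struct vendor_driver_ops {
	const char *name;
	int (*probe)(unsigned int vendor, unsigned int device); /* 0 = supported */
	void (*remove)(void);
	const struct device_ops *device_ops;
	struct vendor_driver_ops *next;
};

static struct vendor_driver_ops *vendor_list;
static const struct device_ops default_ops = { "vfio_pci_ops" };

/* Step (1): a vendor driver adds its ops to the list kept by vfio_pci. */
static void register_vendor_ops(struct vendor_driver_ops *ops)
{
	assert(ops && ops->probe && ops->device_ops);
	ops->next = vendor_list;
	vendor_list = ops;
}

/* Step (2): build the alias "vfio-pci:$vendor_id-$device_id".
 * The 4-digit hex formatting is an assumption based on the example. */
static int format_alias(char *buf, size_t len,
			unsigned int vendor, unsigned int device)
{
	return snprintf(buf, len, "vfio-pci:%04x-%04x", vendor, device);
}

/* Step (3): walk the registered ops; the first successful probe() wins,
 * otherwise the default ops stay in use. */
static const struct device_ops *select_ops(unsigned int vendor,
					   unsigned int device)
{
	struct vendor_driver_ops *v;

	for (v = vendor_list; v; v = v->next)
		if (v->probe(vendor, device) == 0)
			return v->device_ops;
	return &default_ops;
}

/* Hypothetical vendor driver for the XL710 VF example (8086:154c). */
static int xl710_vf_probe(unsigned int vendor, unsigned int device)
{
	return (vendor == 0x8086 && device == 0x154c) ? 0 : -1;
}

static void xl710_vf_remove(void)
{
}

static const struct device_ops xl710_vf_device_ops = { "xl710_vf_ops" };

static struct vendor_driver_ops xl710_vf_vendor_ops = {
	.name = "xl710_vf",
	.probe = xl710_vf_probe,
	.remove = xl710_vf_remove,
	.device_ops = &xl710_vf_device_ops,
};
```

In this model a device with no matching vendor driver keeps the default
ops, which mirrors the fallback to vfio_pci_ops described above.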


                                         _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

 __________   (un)register vendor ops   |  ___________     ___________    |
|          |<-----------------------------|    VF     |   |           |   |
| vfio-pci |                            | |  vendor   |   | PF driver |   |
|__________|----------------------------->|  driver   |   |___________|   |
     |           probe/remove()         | |___________|         |         |
     |                                  |                       |         |
     |                                  |_ _ _ _ _ _ _ _ _ _ _ _|_ _ _ _ _|
    \|/                                                        \|/
-----------                                                -----------
|    VF   |                                                |    PF   |
-----------                                                -----------
                         a typical usage in SRIOV



Ref counts:
(1) vendor drivers must be built as modules and compiled to depend on
module vfio_pci.
(2) In vfio_pci, a successful register takes a reference on vfio_pci
itself, and a successful unregister drops that reference.
(3) In vfio_pci, a successful probe() of a vendor driver takes a reference
on the vendor module. The reference is dropped after remove() is called.
(4) a macro is provided to make sure a vendor module always unregisters
itself in its module_exit().

These rules prevent the following conditions:
a. vfio_pci is unloaded after a successful register from a vendor driver.
Although vfio_pci would later call modprobe to ask the vendor module to
register again, that does not help if the vendor driver stays loaded
across an unload/reload of vfio_pci.
b. a vendor driver unregisters itself after being successfully probed by
vfio_pci.
c. circular dependency between vfio_pci and the vendor driver.
If vfio_pci took references on both vfio_pci and the vendor driver on a
successful register, and the vendor driver only unregistered in its
module_exit(), it would never get a chance to unregister.
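
The reference rules above can be checked against a small user-space model.
This is a sketch under stated assumptions, not the kernel implementation:
struct module_model and all function names below are hypothetical, and a
module is simply treated as unloadable only while its count is zero.

```c
/* User-space model of the ref-count rules; all names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct module_model {
	const char *name;
	int refs;      /* references currently held on this module */
	bool loaded;
};

/* A module can only be unloaded while nobody holds a reference on it. */
static bool try_unload(struct module_model *m)
{
	if (m->refs > 0)
		return false;
	m->loaded = false;
	return true;
}

/* Rule (2): register pins vfio_pci itself, unregister releases it. */
static void vendor_register(struct module_model *vfio_pci)
{
	vfio_pci->refs++;
}

static void vendor_unregister(struct module_model *vfio_pci)
{
	assert(vfio_pci->refs > 0);
	vfio_pci->refs--;
}

/* Rule (3): a successful probe() pins the vendor module until remove(). */
static void vendor_probe_ok(struct module_model *vendor)
{
	vendor->refs++;
}

static void vendor_removed(struct module_model *vendor)
{
	assert(vendor->refs > 0);
	vendor->refs--;
}
```

Because registration pins only vfio_pci and not the vendor module, an
unprobed vendor module can still be unloaded, run its module_exit() and
unregister, which is how condition c is avoided in this model.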


Patch Overview
patches 1-2 make struct vfio_pci_device public and export the functions
in struct vfio_pci_ops
patches 3-4 provide register/unregister interfaces for vendor drivers
patches 5-6 add some further enhancements
patch 7 provides a sample that passes through IGD devices
patches 8-9 implement VF live migration on Intel's 710 SRIOV devices.
Some dirty page tracking functions are intentionally
commented out and will be sent out later.

Changelog:
RFC v2- RFC v3:
- embedding struct vfio_pci_device into struct vfio_pci_device_private.
(Alex)

RFC v1- RFC v2:
- renamed mediate ops to vendor ops
- use of request_module and module alias to manage vendor driver load
(Alex)
- changed from vfio_pci_ops calling vendor ops
to vendor ops calling default vfio_pci_ops (Alex)
- dropped patches for dynamic traps of BARs. will submit them later.

Links:
Previous versions:
RFC v2:
https://lkml.org/lkml/2020/1/30/956

RFC v1:
kernel part: https://www.spinics.net/lists/kernel/msg3337337.html.
qemu part: https://www.spinics.net/lists/kernel/msg3337337.html.

VFIO live migration v8:
https://lists.gnu.org/archive/html/qemu-devel/2019-08/msg05542.html.


Yan Zhao (9):
vfio/pci: export vfio_pci_device public and add
vfio_pci_device_private
vfio/pci: export functions in vfio_pci_ops
vfio/pci: register/unregister vfio_pci_vendor_driver_ops
vfio/pci: macros to generate module_init and module_exit for vendor
modules
vfio/pci: let vfio_pci know how many vendor regions are registered
vfio/pci: export vfio_pci_setup_barmap
samples/vfio-pci: add a sample vendor module of vfio-pci for IGD
devices
vfio: header for vfio live migration region.
i40e/vf_migration: vfio-pci vendor driver for VF live migration

drivers/net/ethernet/intel/Kconfig | 10 +
drivers/net/ethernet/intel/i40e/Makefile | 2 +
drivers/net/ethernet/intel/i40e/i40e.h | 2 +
.../ethernet/intel/i40e/i40e_vf_migration.c | 635 ++++++++++++++++++
.../ethernet/intel/i40e/i40e_vf_migration.h | 92 +++
drivers/vfio/pci/vfio_pci.c | 385 +++++++----
drivers/vfio/pci/vfio_pci_config.c | 186 ++---
drivers/vfio/pci/vfio_pci_igd.c | 19 +-
drivers/vfio/pci/vfio_pci_intrs.c | 186 ++---
drivers/vfio/pci/vfio_pci_nvlink2.c | 22 +-
drivers/vfio/pci/vfio_pci_private.h | 13 +-
drivers/vfio/pci/vfio_pci_rdwr.c | 64 +-
include/linux/vfio.h | 57 ++
include/uapi/linux/vfio.h | 149 ++++
samples/Kconfig | 6 +
samples/Makefile | 1 +
samples/vfio-pci/Makefile | 2 +
samples/vfio-pci/igd_pt.c | 148 ++++
18 files changed, 1645 insertions(+), 334 deletions(-)
create mode 100644 drivers/net/ethernet/intel/i40e/i40e_vf_migration.c
create mode 100644 drivers/net/ethernet/intel/i40e/i40e_vf_migration.h
create mode 100644 samples/vfio-pci/Makefile
create mode 100644 samples/vfio-pci/igd_pt.c

--
2.17.1