Re: [PATCH v10 01/19] vfio: Mediated device Core driver

From: Jike Song
Date: Mon Oct 31 2016 - 23:12:34 EST


On 10/27/2016 05:29 AM, Kirti Wankhede wrote:
> Design for Mediated Device Driver:
> Main purpose of this driver is to provide a common interface for mediated
> device management that can be used by different drivers of different
> devices.
>
> This module provides a generic interface to create the device, add it to
> mediated bus, add device to IOMMU group and then add it to vfio group.
>
> Below is the high Level block diagram, with Nvidia, Intel and IBM devices
> as example, since these are the devices which are going to actively use
> this module as of now.
>
> +---------------+
> | |
> | +-----------+ | mdev_register_driver() +--------------+
> | | | +<------------------------+ __init() |
> | | mdev | | | |
> | | bus | +------------------------>+ |<-> VFIO user
> | | driver | | probe()/remove() | vfio_mdev.ko | APIs
> | | | | | |
> | +-----------+ | +--------------+
> | |
> | MDEV CORE |
> | MODULE |
> | mdev.ko |
> | +-----------+ | mdev_register_device() +--------------+
> | | | +<------------------------+ |
> | | | | | nvidia.ko |<-> physical
> | | | +------------------------>+ | device
> | | | | callback +--------------+
> | | Physical | |
> | | device | | mdev_register_device() +--------------+
> | | interface | |<------------------------+ |
> | | | | | i915.ko |<-> physical
> | | | +------------------------>+ | device
> | | | | callback +--------------+
> | | | |
> | | | | mdev_register_device() +--------------+
> | | | +<------------------------+ |
> | | | | | ccw_device.ko|<-> physical
> | | | +------------------------>+ | device
> | | | | callback +--------------+
> | +-----------+ |
> +---------------+
>
> Core driver provides two types of registration interfaces:
> 1. Registration interface for mediated bus driver:
>
> /**
> * struct mdev_driver - Mediated device's driver
> * @name: driver name
> * @probe: called when new device created
> * @remove:called when device removed
> * @driver:device driver structure
> *
> **/
> struct mdev_driver {
> const char *name;
> int (*probe) (struct device *dev);
> void (*remove) (struct device *dev);
> struct device_driver driver;
> };
>
> Mediated bus driver for mdev device should use this interface to register
> and unregister with core driver respectively:
>
> int mdev_register_driver(struct mdev_driver *drv, struct module *owner);
> void mdev_unregister_driver(struct mdev_driver *drv);
>
> Mediated bus driver is responsible to add/delete mediated devices to/from
> VFIO group when devices are bound and unbound to the driver.
>
> 2. Physical device driver interface
> This interface provides vendor driver the set APIs to manage physical
> device related work in its driver. APIs are :
>
> * dev_attr_groups: attributes of the parent device.
> * mdev_attr_groups: attributes of the mediated device.
> * supported_type_groups: attributes to define supported type. This is
> mandatory field.
> * create: to allocate basic resources in vendor driver for a mediated
> device. This is mandatory to be provided by vendor driver.
> * remove: to free resources in vendor driver when mediated device is
> destroyed. This is mandatory to be provided by vendor driver.
> * open: open callback of mediated device
> * release: release callback of mediated device
> * read : read emulation callback.
> * write: write emulation callback.
> * mmap: mmap emulation callback.
> * ioctl: ioctl callback.
>
> Drivers should use these interfaces to register and unregister device to
> mdev core driver respectively:
>
> extern int mdev_register_device(struct device *dev,
> const struct parent_ops *ops);
> extern void mdev_unregister_device(struct device *dev);
>
> There are no locks to serialize above callbacks in mdev driver and
> vfio_mdev driver. If required, vendor driver can have locks to serialize
> above APIs in their driver.

Maybe some information above could be placed under Documentation instead?
It's kind of weird to have the block diagram and interfaces documentation
in the git commit message :-)

>
> Signed-off-by: Kirti Wankhede <kwankhede@xxxxxxxxxx>
> Signed-off-by: Neo Jia <cjia@xxxxxxxxxx>
> Change-Id: I73a5084574270b14541c529461ea2f03c292d510
> ---
> drivers/vfio/Kconfig | 1 +
> drivers/vfio/Makefile | 1 +
> drivers/vfio/mdev/Kconfig | 10 +
> drivers/vfio/mdev/Makefile | 4 +
> drivers/vfio/mdev/mdev_core.c | 384 +++++++++++++++++++++++++++++++++++++++
> drivers/vfio/mdev/mdev_driver.c | 122 +++++++++++++
> drivers/vfio/mdev/mdev_private.h | 41 +++++
> drivers/vfio/mdev/mdev_sysfs.c | 286 +++++++++++++++++++++++++++++
> include/linux/mdev.h | 167 +++++++++++++++++
> 9 files changed, 1016 insertions(+)
> create mode 100644 drivers/vfio/mdev/Kconfig
> create mode 100644 drivers/vfio/mdev/Makefile
> create mode 100644 drivers/vfio/mdev/mdev_core.c
> create mode 100644 drivers/vfio/mdev/mdev_driver.c
> create mode 100644 drivers/vfio/mdev/mdev_private.h
> create mode 100644 drivers/vfio/mdev/mdev_sysfs.c
> create mode 100644 include/linux/mdev.h
>
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index da6e2ce77495..23eced02aaf6 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -48,4 +48,5 @@ menuconfig VFIO_NOIOMMU
>
> source "drivers/vfio/pci/Kconfig"
> source "drivers/vfio/platform/Kconfig"
> +source "drivers/vfio/mdev/Kconfig"
> source "virt/lib/Kconfig"
> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> index 7b8a31f63fea..4a23c13b6be4 100644
> --- a/drivers/vfio/Makefile
> +++ b/drivers/vfio/Makefile
> @@ -7,3 +7,4 @@ obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
> obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
> obj-$(CONFIG_VFIO_PCI) += pci/
> obj-$(CONFIG_VFIO_PLATFORM) += platform/
> +obj-$(CONFIG_VFIO_MDEV) += mdev/
> diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig
> new file mode 100644
> index 000000000000..303c14ce2847
> --- /dev/null
> +++ b/drivers/vfio/mdev/Kconfig
> @@ -0,0 +1,10 @@
> +
> +config VFIO_MDEV
> + tristate "Mediated device driver framework"
> + depends on VFIO
> + default n
> + help
> + Provides a framework to virtualize devices.
> + See Documentation/vfio-mdev/vfio-mediated-device.txt for more details.
> +
> + If you don't know what do here, say N.
> diff --git a/drivers/vfio/mdev/Makefile b/drivers/vfio/mdev/Makefile
> new file mode 100644
> index 000000000000..31bc04801d94
> --- /dev/null
> +++ b/drivers/vfio/mdev/Makefile
> @@ -0,0 +1,4 @@
> +
> +mdev-y := mdev_core.o mdev_sysfs.o mdev_driver.o
> +
> +obj-$(CONFIG_VFIO_MDEV) += mdev.o
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> new file mode 100644
> index 000000000000..9d8fa5c91c2e
> --- /dev/null
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -0,0 +1,384 @@
> +/*
> + * Mediated device Core Driver
> + *
> + * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
> + * Author: Neo Jia <cjia@xxxxxxxxxx>
> + * Kirti Wankhede <kwankhede@xxxxxxxxxx>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/device.h>
> +#include <linux/slab.h>
> +#include <linux/uuid.h>
> +#include <linux/sysfs.h>
> +#include <linux/mdev.h>
> +
> +#include "mdev_private.h"
> +
> +#define DRIVER_VERSION "0.1"
> +#define DRIVER_AUTHOR "NVIDIA Corporation"
> +#define DRIVER_DESC "Mediated device Core Driver"
> +
> +static LIST_HEAD(parent_list);
> +static DEFINE_MUTEX(parent_list_lock);
> +static struct class_compat *mdev_bus_compat_class;
> +
> +static int _find_mdev_device(struct device *dev, void *data)
> +{
> + struct mdev_device *mdev;
> +
> + if (!dev_is_mdev(dev))
> + return 0;
> +
> + mdev = to_mdev_device(dev);
> +
> + if (uuid_le_cmp(mdev->uuid, *(uuid_le *)data) == 0)
> + return 1;
> +
> + return 0;
> +}

> +
> +static bool mdev_device_exist(struct parent_device *parent, uuid_le uuid)
> +{
> + struct device *dev;
> +
> + dev = device_find_child(parent->dev, &uuid, _find_mdev_device);

'_find_mdev_device' is actually a match, better rename to '__match_mdev' or
something alike.

> + if (dev) {
> + put_device(dev);
> + return true;
> + }
> +
> + return false;
> +}
> +
> +/* Should be called holding parent_list_lock */
> +static struct parent_device *__find_parent_device(struct device *dev)
> +{
> + struct parent_device *parent;
> +
> + list_for_each_entry(parent, &parent_list, next) {
> + if (parent->dev == dev)
> + return parent;
> + }
> + return NULL;
> +}
> +
> +static void mdev_release_parent(struct kref *kref)
> +{
> + struct parent_device *parent = container_of(kref, struct parent_device,
> + ref);
> + struct device *dev = parent->dev;
> +
> + kfree(parent);
> + put_device(dev);
> +}
> +
> +static
> +inline struct parent_device *mdev_get_parent(struct parent_device *parent)
> +{
> + if (parent)
> + kref_get(&parent->ref);
> +
> + return parent;
> +}
> +
> +static inline void mdev_put_parent(struct parent_device *parent)
> +{
> + if (parent)
> + kref_put(&parent->ref, mdev_release_parent);
> +}
> +
> +static int mdev_device_create_ops(struct kobject *kobj,
> + struct mdev_device *mdev)
> +{
> + struct parent_device *parent = mdev->parent;
> + int ret;
> +
> + ret = parent->ops->create(kobj, mdev);
> + if (ret)
> + return ret;
> +
> + ret = sysfs_create_groups(&mdev->dev.kobj,
> + parent->ops->mdev_attr_groups);
> + if (ret)
> + parent->ops->remove(mdev);
> +
> + return ret;
> +}
> +
> +/*
> + * mdev_device_remove_ops gets called from sysfs's 'remove' and when parent
> + * device is being unregistered from mdev device framework.
> + * - 'force_remove' is set to 'false' when called from sysfs's 'remove' which
> + * indicates that if the mdev device is active, used by VMM or userspace
> + * application, vendor driver could return error then don't remove the device.
> + * - 'force_remove' is set to 'true' when called from mdev_unregister_device()
> + * which indicate that parent device is being removed from mdev device
> + * framework so remove mdev device forcefully.
> + */
> +static int mdev_device_remove_ops(struct mdev_device *mdev, bool force_remove)
> +{
> + struct parent_device *parent = mdev->parent;
> + int ret;
> +
> + /*
> + * Vendor driver can return error if VMM or userspace application is
> + * using this mdev device.
> + */
> + ret = parent->ops->remove(mdev);
> + if (ret && !force_remove)
> + return -EBUSY;
> +
> + sysfs_remove_groups(&mdev->dev.kobj, parent->ops->mdev_attr_groups);
> + return 0;
> +}
> +
> +static int mdev_device_remove_cb(struct device *dev, void *data)
> +{
> + if (!dev_is_mdev(dev))
> + return 0;
> +
> + return mdev_device_remove(dev, data ? *(bool *)data : true);
> +}
> +
> +/*
> + * mdev_register_device : Register a device
> + * @dev: device structure representing parent device.
> + * @ops: Parent device operation structure to be registered.
> + *
> + * Add device to list of registered parent devices.
> + * Returns a negative value on error, otherwise 0.
> + */
> +int mdev_register_device(struct device *dev, const struct parent_ops *ops)
> +{
> + int ret;
> + struct parent_device *parent;
> +
> + /* check for mandatory ops */
> + if (!ops || !ops->create || !ops->remove || !ops->supported_type_groups)
> + return -EINVAL;
> +
> + dev = get_device(dev);
> + if (!dev)
> + return -EINVAL;
> +
> + mutex_lock(&parent_list_lock);
> +
> + /* Check for duplicate */
> + parent = __find_parent_device(dev);
> + if (parent) {
> + ret = -EEXIST;
> + goto add_dev_err;
> + }
> +
> + parent = kzalloc(sizeof(*parent), GFP_KERNEL);
> + if (!parent) {
> + ret = -ENOMEM;
> + goto add_dev_err;
> + }
> +
> + kref_init(&parent->ref);
> + mutex_init(&parent->lock);
> +
> + parent->dev = dev;
> + parent->ops = ops;
> +
> + ret = parent_create_sysfs_files(parent);
> + if (ret) {
> + mutex_unlock(&parent_list_lock);
> + mdev_put_parent(parent);
> + return ret;
> + }
> +
> + ret = class_compat_create_link(mdev_bus_compat_class, dev, NULL);
> + if (ret)
> + dev_warn(dev, "Failed to create compatibility class link\n");
> +
> + list_add(&parent->next, &parent_list);
> + mutex_unlock(&parent_list_lock);
> +
> + dev_info(dev, "MDEV: Registered\n");
> + return 0;
> +
> +add_dev_err:
> + mutex_unlock(&parent_list_lock);
> + put_device(dev);
> + return ret;
> +}
> +EXPORT_SYMBOL(mdev_register_device);
> +
> +/*
> + * mdev_unregister_device : Unregister a parent device
> + * @dev: device structure representing parent device.
> + *
> + * Remove device from list of registered parent devices. Give a chance to free
> + * existing mediated devices for given device.
> + */
> +
> +void mdev_unregister_device(struct device *dev)
> +{
> + struct parent_device *parent;
> + bool force_remove = true;
> +
> + mutex_lock(&parent_list_lock);
> + parent = __find_parent_device(dev);
> +
> + if (!parent) {
> + mutex_unlock(&parent_list_lock);
> + return;
> + }
> + dev_info(dev, "MDEV: Unregistering\n");
> +
> + list_del(&parent->next);
> + class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
> +
> + device_for_each_child(dev, (void *)&force_remove,
> + mdev_device_remove_cb);
> +
> + parent_remove_sysfs_files(parent);
> +
> + mutex_unlock(&parent_list_lock);
> + mdev_put_parent(parent);
> +}
> +EXPORT_SYMBOL(mdev_unregister_device);
> +
> +static void mdev_device_release(struct device *dev)
> +{
> + struct mdev_device *mdev = to_mdev_device(dev);
> +
> + dev_dbg(&mdev->dev, "MDEV: destroying\n");
> + kfree(mdev);
> +}
> +
> +int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid)
> +{
> + int ret;
> + struct mdev_device *mdev;
> + struct parent_device *parent;
> + struct mdev_type *type = to_mdev_type(kobj);
> +
> + parent = mdev_get_parent(type->parent);
> + if (!parent)
> + return -EINVAL;
> +
> + mutex_lock(&parent->lock);
> +
> + /* Check for duplicate */
> + if (mdev_device_exist(parent, uuid)) {
> + ret = -EEXIST;
> + goto create_err;
> + }
> +
> + mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
> + if (!mdev) {
> + ret = -ENOMEM;
> + goto create_err;
> + }
> +
> + memcpy(&mdev->uuid, &uuid, sizeof(uuid_le));
> + mdev->parent = parent;
> + kref_init(&mdev->ref);
> +
> + mdev->dev.parent = dev;
> + mdev->dev.bus = &mdev_bus_type;
> + mdev->dev.release = mdev_device_release;
> + dev_set_name(&mdev->dev, "%pUl", uuid.b);
> +
> + ret = device_register(&mdev->dev);
> + if (ret) {
> + put_device(&mdev->dev);
> + goto create_err;
> + }
> +
> + ret = mdev_device_create_ops(kobj, mdev);
> + if (ret)
> + goto create_failed;
> +
> + ret = mdev_create_sysfs_files(&mdev->dev, type);
> + if (ret) {
> + mdev_device_remove_ops(mdev, true);
> + goto create_failed;
> + }
> +
> + mdev->type_kobj = kobj;
> + dev_dbg(&mdev->dev, "MDEV: created\n");
> +
> + mutex_unlock(&parent->lock);
> + return ret;
> +
> +create_failed:
> + device_unregister(&mdev->dev);
> +
> +create_err:
> + mutex_unlock(&parent->lock);
> + mdev_put_parent(parent);
> + return ret;
> +}
> +
> +int mdev_device_remove(struct device *dev, bool force_remove)
> +{
> + struct mdev_device *mdev;
> + struct parent_device *parent;
> + struct mdev_type *type;
> + int ret;
> +
> + mdev = to_mdev_device(dev);
> + type = to_mdev_type(mdev->type_kobj);
> + parent = mdev->parent;
> + mutex_lock(&parent->lock);
> +
> + ret = mdev_device_remove_ops(mdev, force_remove);
> + if (ret) {
> + mutex_unlock(&parent->lock);
> + return ret;
> + }
> +
> + mdev_remove_sysfs_files(dev, type);
> + device_unregister(dev);
> + mutex_unlock(&parent->lock);
> + mdev_put_parent(parent);
> + return ret;
> +}
> +
> +static int __init mdev_init(void)
> +{
> + int ret;
> +
> + ret = mdev_bus_register();
> + if (ret) {
> + pr_err("Failed to register mdev bus\n");
> + return ret;
> + }
> +
> + mdev_bus_compat_class = class_compat_register("mdev_bus");
> + if (!mdev_bus_compat_class) {
> + mdev_bus_unregister();
> + return -ENOMEM;
> + }
> +
> + /*
> + * Attempt to load known vfio_mdev. This gives us a working environment
> + * without the user needing to explicitly load vfio_mdev driver.
> + */
> + request_module_nowait("vfio_mdev");
> +
> + return ret;
> +}
> +
> +static void __exit mdev_exit(void)
> +{
> + class_compat_unregister(mdev_bus_compat_class);
> + mdev_bus_unregister();
> +}
> +
> +module_init(mdev_init)
> +module_exit(mdev_exit)
> +
> +MODULE_VERSION(DRIVER_VERSION);
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR(DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(DRIVER_DESC);
> diff --git a/drivers/vfio/mdev/mdev_driver.c b/drivers/vfio/mdev/mdev_driver.c
> new file mode 100644
> index 000000000000..0b3250044a80
> --- /dev/null
> +++ b/drivers/vfio/mdev/mdev_driver.c
> @@ -0,0 +1,122 @@
> +/*
> + * MDEV driver
> + *
> + * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
> + * Author: Neo Jia <cjia@xxxxxxxxxx>
> + * Kirti Wankhede <kwankhede@xxxxxxxxxx>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/device.h>
> +#include <linux/iommu.h>
> +#include <linux/mdev.h>
> +
> +#include "mdev_private.h"
> +
> +static int mdev_attach_iommu(struct mdev_device *mdev)
> +{
> + int ret;
> + struct iommu_group *group;
> +
> + group = iommu_group_alloc();
> + if (IS_ERR(group))
> + return PTR_ERR(group);

Maybe I overthought, but where the iommu_group is released? On successful
return you already have a ref, I didn't find an iommu_group_put after that.

--
Thanks,
Jike

> +
> + ret = iommu_group_add_device(group, &mdev->dev);
> + if (ret)
> + goto attach_fail;
> +
> + dev_info(&mdev->dev, "MDEV: group_id = %d\n",
> + iommu_group_id(group));
> +attach_fail:
> + iommu_group_put(group);
> + return ret;
> +}
> +
> +static void mdev_detach_iommu(struct mdev_device *mdev)
> +{
> + iommu_group_remove_device(&mdev->dev);
> + dev_info(&mdev->dev, "MDEV: detaching iommu\n");
> +}
> +