RE: [PATCH v1 9/9] smaples: add vfio-mdev-pci driver

From: Liu, Yi L
Date: Thu Jun 20 2019 - 09:05:54 EST


Hi Alex,

> From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> Sent: Thursday, June 20, 2019 12:27 PM
> To: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> Subject: Re: [PATCH v1 9/9] smaples: add vfio-mdev-pci driver
>
> On Sat, 8 Jun 2019 21:21:11 +0800
> Liu Yi L <yi.l.liu@xxxxxxxxx> wrote:
>
> > This patch adds sample driver named vfio-mdev-pci. It is to wrap
> > a PCI device as a mediated device. For a pci device, once bound
> > to vfio-mdev-pci driver, user space access of this device will
> > go through vfio mdev framework. The usage of the device follows
> > mdev management method. e.g. user should create a mdev before
> > exposing the device to user-space.
> >
> > Benefit of this new driver would be acting as a sample driver
> > for recent changes from "vfio/mdev: IOMMU aware mediated device"
> > patchset. Also it could be a good experiment driver for future
> > device specific mdev migration support.
> >
> > To use this driver:
> > a) build and load vfio-mdev-pci.ko module
> > execute "make menuconfig" and config CONFIG_SAMPLE_VFIO_MDEV_PCI
> > then load it with following command
> > > sudo modprobe vfio
> > > sudo modprobe vfio-pci
> > > sudo insmod drivers/vfio/pci/vfio-mdev-pci.ko
> >
> > b) unbind original device driver
> > e.g. use following command to unbind its original driver
> > > echo $dev_bdf > /sys/bus/pci/devices/$dev_bdf/driver/unbind
> >
> > c) bind vfio-mdev-pci driver to the physical device
> > > echo $vend_id $dev_id > /sys/bus/pci/drivers/vfio-mdev-pci/new_id
> >
> > d) check the supported mdev instances
> > > ls /sys/bus/pci/devices/$dev_bdf/mdev_supported_types/
> > vfio-mdev-pci-type1
> > > ls /sys/bus/pci/devices/$dev_bdf/mdev_supported_types/\
> > vfio-mdev-pci-type1/
> > available_instances create device_api devices name
>
>
> I think the static type name here is a problem (and why does it
> include "type1"?). We generally consider that a type defines a
> software compatible mdev, but in this case any PCI device wrapped in
> vfio-mdev-pci gets the same mdev type. This is only a sample driver,
> but that's a bad precedent. I've taken a stab at fixing this in the
> patch below, using the PCI vendor ID, device ID, subsystem vendor ID,
> subsystem device ID, class code, and revision to try to make the type
> as specific to the physical device assigned as we can through PCI.

Thanks, it is much better than what I proposed.

>
> >
> > e) create mdev on this physical device (only 1 instance)
> > > echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1003" > \
> > /sys/bus/pci/devices/$dev_bdf/mdev_supported_types/\
> > vfio-mdev-pci-type1/create
>
> Whoops, available_instances always reports 1 and it doesn't appear that
> the create function prevents additional mdevs. Also addressed in the
> patch below.

yep, thanks.

>
> > f) passthru the mdev to guest
> > add the following line in Qemu boot command
> > -device vfio-pci,\
> > sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1003
> >
> > g) destroy mdev
> > > echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1003/\
> > remove
> >
>
> I also found that unbinding the parent device doesn't unregister with
> mdev, so it cannot be bound again, also fixed below.

Oops, good catch. :-)

> However, the patch below just makes the mdev interface behave
> correctly, I can't make it work on my system because commit
> 7bd50f0cd2fd ("vfio/type1: Add domain at(de)taching group helpers")

What error did you encounter. I tested the patch with a device in a
singleton iommu group. I'm also searching a proper machine with
multiple devices in an iommu group and test it.

> used iommu_attach_device() rather than iommu_attach_group() for non-aux
> mdev iommu_device. Is there a requirement that the mdev parent device
> is in a singleton iommu group?

I don't think there should have such limitation. Per my understanding,
vfio-mdev-pci should also be able to bind to devices which shares
iommu group with other devices. vfio-pci works well for such devices.
And since the two drivers share most of the codes, I think vfio-mdev-pci
should naturally support it as well.

> If this is a simplification, then
> vfio-mdev-pci should not bind to devices where this is violated since
> there's no way to use the device. Can we support it though?

yeah, I think we need to support it.

> If I have two devices in the same group and bind them both to
> vfio-mdev-pci, I end up with three groups, one for each mdev device and
> the original physical device group. vfio.c works with the mdev groups
> and will try to match both groups to the container. vfio_iommu_type1.c
> also works with the mdev groups, except for the point where we actually
> try to attach a group to a domain, which is the only window where we use
> the iommu_device rather than the provided group, but we don't record
> that anywhere. Should struct vfio_group have a pointer to a reference
> counted object that tracks the actual iommu_group attached, such that
> we can determine that the group is already attached to the domain and
> not try to attach again?

Agreed, we need to avoid such duplicated attach. Instead of adding
reference counted object in vfio_group. I'm also considering the logic
below:

/*
* Do this check in vfio_iommu_type1_attach_group(), after mdev_group
* is initialized.
*/
if (vfio_group->mdev_group) {
/*
* vfio_group->mdev_group is true means vfio_group->iommu_group
* is not the actual iommu_group which is going to be attached to
* domain. To avoid duplicate iommu_group attach, needs to check if
* the actual iommu_group. vfio_get_parent_iommu_group() is a
* newly added helper function which returns the actual attach
* iommu_group going to be attached for this mdev group.
*/
p_iommu_group = vfio_get_parent_iommu_group(
vfio_group->iommu_group);
list_for_each_entry(d, &iommu->domain_list, next) {
if (find_iommu_group(d, p_iommu_group)) {
mutex_unlock(&iommu->lock);
// skip group attach;
}
}

> Ideally I'd be able to bind one device to
> vfio-pci, the other to vfio-mdev-pci, and be able to use them both
> within the same container. It seems like this should be possible, it's
> the same effective iommu configuration as if they were both bound to
> vfio-pci. Thanks,

Agreed. Will test it. And thanks for the fix patch below. I've test it
with a device in a singleton iommu group. Need to test the scenario
you mentioned above. :-)

Thanks,
Yi Liu

>
> Alex
>
> diff --git a/drivers/vfio/pci/vfio_mdev_pci.c b/drivers/vfio/pci/vfio_mdev_pci.c
> index 07c8067b3f73..09143d3e5473 100644
> --- a/drivers/vfio/pci/vfio_mdev_pci.c
> +++ b/drivers/vfio/pci/vfio_mdev_pci.c
> @@ -65,18 +65,22 @@ MODULE_PARM_DESC(disable_idle_d3,
>
> static struct pci_driver vfio_mdev_pci_driver;
>
> -static ssize_t
> -name_show(struct kobject *kobj, struct device *dev, char *buf)
> -{
> - return sprintf(buf, "%s-type1\n", dev_name(dev));
> -}
> -
> -MDEV_TYPE_ATTR_RO(name);
> +struct vfio_mdev_pci_device {
> + struct vfio_pci_device vdev;
> + struct mdev_parent_ops ops;
> + struct attribute_group *groups[2];
> + struct attribute_group attr;
> + atomic_t avail;
> +};
>
> static ssize_t
> available_instances_show(struct kobject *kobj, struct device *dev, char *buf)
> {
> - return sprintf(buf, "%d\n", 1);
> + struct vfio_mdev_pci_device *vmdev;
> +
> + vmdev = pci_get_drvdata(to_pci_dev(dev));
> +
> + return sprintf(buf, "%d\n", atomic_read(&vmdev->avail));
> }
>
> MDEV_TYPE_ATTR_RO(available_instances);
> @@ -90,62 +94,57 @@ static ssize_t device_api_show(struct kobject *kobj, struct
> device *dev,
> MDEV_TYPE_ATTR_RO(device_api);
>
> static struct attribute *vfio_mdev_pci_types_attrs[] = {
> - &mdev_type_attr_name.attr,
> &mdev_type_attr_device_api.attr,
> &mdev_type_attr_available_instances.attr,
> NULL,
> };
>
> -static struct attribute_group vfio_mdev_pci_type_group1 = {
> - .name = "type1",
> - .attrs = vfio_mdev_pci_types_attrs,
> -};
> -
> -struct attribute_group *vfio_mdev_pci_type_groups[] = {
> - &vfio_mdev_pci_type_group1,
> - NULL,
> -};
> -
> struct vfio_mdev_pci {
> struct vfio_pci_device *vdev;
> struct mdev_device *mdev;
> - unsigned long handle;
> };
>
> static int vfio_mdev_pci_create(struct kobject *kobj, struct mdev_device *mdev)
> {
> struct device *pdev;
> - struct vfio_pci_device *vdev;
> + struct vfio_mdev_pci_device *vmdev;
> struct vfio_mdev_pci *pmdev;
> int ret;
>
> pdev = mdev_parent_dev(mdev);
> - vdev = dev_get_drvdata(pdev);
> + vmdev = dev_get_drvdata(pdev);
> +
> + if (atomic_dec_if_positive(&vmdev->avail) < 0)
> + return -ENOSPC;
> +
> pmdev = kzalloc(sizeof(struct vfio_mdev_pci), GFP_KERNEL);
> - if (pmdev == NULL) {
> - ret = -EBUSY;
> - goto out;
> - }
> + if (!pmdev)
> + return -ENOMEM;
>
> pmdev->mdev = mdev;
> - pmdev->vdev = vdev;
> + pmdev->vdev = &vmdev->vdev;
> mdev_set_drvdata(mdev, pmdev);
> ret = mdev_set_iommu_device(mdev_dev(mdev), pdev);
> if (ret) {
> pr_info("%s, failed to config iommu isolation for mdev: %s on
> pf: %s\n",
> __func__, dev_name(mdev_dev(mdev)), dev_name(pdev));
> - goto out;
> + kfree(pmdev);
> + atomic_inc(&vmdev->avail);
> + return ret;
> }
>
> -out:
> - return ret;
> + return 0;
> }
>
> static int vfio_mdev_pci_remove(struct mdev_device *mdev)
> {
> struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev);
> + struct vfio_mdev_pci_device *vmdev;
> +
> + vmdev = container_of(pmdev->vdev, struct vfio_mdev_pci_device, vdev);
>
> kfree(pmdev);
> + atomic_inc(&vmdev->avail);
> pr_info("%s, succeeded for mdev: %s\n", __func__,
> dev_name(mdev_dev(mdev)));
>
> @@ -237,24 +236,12 @@ static ssize_t vfio_mdev_pci_write(struct mdev_device
> *mdev,
> return vfio_pci_write(pmdev->vdev, (char __user *)buf, count, ppos);
> }
>
> -static const struct mdev_parent_ops vfio_mdev_pci_ops = {
> - .supported_type_groups = vfio_mdev_pci_type_groups,
> - .create = vfio_mdev_pci_create,
> - .remove = vfio_mdev_pci_remove,
> -
> - .open = vfio_mdev_pci_open,
> - .release = vfio_mdev_pci_release,
> -
> - .read = vfio_mdev_pci_read,
> - .write = vfio_mdev_pci_write,
> - .mmap = vfio_mdev_pci_mmap,
> - .ioctl = vfio_mdev_pci_ioctl,
> -};
> -
> static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev,
> const struct pci_device_id *id)
> {
> + struct vfio_mdev_pci_device *vmdev;
> struct vfio_pci_device *vdev;
> + const struct mdev_parent_ops *ops;
> int ret;
>
> if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
> @@ -273,10 +260,38 @@ static int vfio_mdev_pci_driver_probe(struct pci_dev
> *pdev,
> return -EBUSY;
> }
>
> - vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
> - if (!vdev)
> + vmdev = kzalloc(sizeof(*vmdev), GFP_KERNEL);
> + if (!vmdev)
> return -ENOMEM;
>
> + vmdev->attr.name = kasprintf(GFP_KERNEL,
> + "%04x:%04x:%04x:%04x:%06x:%02x",
> + pdev->vendor, pdev->device,
> + pdev->subsystem_vendor,
> + pdev->subsystem_device, pdev->class,
> + pdev->revision);
> + if (!vmdev->attr.name) {
> + kfree(vmdev);
> + return -ENOMEM;
> + }
> +
> + atomic_set(&vmdev->avail, 1);
> +
> + vmdev->attr.attrs = vfio_mdev_pci_types_attrs;
> + vmdev->groups[0] = &vmdev->attr;
> +
> + vmdev->ops.supported_type_groups = vmdev->groups;
> + vmdev->ops.create = vfio_mdev_pci_create;
> + vmdev->ops.remove = vfio_mdev_pci_remove;
> + vmdev->ops.open = vfio_mdev_pci_open;
> + vmdev->ops.release = vfio_mdev_pci_release;
> + vmdev->ops.read = vfio_mdev_pci_read;
> + vmdev->ops.write = vfio_mdev_pci_write;
> + vmdev->ops.mmap = vfio_mdev_pci_mmap;
> + vmdev->ops.ioctl = vfio_mdev_pci_ioctl;
> + ops = &vmdev->ops;
> +
> + vdev = &vmdev->vdev;
> vdev->pdev = pdev;
> vdev->irq_type = VFIO_PCI_NUM_IRQS;
> mutex_init(&vdev->igate);
> @@ -289,7 +304,7 @@ static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev,
> #endif
> vdev->disable_idle_d3 = disable_idle_d3;
>
> - pci_set_drvdata(pdev, vdev);
> + pci_set_drvdata(pdev, vmdev);
>
> ret = vfio_pci_reflck_attach(vdev);
> if (ret) {
> @@ -320,7 +335,7 @@ static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev,
> vfio_pci_set_power_state(vdev, PCI_D3hot);
> }
>
> - ret = mdev_register_device(&pdev->dev, &vfio_mdev_pci_ops);
> + ret = mdev_register_device(&pdev->dev, ops);
> if (ret)
> pr_err("Cannot register mdev for device %s\n",
> dev_name(&pdev->dev));
> @@ -332,12 +347,17 @@ static int vfio_mdev_pci_driver_probe(struct pci_dev
> *pdev,
>
> static void vfio_mdev_pci_driver_remove(struct pci_dev *pdev)
> {
> + struct vfio_mdev_pci_device *vmdev;
> struct vfio_pci_device *vdev;
>
> - vdev = pci_get_drvdata(pdev);
> - if (!vdev)
> + mdev_unregister_device(&pdev->dev);
> +
> + vmdev = pci_get_drvdata(pdev);
> + if (!vmdev)
> return;
>
> + vdev = &vmdev->vdev;
> +
> vfio_pci_reflck_put(vdev->reflck);
>
> kfree(vdev->region);
> @@ -355,7 +375,8 @@ static void vfio_mdev_pci_driver_remove(struct pci_dev
> *pdev)
> VGA_RSRC_LEGACY_IO |
> VGA_RSRC_LEGACY_MEM);
> }
>
> - kfree(vdev);
> + kfree(vmdev->attr.name);
> + kfree(vmdev);
> }
>
> static struct pci_driver vfio_mdev_pci_driver = {