Re: [RFC v4 0/3] vhost: introduce mdev based hardware backend

From: Jason Wang
Date: Wed Sep 18 2019 - 01:51:45 EST



On 2019/9/17 äå6:58, Tiwei Bie wrote:
On Tue, Sep 17, 2019 at 11:32:03AM +0800, Jason Wang wrote:
On 2019/9/17 äå9:02, Tiwei Bie wrote:
This RFC is to demonstrate below ideas,

a) Build vhost-mdev on top of the same abstraction defined in
the virtio-mdev series [1];

b) Introduce /dev/vhost-mdev to do vhost ioctls and support
setting mdev device as backend;

Now the userspace API looks like this:

- Userspace generates a compatible mdev device;

- Userspace opens this mdev device with VFIO API (including
doing IOMMU programming for this mdev device with VFIO's
container/group based interface);

- Userspace opens /dev/vhost-mdev and gets vhost fd;

- Userspace uses vhost ioctls to setup vhost (userspace should
do VHOST_MDEV_SET_BACKEND ioctl with VFIO group fd and device
fd first before doing other vhost ioctls);

Only compile test has been done for this series for now.

Have a hard thought on the architecture:
Thanks a lot! Do appreciate it!

1) Create a vhost char device and pass vfio mdev device fd to it as a
backend and translate vhost-mdev ioctl to virtio mdev transport (e.g
read/write). DMA was done through the VFIO DMA mapping on the container that
is attached.
Yeah, that's what we are doing in this series.

We have two more choices:

2) Use vfio-mdev but do not create vhost-mdev device, instead, just
implement vhost ioctl on vfio_device_ops, and translate them into
virtio-mdev transport or just pass ioctl to parent.
Yeah. Instead of introducing /dev/vhost-mdev char device, do
vhost ioctls on VFIO device fd directly. That's what we did
in RFC v3.

3) Don't use vfio-mdev, create a new vhost-mdev driver, during probe still
try to add dev to vfio group and talk to parent with device specific ops
If my understanding is correct, this means we need to introduce
a new VFIO device driver to replace the existing vfio-mdev driver
in our case. Below is a quick draft just to show my understanding:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/vfio.h>
#include <linux/mdev.h>

#include "mdev_private.h"

/* XXX: we need a proper way to include below vhost header. */
#include "../../vhost/vhost.h"

static int vfio_vhost_mdev_open(void *device_data)
{
if (!try_module_get(THIS_MODULE))
return -ENODEV;

/* ... */
vhost_dev_init(...);

return 0;
}

static void vfio_vhost_mdev_release(void *device_data)
{
/* ... */
module_put(THIS_MODULE);
}

static long vfio_vhost_mdev_unlocked_ioctl(void *device_data,
unsigned int cmd, unsigned long arg)
{
struct mdev_device *mdev = device_data;
struct mdev_parent *parent = mdev->parent;

/*
* Use vhost ioctls.
*
* We will have a different parent_ops design.
* And potentially, we can share the same parent_ops
* with virtio_mdev.
*/
switch (cmd) {
case VHOST_GET_FEATURES:
parent->ops->get_features(mdev, ...);
break;
/* ... */
}

return 0;
}

static ssize_t vfio_vhost_mdev_read(void *device_data, char __user *buf,
size_t count, loff_t *ppos)
{
/* ... */
return 0;
}

static ssize_t vfio_vhost_mdev_write(void *device_data, const char __user *buf,
size_t count, loff_t *ppos)
{
/* ... */
return 0;
}

static int vfio_vhost_mdev_mmap(void *device_data, struct vm_area_struct *vma)
{
/* ... */
return 0;
}

static const struct vfio_device_ops vfio_vhost_mdev_dev_ops = {
.name = "vfio-vhost-mdev",
.open = vfio_vhost_mdev_open,
.release = vfio_vhost_mdev_release,
.ioctl = vfio_vhost_mdev_unlocked_ioctl,
.read = vfio_vhost_mdev_read,
.write = vfio_vhost_mdev_write,
.mmap = vfio_vhost_mdev_mmap,
};

static int vfio_vhost_mdev_probe(struct device *dev)
{
struct mdev_device *mdev = to_mdev_device(dev);

/* ... */
return vfio_add_group_dev(dev, &vfio_vhost_mdev_dev_ops, mdev);
}

static void vfio_vhost_mdev_remove(struct device *dev)
{
/* ... */
vfio_del_group_dev(dev);
}

static struct mdev_driver vfio_vhost_mdev_driver = {
.name = "vfio_vhost_mdev",
.probe = vfio_vhost_mdev_probe,
.remove = vfio_vhost_mdev_remove,
};

static int __init vfio_vhost_mdev_init(void)
{
return mdev_register_driver(&vfio_vhost_mdev_driver, THIS_MODULE);
}
module_init(vfio_vhost_mdev_init)

static void __exit vfio_vhost_mdev_exit(void)
{
mdev_unregister_driver(&vfio_vhost_mdev_driver);
}
module_exit(vfio_vhost_mdev_exit)


Yes, something like this basically.


So I have some questions:

1) Compared to method 2, what's the advantage of creating a new vhost char
device? I guess it's for keep the API compatibility?
One benefit is that we can avoid doing vhost ioctls on
VFIO device fd.


Yes, but any benefit from doing this?



2) For method 2, is there any easy way for user/admin to distinguish e.g
ordinary vfio-mdev for vhost from ordinary vfio-mdev?
I think device-api could be a choice.


Ok.



I saw you introduce
ops matching helper but it's not friendly to management.
The ops matching helper is just to check whether a given
vfio-device is based on a mdev device.

3) A drawback of 1) and 2) is that it must follow vfio_device_ops that
assumes the parameter comes from userspace, it prevents support kernel
virtio drivers.

4) So comes the idea of method 3, since it register a new vhost-mdev driver,
we can use device specific ops instead of VFIO ones, then we can have a
common API between vDPA parent and vhost-mdev/virtio-mdev drivers.
As the above draft shows, this requires introducing a new
VFIO device driver. I think Alex's opinion matters here.


Yes, it is.

Thanks


Thanks,
Tiwei

What's your thoughts?

Thanks


RFCv3: https://patchwork.kernel.org/patch/11117785/

[1] https://lkml.org/lkml/2019/9/10/135

Tiwei Bie (3):
vfio: support getting vfio device from device fd
vfio: support checking vfio driver by device ops
vhost: introduce mdev based hardware backend

drivers/vfio/mdev/vfio_mdev.c | 3 +-
drivers/vfio/vfio.c | 32 +++
drivers/vhost/Kconfig | 9 +
drivers/vhost/Makefile | 3 +
drivers/vhost/mdev.c | 462 +++++++++++++++++++++++++++++++
drivers/vhost/vhost.c | 39 ++-
drivers/vhost/vhost.h | 6 +
include/linux/vfio.h | 11 +
include/uapi/linux/vhost.h | 10 +
include/uapi/linux/vhost_types.h | 5 +
10 files changed, 573 insertions(+), 7 deletions(-)
create mode 100644 drivers/vhost/mdev.c