On Tue, Sep 03, 2019 at 07:26:03AM -0400, Michael S. Tsirkin wrote:
On Wed, Aug 28, 2019 at 01:37:12PM +0800, Tiwei Bie wrote:I like this idea! I will give it a try.
Details about this can be found here:....
https://lwn.net/Articles/750770/
What's new in this version
==========================
There are three choices based on the discussion [1] in RFC v2:
#1. We expose a VFIO device, so we can reuse the VFIO container/groupThis version is more like a quick PoC to try Jason's proposal on
based DMA API and potentially reuse a lot of VFIO code in QEMU.
But in this case, we have two choices for the VFIO device interface
(i.e. the interface on top of VFIO device fd):
A) we may invent a new vhost protocol (as demonstrated by the code
in this RFC) on VFIO device fd to make it work in VFIO's way,
i.e. regions and irqs.
B) Or as you proposed, instead of inventing a new vhost protocol,
we can reuse most existing vhost ioctls on the VFIO device fd
directly. There should be no conflicts between the VFIO ioctls
(type is 0x3B) and VHOST ioctls (type is 0xAF) currently.
#2. Instead of exposing a VFIO device, we may expose a VHOST device.
And we will introduce a new mdev driver vhost-mdev to do this.
It would be natural to reuse the existing kernel vhost interface
(ioctls) on it as much as possible. But we will need to invent
some APIs for DMA programming (reusing VHOST_SET_MEM_TABLE is a
choice, but it's too heavy and doesn't support vIOMMU by itself).
reusing vhost ioctls. And the second way (#1/B) in above three
choices was chosen in this version to demonstrate the idea quickly.
Now the userspace API looks like this:
- VFIO's container/group based IOMMU API is used to do the
DMA programming.
- Vhost's existing ioctls are used to setup the device.
And the device will report device_api as "vfio-vhost".
Note that, there are dirty hacks in this version. If we decide to
go this way, some refactoring in vhost.c/vhost.h may be needed.
PS. The direct mapping of the notify registers isn't implemented
in this version.
[1] https://lkml.org/lkml/2019/7/9/101
Signed-off-by: Tiwei Bie <tiwei.bie@xxxxxxxxx>
+long vhost_mdev_ioctl(struct mdev_device *mdev, unsigned int cmd,
+ unsigned long arg)
+{
+ void __user *argp = (void __user *)arg;
+ struct vhost_mdev *vdpa;
+ unsigned long minsz;
+ int ret = 0;
+
+ if (!mdev)
+ return -EINVAL;
+
+ vdpa = mdev_get_drvdata(mdev);
+ if (!vdpa)
+ return -ENODEV;
+
+ switch (cmd) {
+ case VFIO_DEVICE_GET_INFO:
+ {
+ struct vfio_device_info info;
+
+ minsz = offsetofend(struct vfio_device_info, num_irqs);
+
+ if (copy_from_user(&info, (void __user *)arg, minsz)) {
+ ret = -EFAULT;
+ break;
+ }
+
+ if (info.argsz < minsz) {
+ ret = -EINVAL;
+ break;
+ }
+
+ info.flags = VFIO_DEVICE_FLAGS_VHOST;
+ info.num_regions = 0;
+ info.num_irqs = 0;
+
+ if (copy_to_user((void __user *)arg, &info, minsz)) {
+ ret = -EFAULT;
+ break;
+ }
+
+ break;
+ }
+ case VFIO_DEVICE_GET_REGION_INFO:
+ case VFIO_DEVICE_GET_IRQ_INFO:
+ case VFIO_DEVICE_SET_IRQS:
+ case VFIO_DEVICE_RESET:
+ ret = -EINVAL;
+ break;
+
+ case VHOST_MDEV_SET_STATE:
+ ret = vhost_set_state(vdpa, argp);
+ break;
+ case VHOST_GET_FEATURES:
+ ret = vhost_get_features(vdpa, argp);
+ break;
+ case VHOST_SET_FEATURES:
+ ret = vhost_set_features(vdpa, argp);
+ break;
+ case VHOST_GET_VRING_BASE:
+ ret = vhost_get_vring_base(vdpa, argp);
+ break;
+ default:
+ ret = vhost_dev_ioctl(&vdpa->dev, cmd, argp);
+ if (ret == -ENOIOCTLCMD)
+ ret = vhost_vring_ioctl(&vdpa->dev, cmd, argp);
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL(vhost_mdev_ioctl);
I don't have a problem with this approach. A small question:
would it make sense to have two fds: send vhost ioctls
on one and vfio ioctls on another?
We can then pass vfio fd to the vhost fd with a
SET_BACKEND ioctl.
What do you think?
So we can introduce /dev/vhost-mdev to have the vhost fd,
and let
userspace pass vfio fd to the vhost fd with a SET_BACKEND ioctl.
Thanks a lot!
Tiwei
--
MST