Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

From: Tiwei Bie
Date: Wed Jul 03 2019 - 09:09:44 EST


On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> On 2019/7/3 äå7:52, Tiwei Bie wrote:
> > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > On 2019/7/3 äå5:13, Tiwei Bie wrote:
> > > > Details about this can be found here:
> > > >
> > > > https://lwn.net/Articles/750770/
> > > >
> > > > What's new in this version
> > > > ==========================
> > > >
> > > > A new VFIO device type is introduced - vfio-vhost. This addressed
> > > > some comments from here: https://patchwork.ozlabs.org/cover/984763/
> > > >
> > > > Below is the updated device interface:
> > > >
> > > > Currently, there are two regions of this device: 1) CONFIG_REGION
> > > > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
> > > > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > > > can be used to notify the device.
> > > >
> > > > 1. CONFIG_REGION
> > > >
> > > > The region described by CONFIG_REGION is the main control interface.
> > > > Messages will be written to or read from this region.
> > > >
> > > > The message type is determined by the `request` field in message
> > > > header. The message size is encoded in the message header too.
> > > > The message format looks like this:
> > > >
> > > > struct vhost_vfio_op {
> > > > __u64 request;
> > > > __u32 flags;
> > > > /* Flag values: */
> > > > #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > > > __u32 size;
> > > > union {
> > > > __u64 u64;
> > > > struct vhost_vring_state state;
> > > > struct vhost_vring_addr addr;
> > > > } payload;
> > > > };
> > > >
> > > > The existing vhost-kernel ioctl cmds are reused as the message
> > > > requests in above structure.
> > >
> > > Still a comments like V1. What's the advantage of inventing a new protocol?
> > I'm trying to make it work in VFIO's way..
> >
> > > I believe either of the following should be better:
> > >
> > > - using vhost ioctl, we can start from SET_VRING_KICK/SET_VRING_CALL and
> > > extend it with e.g notify region. The advantages is that all exist userspace
> > > program could be reused without modification (or minimal modification). And
> > > vhost API hides lots of details that is not necessary to be understood by
> > > application (e.g in the case of container).
> > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > or introducing another mdev driver (i.e. vhost_mdev instead of
> > using the existing vfio_mdev) for mdev device?
>
>
> Can we simply add them into ioctl of mdev_parent_ops?

Right, either way, these ioctls have to be and just need to be
added in the ioctl of the mdev_parent_ops. But another thing we
also need to consider is that which file descriptor the userspace
will do the ioctl() on. So I'm wondering do you mean let the
userspace do the ioctl() on the VFIO device fd of the mdev
device?

>
>
> >
[...]
> > > > 3. VFIO interrupt ioctl API
> > > >
> > > > VFIO interrupt ioctl API is used to setup device interrupts.
> > > > IRQ-bypass can also be supported.
> > > >
> > > > Currently, the data path interrupt can be configured via the
> > > > VFIO_VHOST_VQ_IRQ_INDEX with virtqueue's callfd.
> > >
> > > How about DMA API? Do you expect to use VFIO IOMMU API or using vhost
> > > SET_MEM_TABLE? VFIO IOMMU API is more generic for sure but with
> > > SET_MEM_TABLE DMA can be done at the level of parent device which means it
> > > can work for e.g the card with on-chip IOMMU.
> > Agree. In this RFC, it assumes userspace will use VFIO IOMMU API
> > to do the DMA programming. But like what you said, there could be
> > a problem when using cards with on-chip IOMMU.
>
>
> Yes, another issue is SET_MEM_TABLE can not be used to update just a part of
> the table. This seems less flexible than VFIO API but it could be extended.

Agree.

>
>
> >
> > > And what's the plan for vIOMMU?
> > As this RFC assumes userspace will use VFIO IOMMU API, userspace
> > just needs to follow the same way like what vfio-pci device does
> > in QEMU to support vIOMMU.
>
>
> Right, this is more a question for the qemu part. It means it needs to go
> for ordinary VFIO path to get all notifiers/listeners support from vIOMMU.

Yeah.

>
>
> >
> > >
> > > > Signed-off-by: Tiwei Bie <tiwei.bie@xxxxxxxxx>
> > > > ---
> > > > drivers/vhost/Makefile | 2 +
> > > > drivers/vhost/vdpa.c | 770 +++++++++++++++++++++++++++++++++++++
> > > > include/linux/vdpa_mdev.h | 72 ++++
> > > > include/uapi/linux/vfio.h | 19 +
> > > > include/uapi/linux/vhost.h | 25 ++
> > > > 5 files changed, 888 insertions(+)
> > > > create mode 100644 drivers/vhost/vdpa.c
> > > > create mode 100644 include/linux/vdpa_mdev.h
> > >
> > > We probably need some sample parent device implementation. It could be a
> > > software datapath like e.g we can start from virtio-net device in guest or a
> > > vhost/tap on host.
> > Yeah, something like this would be interesting!
>
>
> Plan to do something like that :) ?

I don't have a plan yet.. But it's something that can be done I think.

Thanks,
Tiwei

>
> Thanks
>
>
> >
> > Thanks,
> > Tiwei
> >
> > > Thanks
> > >
> > >