Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

From: Tiwei Bie
Date: Wed Jul 03 2019 - 07:54:13 EST


On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> On 2019/7/3 5:13 PM, Tiwei Bie wrote:
> > Details about this can be found here:
> >
> > https://lwn.net/Articles/750770/
> >
> > What's new in this version
> > ==========================
> >
> > A new VFIO device type is introduced - vfio-vhost. This addresses
> > some comments from here: https://patchwork.ozlabs.org/cover/984763/
> >
> > Below is the updated device interface:
> >
> > Currently, there are two regions in this device: 1) CONFIG_REGION
> > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to set up the
> > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > can be used to notify the device.
> >
> > 1. CONFIG_REGION
> >
> > The region described by CONFIG_REGION is the main control interface.
> > Messages will be written to or read from this region.
> >
> > The message type is determined by the `request` field in the message
> > header, and the message size is encoded in the header as well.
> > The message format looks like this:
> >
> > struct vhost_vfio_op {
> >         __u64 request;
> >         __u32 flags;
> >         /* Flag values: */
> > #define VHOST_VFIO_NEED_REPLY 0x1   /* Whether a reply is needed */
> >         __u32 size;
> >         union {
> >                 __u64 u64;
> >                 struct vhost_vring_state state;
> >                 struct vhost_vring_addr addr;
> >         } payload;
> > };
> >
> > The existing vhost-kernel ioctl commands are reused as the message
> > requests in the above structure.
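> >
> > For illustration, a request/reply exchange from userspace could look
> > roughly like the sketch below. VHOST_GET_FEATURES (reused here as the
> > request number) is just an example, and the exact read/write
> > semantics of the config region are an assumption at this point:
> >
> > #include <stdint.h>
> > #include <unistd.h>
> > #include <sys/ioctl.h>
> > #include <linux/vfio.h>
> > #include <linux/vhost.h>
> >
> > /* Sketch: issue VHOST_GET_FEATURES through CONFIG_REGION.
> >  * Error handling is omitted for brevity. */
> > static uint64_t vhost_vfio_get_features(int device_fd)
> > {
> >         struct vfio_region_info info = {
> >                 .argsz = sizeof(info),
> >                 .index = VFIO_VHOST_CONFIG_REGION_INDEX,
> >         };
> >         struct vhost_vfio_op op = {
> >                 .request = VHOST_GET_FEATURES,
> >                 .flags = VHOST_VFIO_NEED_REPLY,
> >                 .size = 0,
> >         };
> >
> >         /* Locate the config region within the device fd. */
> >         ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &info);
> >
> >         /* Write the request message, then read back the reply. */
> >         pwrite(device_fd, &op, sizeof(op), info.offset);
> >         pread(device_fd, &op, sizeof(op), info.offset);
> >
> >         return op.payload.u64;
> > }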
>
>
> Still the same comment as on v1: what's the advantage of inventing a new protocol?

I'm trying to make it work in VFIO's way..

> I believe either of the following should be better:
>
> - using vhost ioctls, we can start from SET_VRING_KICK/SET_VRING_CALL and
> extend them with e.g. a notify region. The advantage is that all existing
> userspace programs could be reused without modification (or with minimal
> modification). And the vhost API hides lots of details that don't need to
> be understood by the application (e.g. in the case of containers).

Do you mean reusing vhost's ioctls on the VFIO device fd directly,
or introducing another mdev driver (i.e. vhost_mdev instead of
the existing vfio_mdev) for the mdev device?
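
If it's the former, I guess the userspace flow would be roughly like
the sketch below (just my understanding of that option, using the
existing vhost uapi unchanged on the device fd):

#include <sys/ioctl.h>
#include <linux/vhost.h>

static void setup_vring_fds(int device_fd, unsigned int vq,
                            int kickfd, int callfd)
{
        /* kickfd/callfd are eventfds created by the VMM/application. */
        struct vhost_vring_file kick = { .index = vq, .fd = kickfd };
        struct vhost_vring_file call = { .index = vq, .fd = callfd };

        /* Existing vhost ioctls, issued directly on the device fd. */
        ioctl(device_fd, VHOST_SET_VRING_KICK, &kick);
        ioctl(device_fd, VHOST_SET_VRING_CALL, &call);
}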

>
> - using the PCI layout, you don't even need to re-invent a notify region at
> all, and we can pass it through to the guest.

As you said previously, virtio has transports other than PCI,
and it would look a bit odd when using those other transports..

>
> Personally, I prefer vhost ioctl.

+1

>
>
> >
[...]
> >
> > 3. VFIO interrupt ioctl API
> >
> > The VFIO interrupt ioctl API is used to set up device interrupts.
> > IRQ bypass can also be supported.
> >
> > Currently, the data path interrupt can be configured via the
> > VFIO_VHOST_VQ_IRQ_INDEX IRQ index with the virtqueue's callfd.
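> >
> > For example, wiring a virtqueue's callfd could follow the usual
> > VFIO_DEVICE_SET_IRQS pattern, roughly as sketched below (whether
> > `start` maps to the virtqueue index like this is an assumption):
> >
> > #include <stdlib.h>
> > #include <string.h>
> > #include <sys/ioctl.h>
> > #include <linux/vfio.h>
> >
> > static void vhost_vfio_set_vq_irq(int device_fd, unsigned int vq,
> >                                   int callfd)
> > {
> >         size_t argsz = sizeof(struct vfio_irq_set) + sizeof(int);
> >         struct vfio_irq_set *irq_set = malloc(argsz);
> >
> >         irq_set->argsz = argsz;
> >         irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> >                          VFIO_IRQ_SET_ACTION_TRIGGER;
> >         irq_set->index = VFIO_VHOST_VQ_IRQ_INDEX;
> >         irq_set->start = vq;
> >         irq_set->count = 1;
> >         /* The callfd eventfd travels in the variable-sized data[]. */
> >         memcpy(irq_set->data, &callfd, sizeof(int));
> >
> >         ioctl(device_fd, VFIO_DEVICE_SET_IRQS, irq_set);
> >         free(irq_set);
> > }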
>
>
> What about the DMA API? Do you expect to use the VFIO IOMMU API or vhost
> SET_MEM_TABLE? The VFIO IOMMU API is more generic for sure, but with
> SET_MEM_TABLE, DMA can be done at the level of the parent device, which
> means it can work for e.g. cards with an on-chip IOMMU.

Agreed. This RFC assumes userspace will use the VFIO IOMMU API
to do the DMA programming. But as you said, there could be
a problem when using cards with an on-chip IOMMU.
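
To make the assumption explicit: the expectation is that userspace
programs the IOMMU through the VFIO container in the usual way, e.g.
roughly like the sketch below (error handling omitted, container
already set to VFIO_TYPE1_IOMMU):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Map a chunk of guest memory so the device can DMA to/from it. */
static void dma_map(int container_fd, void *vaddr,
                    uint64_t iova, uint64_t size)
{
        struct vfio_iommu_type1_dma_map map = {
                .argsz = sizeof(map),
                .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
                .vaddr = (uintptr_t)vaddr,
                .iova  = iova,
                .size  = size,
        };

        ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}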

>
> And what's the plan for vIOMMU?

As this RFC assumes userspace will use the VFIO IOMMU API, userspace
just needs to follow the same approach as the vfio-pci device in
QEMU to support vIOMMU.

>
>
> >
> > Signed-off-by: Tiwei Bie <tiwei.bie@xxxxxxxxx>
> > ---
> > drivers/vhost/Makefile | 2 +
> > drivers/vhost/vdpa.c | 770 +++++++++++++++++++++++++++++++++++++
> > include/linux/vdpa_mdev.h | 72 ++++
> > include/uapi/linux/vfio.h | 19 +
> > include/uapi/linux/vhost.h | 25 ++
> > 5 files changed, 888 insertions(+)
> > create mode 100644 drivers/vhost/vdpa.c
> > create mode 100644 include/linux/vdpa_mdev.h
>
>
> We probably need a sample parent device implementation. It could be a
> software datapath, e.g. we could start from a virtio-net device in the
> guest or vhost/tap on the host.

Yeah, something like this would be interesting!

Thanks,
Tiwei

>
> Thanks
>
>
> >