Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

From: Jason Wang
Date: Wed Jul 03 2019 - 08:16:43 EST

On 2019/7/3 7:52 PM, Tiwei Bie wrote:
On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
On 2019/7/3 5:13 PM, Tiwei Bie wrote:
Details about this can be found here:

What's new in this version

A new VFIO device type is introduced - vfio-vhost. This addressed
some comments from here:

Below is the updated device interface:

Currently, there are two regions of this device: 1) CONFIG_REGION
(VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to set up the
device; 2) a notify region, which can be used to notify the device.


The region described by CONFIG_REGION is the main control interface.
Messages will be written to or read from this region.

The message type is determined by the `request` field in the message
header. The message size is also encoded in the message header.
The message format looks like this:

struct vhost_vfio_op {
	__u64 request;
	__u32 flags;
	/* Flag values: */
#define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
	__u32 size;
	union {
		__u64 u64;
		struct vhost_vring_state state;
		struct vhost_vring_addr addr;
	} payload;
};

The existing vhost-kernel ioctl cmds are reused as the message
requests in the above structure.
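
To make the message format concrete, here is a minimal userspace sketch
that fills one message carrying VHOST_SET_VRING_NUM. It is not taken from
the patch. The struct is a local copy of the layout quoted above (it only
exists in this RFC, not in upstream headers), build_set_vring_num() is a
hypothetical helper, and treating `size` as the payload length in bytes
is an assumption.

```c
#include <string.h>
#include <linux/types.h>
#include <linux/vhost.h>   /* VHOST_SET_VRING_NUM, struct vhost_vring_state */

/* Local copy of the RFC's message layout; not part of any upstream header. */
struct vhost_vfio_op {
	__u64 request;
	__u32 flags;
#define VHOST_VFIO_NEED_REPLY 0x1
	__u32 size;
	union {
		__u64 u64;
		struct vhost_vring_state state;
		struct vhost_vring_addr addr;
	} payload;
};

/* Hypothetical helper: build a "set ring size" message for one virtqueue. */
static void build_set_vring_num(struct vhost_vfio_op *op,
				unsigned int index, unsigned int num)
{
	memset(op, 0, sizeof(*op));
	op->request = VHOST_SET_VRING_NUM;    /* reused vhost-kernel ioctl cmd */
	op->flags = 0;                        /* no reply requested */
	op->size = sizeof(op->payload.state); /* assumption: payload bytes only */
	op->payload.state.index = index;
	op->payload.state.num = num;
}
```

Userspace would then presumably write the filled message into
CONFIG_REGION, e.g. with pwrite() at the region offset reported by
VFIO_DEVICE_GET_REGION_INFO, which is how VFIO regions are normally
accessed.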

Still the same comment as for V1. What's the advantage of inventing a new protocol?
I'm trying to make it work in VFIO's way..

I believe either of the following should be better:

- using vhost ioctl, we can start from SET_VRING_KICK/SET_VRING_CALL and
extend it with e.g. a notify region. The advantage is that all existing
userspace programs could be reused without modification (or with minimal
modification). And the vhost API hides lots of details that the application
does not need to understand (e.g. in the case of containers).
Do you mean reusing vhost's ioctl on VFIO device fd directly,
or introducing another mdev driver (i.e. vhost_mdev instead of
using the existing vfio_mdev) for mdev device?

Can we simply add them into ioctl of mdev_parent_ops?

- using PCI layout, then you don't even need to re-invent the notify region
at all and we can pass it through to the guest.
Like what you said previously, virtio has transports other than PCI.
And it will look a bit odd when using transports other than PCI..


Personally, I prefer vhost ioctl.

3. VFIO interrupt ioctl API

The VFIO interrupt ioctl API is used to set up device interrupts.
IRQ-bypass can also be supported.

Currently, the data path interrupt can be configured via the
VFIO_VHOST_VQ_IRQ_INDEX with virtqueue's callfd.
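
As an illustration of wiring a virtqueue's callfd through the VFIO
interrupt API, here is a rough sketch that prepares the argument for
VFIO_DEVICE_SET_IRQS. The helper build_vq_irq_set() is hypothetical. The
IRQ index is passed in as a parameter because VFIO_VHOST_VQ_IRQ_INDEX is
only defined by this RFC, and mapping the virtqueue number onto `start`
is an assumption.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <linux/vfio.h>   /* struct vfio_irq_set, VFIO_IRQ_SET_* flags */

/*
 * Hypothetical helper: allocate a vfio_irq_set that binds one eventfd
 * (the virtqueue's callfd) to sub-index `vq` of IRQ index `irq_index`.
 * The caller frees the returned buffer.
 */
static struct vfio_irq_set *build_vq_irq_set(uint32_t irq_index,
					     uint32_t vq, int32_t callfd)
{
	size_t argsz = sizeof(struct vfio_irq_set) + sizeof(int32_t);
	struct vfio_irq_set *set = calloc(1, argsz);

	if (!set)
		return NULL;

	set->argsz = (uint32_t)argsz;
	set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
	set->index = irq_index;   /* e.g. the RFC's VFIO_VHOST_VQ_IRQ_INDEX */
	set->start = vq;          /* assumption: one sub-index per virtqueue */
	set->count = 1;
	memcpy(set->data, &callfd, sizeof(callfd));
	return set;
}
```

The buffer would then be passed to ioctl(device_fd, VFIO_DEVICE_SET_IRQS,
set), with callfd created beforehand via eventfd().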

How about the DMA API? Do you expect to use the VFIO IOMMU API or vhost
SET_MEM_TABLE? The VFIO IOMMU API is more generic for sure, but with
SET_MEM_TABLE DMA can be done at the level of the parent device, which means
it can work for e.g. a card with an on-chip IOMMU.
Agree. This RFC assumes userspace will use the VFIO IOMMU API
to do the DMA programming. But as you said, there could be
a problem when using cards with an on-chip IOMMU.

Yes, another issue is that SET_MEM_TABLE cannot be used to update just a part of the table. This seems less flexible than the VFIO API, but it could be extended.
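
For comparison, the VFIO type1 IOMMU API maps one range at a time, so a
single range can be added or removed without resending a whole table. A
minimal sketch of preparing one mapping follows; make_dma_map() is a
hypothetical helper, and the IOVA and size in the usage note are
arbitrary example values.

```c
#include <stdint.h>
#include <string.h>
#include <linux/vfio.h>   /* struct vfio_iommu_type1_dma_map */

/* Hypothetical helper: describe one read/write DMA mapping. */
static struct vfio_iommu_type1_dma_map make_dma_map(void *vaddr,
						    uint64_t iova,
						    uint64_t size)
{
	struct vfio_iommu_type1_dma_map map;

	memset(&map, 0, sizeof(map));
	map.argsz = sizeof(map);
	map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
	map.vaddr = (uint64_t)(uintptr_t)vaddr;  /* process virtual address */
	map.iova = iova;                         /* address the device uses */
	map.size = size;                         /* length in bytes */
	return map;
}
```

Usage would be along the lines of filling a map with
make_dma_map(buf, 0x100000, 4096) and passing it to
ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map); a matching
VFIO_IOMMU_UNMAP_DMA call removes just that range.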

And what's the plan for vIOMMU?
As this RFC assumes userspace will use the VFIO IOMMU API, userspace
just needs to follow the same approach the vfio-pci device uses
in QEMU to support vIOMMU.

Right, this is more a question for the QEMU part. It means it needs to go through the ordinary VFIO path to get all the notifier/listener support from vIOMMU.

Signed-off-by: Tiwei Bie <tiwei.bie@xxxxxxxxx>
drivers/vhost/Makefile | 2 +
drivers/vhost/vdpa.c | 770 +++++++++++++++++++++++++++++++++++++
include/linux/vdpa_mdev.h | 72 ++++
include/uapi/linux/vfio.h | 19 +
include/uapi/linux/vhost.h | 25 ++
5 files changed, 888 insertions(+)
create mode 100644 drivers/vhost/vdpa.c
create mode 100644 include/linux/vdpa_mdev.h

We probably need a sample parent device implementation. It could be a
software datapath; e.g. we could start from a virtio-net device in the guest
or a vhost/tap on the host.
Yeah, something like this would be interesting!

Plan to do something like that :) ?