On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
On 2019/7/3 5:13, Tiwei Bie wrote:

I'm trying to make it work in VFIO's way..
Details about this can be found here:
https://lwn.net/Articles/750770/
What's new in this version
==========================
A new VFIO device type is introduced - vfio-vhost. This addresses
some comments from here: https://patchwork.ozlabs.org/cover/984763/
Below is the updated device interface:
Currently, there are two regions of this device: 1) CONFIG_REGION
(VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to set up the
device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
can be used to notify the device.
1. CONFIG_REGION
The region described by CONFIG_REGION is the main control interface.
Messages will be written to or read from this region.
The message type is determined by the `request` field in the message
header. The message size is encoded in the message header too.
The message format looks like this:
struct vhost_vfio_op {
	__u64 request;
	__u32 flags;
	/* Flag values: */
#define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
	__u32 size;
	union {
		__u64 u64;
		struct vhost_vring_state state;
		struct vhost_vring_addr addr;
	} payload;
};
The existing vhost-kernel ioctl cmds are reused as the message
requests in the structure above.
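
To make the message format concrete, here is a minimal userspace sketch that fills one such message for VHOST_SET_VRING_NUM. The struct is copied from the description above (it is not a mainline UAPI header), the helper name is hypothetical, and treating `size` as the payload length only is an assumption, since the text only says the size is encoded in the header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <linux/types.h>
#include <linux/vhost.h>	/* VHOST_SET_VRING_NUM, struct vhost_vring_state */

/* Message layout as quoted in this RFC; not part of mainline headers. */
struct vhost_vfio_op {
	__u64 request;
	__u32 flags;
#define VHOST_VFIO_NEED_REPLY 0x1	/* Whether need reply */
	__u32 size;
	union {
		__u64 u64;
		struct vhost_vring_state state;
		struct vhost_vring_addr addr;
	} payload;
};

/*
 * Hypothetical helper: build a VHOST_SET_VRING_NUM message in *op and
 * return the number of bytes (header plus used payload) to transfer.
 */
size_t vhost_vfio_build_set_vring_num(struct vhost_vfio_op *op,
				      unsigned int index, unsigned int num)
{
	assert(op != NULL);

	memset(op, 0, sizeof(*op));
	op->request = VHOST_SET_VRING_NUM;	/* reused vhost ioctl cmd */
	op->flags = 0;				/* no reply requested */
	/* Assumption: size counts the payload bytes only. */
	op->size = sizeof(op->payload.state);
	op->payload.state.index = index;
	op->payload.state.num = num;

	return offsetof(struct vhost_vfio_op, payload) + op->size;
}
```

A caller would presumably pwrite() the returned number of bytes to the device fd at the CONFIG_REGION offset reported by VFIO_DEVICE_GET_REGION_INFO; that step is left out because it needs a real device.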
Still the same comment as for V1. What's the advantage of inventing a new
protocol? I believe either of the following should be better:

- using vhost ioctl, we can start from SET_VRING_KICK/SET_VRING_CALL and
  extend it with e.g. a notify region. The advantage is that all existing
  userspace programs could be reused without modification (or with minimal
  modification). And the vhost API hides lots of details that an application
  does not need to understand (e.g. in the case of containers).

Do you mean reusing vhost's ioctl on VFIO device fd directly,
or introducing another mdev driver (i.e. vhost_mdev instead of
using the existing vfio_mdev) for mdev device?

- using PCI layout, then you don't even need to re-invent the notify region
  at all and we can pass them through to the guest.

Like what you said previously, virtio has transports other than PCI.
And it will look a bit odd when using transports other than PCI..

Personally, I prefer vhost ioctl.

+1
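
For reference, the vhost ioctl path discussed here looks roughly like the sketch below from userspace. It uses only the existing VHOST_SET_VRING_KICK and VHOST_SET_VRING_CALL definitions from <linux/vhost.h>; the helper name is hypothetical, and whether the fd would be a vhost fd or an mdev-backed fd is exactly the open question in this thread.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>	/* VHOST_SET_VRING_KICK, VHOST_SET_VRING_CALL */

/*
 * Hypothetical helper: attach the kick and call eventfds of virtqueue
 * `index` to a vhost-style fd. Returns 0 on success, -1 on failure
 * (errno is left as set by ioctl()).
 */
int vring_set_kick_call(int vhost_fd, unsigned int index,
			int kick_fd, int call_fd)
{
	struct vhost_vring_file kick = { .index = index, .fd = kick_fd };
	struct vhost_vring_file call = { .index = index, .fd = call_fd };

	assert(vhost_fd >= 0);

	if (ioctl(vhost_fd, VHOST_SET_VRING_KICK, &kick) < 0)
		return -1;
	if (ioctl(vhost_fd, VHOST_SET_VRING_CALL, &call) < 0)
		return -1;
	return 0;
}
```

Existing vhost users already issue these two calls per virtqueue, which is the reuse argument made above.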
[...]
3. VFIO interrupt ioctl API
VFIO interrupt ioctl API is used to set up device interrupts.
IRQ-bypass can also be supported.
Currently, the data path interrupt can be configured via the
VFIO_VHOST_VQ_IRQ_INDEX with virtqueue's callfd.

How about the DMA API? Do you expect to use the VFIO IOMMU API or vhost
SET_MEM_TABLE? The VFIO IOMMU API is more generic for sure, but with
SET_MEM_TABLE DMA can be done at the level of the parent device, which means
it can work for e.g. a card with an on-chip IOMMU.

Agree. In this RFC, it assumes userspace will use VFIO IOMMU API
to do the DMA programming. But like what you said, there could be
a problem when using cards with on-chip IOMMU.

And what's the plan for vIOMMU?

As this RFC assumes userspace will use VFIO IOMMU API, userspace
just needs to do what the vfio-pci device does in QEMU to support
vIOMMU.
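
As an illustration of the VFIO IOMMU path assumed by this RFC, the sketch below maps one memory range for device DMA with the existing type1 VFIO_IOMMU_MAP_DMA ioctl. The helper name and its parameters are hypothetical; the container fd is assumed to have already been configured with VFIO_SET_IOMMU.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>	/* VFIO_IOMMU_MAP_DMA, struct vfio_iommu_type1_dma_map */

/*
 * Hypothetical helper: map `len` bytes at host virtual address `hva`
 * to I/O virtual address `iova` through a VFIO container.
 * Returns the ioctl() result: 0 on success, -1 on failure.
 */
int vfio_map_guest_range(int container_fd, void *hva,
			 uint64_t iova, uint64_t len)
{
	struct vfio_iommu_type1_dma_map map;

	assert(len != 0);

	memset(&map, 0, sizeof(map));
	map.argsz = sizeof(map);
	map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
	map.vaddr = (uintptr_t)hva;
	map.iova = iova;
	map.size = len;

	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}
```

With SET_MEM_TABLE the same information would instead be handed to the parent device as a table of memory regions, which is why that path can work with an on-chip IOMMU.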
Signed-off-by: Tiwei Bie <tiwei.bie@xxxxxxxxx>
---
drivers/vhost/Makefile | 2 +
drivers/vhost/vdpa.c | 770 +++++++++++++++++++++++++++++++++++++
include/linux/vdpa_mdev.h | 72 ++++
include/uapi/linux/vfio.h | 19 +
include/uapi/linux/vhost.h | 25 ++
5 files changed, 888 insertions(+)
create mode 100644 drivers/vhost/vdpa.c
create mode 100644 include/linux/vdpa_mdev.h

We probably need some sample parent device implementation. It could be a
software datapath, e.g. we can start from a virtio-net device in the guest or
a vhost/tap on the host.

Yeah, something like this would be interesting!

Thanks,
Tiwei

Thanks