Hi Jason,
On 15/09/20 1:48 pm, Jason Wang wrote:
Hi Kishon:Okay, so the virtio back-end still use vhost and front end should use
On 2020/9/14 下午3:23, Kishon Vijay Abraham I wrote:
Then you need something that is functional equivalent to virtio PCIOkay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
which is actually the concept of vDPA (e.g vDPA provides alternatives if
the queue_sel is hard in the EP implementation).
vhost_config_ops' ( introduced in [RFC PATCH 03/22] vhost: Add ops for
the VHOST driver to configure VHOST device).
struct vdpa_config_ops {
/* Virtqueue ops */
int (*set_vq_address)(struct vdpa_device *vdev,
u16 idx, u64 desc_area, u64 driver_area,
u64 device_area);
void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
struct vdpa_callback *cb);
void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
const struct vdpa_vq_state *state);
int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
struct vdpa_vq_state *state);
struct vdpa_notification_area
(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
/* vq irq is not expected to be changed once DRIVER_OK is set */
int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);
/* Device ops */
u32 (*get_vq_align)(struct vdpa_device *vdev);
u64 (*get_features)(struct vdpa_device *vdev);
int (*set_features)(struct vdpa_device *vdev, u64 features);
void (*set_config_cb)(struct vdpa_device *vdev,
struct vdpa_callback *cb);
u16 (*get_vq_num_max)(struct vdpa_device *vdev);
u32 (*get_device_id)(struct vdpa_device *vdev);
u32 (*get_vendor_id)(struct vdpa_device *vdev);
u8 (*get_status)(struct vdpa_device *vdev);
void (*set_status)(struct vdpa_device *vdev, u8 status);
void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
void *buf, unsigned int len);
void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
const void *buf, unsigned int len);
u32 (*get_generation)(struct vdpa_device *vdev);
/* DMA ops */
int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
u64 pa, u32 perm);
int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);
/* Free device resources */
void (*free)(struct vdpa_device *vdev);
};
+struct vhost_config_ops {
+ int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
+ unsigned int num_bufs, struct vhost_virtqueue *vqs[],
+ vhost_vq_callback_t *callbacks[],
+ const char * const names[]);
+ void (*del_vqs)(struct vhost_dev *vdev);
+ int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src,
int len);
+ int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int
len);
+ int (*set_features)(struct vhost_dev *vdev, u64 device_features);
+ int (*set_status)(struct vhost_dev *vdev, u8 status);
+ u8 (*get_status)(struct vhost_dev *vdev);
+};
+
struct virtio_config_ops
I think there's some overlap here and some of the ops tries to do the
same thing.
I think it differs in (*set_vq_address)() and (*create_vqs)().
[create_vqs() introduced in struct vhost_config_ops provides
complimentary functionality to (*find_vqs)() in struct
virtio_config_ops. It seemingly encapsulates the functionality of
(*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(),..].
Back to the difference between (*set_vq_address)() and (*create_vqs)(),
set_vq_address() directly provides the virtqueue address to the vdpa
device but create_vqs() only provides the parameters of the virtqueue
(like the number of virtqueues, number of buffers) but does not directly
provide the address. IMO the backend client drivers (like net or vhost)
shouldn't/cannot by itself know how to access the vring created on
virtio front-end. The vdpa device/vhost device should have logic for
that. That will help the client drivers to work with different types of
vdpa device/vhost device and can access the vring created by virtio
irrespective of whether the vring can be accessed via mmio or kernel
space or user space.
I think vdpa always works with client drivers in userspace and providing
userspace address for vring.
Sorry for being unclear. What I meant is not replacing vDPA with the
vhost(bus) you proposed but the possibility of replacing virtio-pci-epf
with vDPA in:
vDPA. I see. So the host side PCI driver for EPF should populate
vdpa_config_ops and invoke vdpa_register_device().
My question is basically for the part of virtio_pci_epf_send_command(),Even when we use vDPA, we have to use some sort of
so it looks to me you have a vendor specific API to replace the
virtio-pci layout of the BAR:
virtio_pci_epf_send_command() to communicate with virtio backend right?
Right, the layout is slightly different from the standard layout.
This is the layout
struct epf_vhost_reg_queue {
u8 cmd;
u8 cmd_status;
u16 status;
u16 num_buffers;
u16 msix_vector;
u64 queue_addr;
} __packed;
struct epf_vhost_reg {
u64 host_features;
u64 guest_features;
u16 msix_config;
u16 num_queues;
u8 device_status;
u8 config_generation;
u32 isr;
u8 cmd;
u8 cmd_status;
struct epf_vhost_reg_queue vq[MAX_VQS];
} __packed;
The HOST_CMD (commands sent to the EP) is serialized by using mutex.
+static int virtio_pci_epf_send_command(struct virtio_pci_device *vp_dev,
+ u32 command)
+{
+ struct virtio_pci_epf *pci_epf;
+ void __iomem *ioaddr;
+ ktime_t timeout;
+ bool timedout;
+ int ret = 0;
+ u8 status;
+
+ pci_epf = to_virtio_pci_epf(vp_dev);
+ ioaddr = vp_dev->ioaddr;
+
+ mutex_lock(&pci_epf->lock);
+ writeb(command, ioaddr + HOST_CMD);
+ timeout = ktime_add_ms(ktime_get(), COMMAND_TIMEOUT);
+ while (1) {
+ timedout = ktime_after(ktime_get(), timeout);
+ status = readb(ioaddr + HOST_CMD_STATUS);
+
Several questions:
- It's not clear to me how the synchronization is done between the RC
and EP. E.g how and when the value of HOST_CMD_STATUS can be changed.
Once the EP reads the command, it resets the value in HOST_CMD. So
HOST_CMD is less likely an issue.
A sufficiently large time is given for the EP to complete it's operation
(1 Sec) where the EP provides the status in HOST_CMD_STATUS. After it
expires, HOST_CMD_STATUS_NONE is written to HOST_CMD_STATUS. There could
be case where EP updates HOST_CMD_STATUS after RC writes
HOST_CMD_STATUS_NONE, but by then HOST has already detected this as
failure and error-ed out.
If you still want to introduce a new transport, a virtio spec patchOkay, that should be on https://github.com/oasis-tcs/virtio-spec.git?
would be helpful for us to understand the device API.
- You have you vendor specific layout (according toOkay, with vDPA, we are free to define our own layouts.
virtio_pci_epb_table()), so I guess you it's better to have a vendor
specific vDPA driver instead
- The advantage of vendor specific vDPA driver is that it can 1) haveI see there's an additional level of indirection from virtio to vDPA and
less codes 2) support userspace drivers through vhost-vDPA (instead of
inventing new APIs since we can't use vfio-pci here).
probably no need for spec update but don't exactly see how it'll reduce
code.
For 2, Isn't vhost-vdpa supposed to run on virtio backend?
From a high level, I think I should be able to use vDPA for
virtio_pci_epf.c. Would you also suggest using vDPA for ntb_virtio.c?
([RFC PATCH 20/22] NTB: Add a new NTB client driver to implement VIRTIO
functionality).
Thanks
Kishon