Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

From: Jason Wang
Date: Tue Sep 15 2020 - 23:11:21 EST

Next message: Waiman Long: "Re: [PATCH v11 4/5] locking/qspinlock: Introduce starvation avoidance into CNA"
Previous message: Miles Chen: "[RESEND PATCHv1] proc: use untagged_addr() for pagemap_read addresses"
In reply to: Kishon Vijay Abraham I: "Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2020/9/15 下午11:47, Kishon Vijay Abraham I wrote:

Hi Jason,

On 15/09/20 1:48 pm, Jason Wang wrote:

Hi Kishon:

On 2020/9/14 下午3:23, Kishon Vijay Abraham I wrote:

Then you need something that is functional equivalent to virtio PCI
which is actually the concept of vDPA (e.g vDPA provides alternatives if
the queue_sel is hard in the EP implementation).

Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
vhost_config_ops' ( introduced in [RFC PATCH 03/22] vhost: Add ops for
the VHOST driver to configure VHOST device).

struct vdpa_config_ops {
    /* Virtqueue ops */
    int (*set_vq_address)(struct vdpa_device *vdev,
                  u16 idx, u64 desc_area, u64 driver_area,
                  u64 device_area);
    void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
    void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
    void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
              struct vdpa_callback *cb);
    void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
    bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
    int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
                const struct vdpa_vq_state *state);
    int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
                struct vdpa_vq_state *state);
    struct vdpa_notification_area
    (*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
    /* vq irq is not expected to be changed once DRIVER_OK is set */
    int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);

    /* Device ops */
    u32 (*get_vq_align)(struct vdpa_device *vdev);
    u64 (*get_features)(struct vdpa_device *vdev);
    int (*set_features)(struct vdpa_device *vdev, u64 features);
    void (*set_config_cb)(struct vdpa_device *vdev,
                  struct vdpa_callback *cb);
    u16 (*get_vq_num_max)(struct vdpa_device *vdev);
    u32 (*get_device_id)(struct vdpa_device *vdev);
    u32 (*get_vendor_id)(struct vdpa_device *vdev);
    u8 (*get_status)(struct vdpa_device *vdev);
    void (*set_status)(struct vdpa_device *vdev, u8 status);
    void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
               void *buf, unsigned int len);
    void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
               const void *buf, unsigned int len);
    u32 (*get_generation)(struct vdpa_device *vdev);

    /* DMA ops */
    int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
    int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
               u64 pa, u32 perm);
    int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);

    /* Free device resources */
    void (*free)(struct vdpa_device *vdev);
};

+struct vhost_config_ops {
+    int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
+              unsigned int num_bufs, struct vhost_virtqueue *vqs[],
+              vhost_vq_callback_t *callbacks[],
+              const char * const names[]);
+    void (*del_vqs)(struct vhost_dev *vdev);
+    int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src,
int len);
+    int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int
len);
+    int (*set_features)(struct vhost_dev *vdev, u64 device_features);
+    int (*set_status)(struct vhost_dev *vdev, u8 status);
+    u8 (*get_status)(struct vhost_dev *vdev);
+};
+
struct virtio_config_ops
I think there's some overlap here and some of the ops tries to do the
same thing.

I think it differs in (*set_vq_address)() and (*create_vqs)().
[create_vqs() introduced in struct vhost_config_ops provides
complimentary functionality to (*find_vqs)() in struct
virtio_config_ops. It seemingly encapsulates the functionality of
(*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(),..].

Back to the difference between (*set_vq_address)() and (*create_vqs)(),
set_vq_address() directly provides the virtqueue address to the vdpa
device but create_vqs() only provides the parameters of the virtqueue
(like the number of virtqueues, number of buffers) but does not directly
provide the address. IMO the backend client drivers (like net or vhost)
shouldn't/cannot by itself know how to access the vring created on
virtio front-end. The vdpa device/vhost device should have logic for
that. That will help the client drivers to work with different types of
vdpa device/vhost device and can access the vring created by virtio
irrespective of whether the vring can be accessed via mmio or kernel
space or user space.

I think vdpa always works with client drivers in userspace and providing
userspace address for vring.

Sorry for being unclear. What I meant is not replacing vDPA with the
vhost(bus) you proposed but the possibility of replacing virtio-pci-epf
with vDPA in:

Okay, so the virtio back-end still use vhost and front end should use
vDPA. I see. So the host side PCI driver for EPF should populate
vdpa_config_ops and invoke vdpa_register_device().

Yes.

My question is basically for the part of virtio_pci_epf_send_command(),
so it looks to me you have a vendor specific API to replace the
virtio-pci layout of the BAR:

Even when we use vDPA, we have to use some sort of
virtio_pci_epf_send_command() to communicate with virtio backend right?

Right.

Right, the layout is slightly different from the standard layout.

This is the layout
struct epf_vhost_reg_queue {
u8 cmd;
u8 cmd_status;
u16 status;
u16 num_buffers;
u16 msix_vector;
u64 queue_addr;

What's the meaning of queue_addr here?

Does not mean the device expects a contiguous memory for avail/desc/used ring?

} __packed;

struct epf_vhost_reg {
u64 host_features;
u64 guest_features;
u16 msix_config;
u16 num_queues;
u8 device_status;
u8 config_generation;
u32 isr;
u8 cmd;
u8 cmd_status;
struct epf_vhost_reg_queue vq[MAX_VQS];
} __packed;

+static int virtio_pci_epf_send_command(struct virtio_pci_device *vp_dev,
+                       u32 command)
+{
+    struct virtio_pci_epf *pci_epf;
+    void __iomem *ioaddr;
+    ktime_t timeout;
+    bool timedout;
+    int ret = 0;
+    u8 status;
+
+    pci_epf = to_virtio_pci_epf(vp_dev);
+    ioaddr = vp_dev->ioaddr;
+
+    mutex_lock(&pci_epf->lock);
+    writeb(command, ioaddr + HOST_CMD);
+    timeout = ktime_add_ms(ktime_get(), COMMAND_TIMEOUT);
+    while (1) {
+        timedout = ktime_after(ktime_get(), timeout);
+        status = readb(ioaddr + HOST_CMD_STATUS);
+

Several questions:

- It's not clear to me how the synchronization is done between the RC
and EP. E.g how and when the value of HOST_CMD_STATUS can be changed.

The HOST_CMD (commands sent to the EP) is serialized by using mutex.
Once the EP reads the command, it resets the value in HOST_CMD. So
HOST_CMD is less likely an issue.

Here's my understanding of the protocol:

1) RC write to HOST_CMD
2) RC wait for HOST_CMD_STATUS to be HOST_CMD_STATUS_OKAY

It looks to me what EP should do is

1) EP reset HOST_CMD after reading new command

And it looks to me EP should also reset HOST_CMD_STATUS here?

(I thought there should be patch to handle stuffs like this but I didn't find it in this series)

A sufficiently large time is given for the EP to complete it's operation
(1 Sec) where the EP provides the status in HOST_CMD_STATUS. After it
expires, HOST_CMD_STATUS_NONE is written to HOST_CMD_STATUS. There could
be case where EP updates HOST_CMD_STATUS after RC writes
HOST_CMD_STATUS_NONE, but by then HOST has already detected this as
failure and error-ed out.

If you still want to introduce a new transport, a virtio spec patch
would be helpful for us to understand the device API.

Okay, that should be on https://github.com/oasis-tcs/virtio-spec.git?

Yes.

- You have you vendor specific layout (according to
virtio_pci_epb_table()), so I guess you it's better to have a vendor
specific vDPA driver instead

Okay, with vDPA, we are free to define our own layouts.

Right, but vDPA have other requirements. E.g it requires the device have the ability to save/restore the state (e.g the last_avail_idx).

So it actually depends on what you want. If you don't care about userspace drivers and want to have a standard transport, you can still go virtio.

- The advantage of vendor specific vDPA driver is that it can 1) have
less codes 2) support userspace drivers through vhost-vDPA (instead of
inventing new APIs since we can't use vfio-pci here).

I see there's an additional level of indirection from virtio to vDPA and
probably no need for spec update but don't exactly see how it'll reduce
code.

AFAIK you don't need to implement your own setup_vq and del_vq.

For 2, Isn't vhost-vdpa supposed to run on virtio backend?

Not currently, vDPA is a superset of virtio (e.g it support virtqueue state save/restore). This it should be possible in the future probably.

From a high level, I think I should be able to use vDPA for
virtio_pci_epf.c. Would you also suggest using vDPA for ntb_virtio.c?
([RFC PATCH 20/22] NTB: Add a new NTB client driver to implement VIRTIO
functionality).

I think it's your call. If you want

1) a well-defined standard virtio transport
2) willing to finalize d and maintain the spec
3) doesn't care about userspace drivers

You can go with virtio, otherwise vDPA.

Thanks

Thanks
Kishon

Next message: Waiman Long: "Re: [PATCH v11 4/5] locking/qspinlock: Introduce starvation avoidance into CNA"
Previous message: Miles Chen: "[RESEND PATCHv1] proc: use untagged_addr() for pagemap_read addresses"
In reply to: Kishon Vijay Abraham I: "Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]