On Mon, Apr 23, 2018 at 01:42:14PM +0800, Jason Wang wrote:
On 2018å04æ01æ 22:12, Tiwei Bie wrote:Yeah, you're right! The BUG_ON() isn't right. I'll remove it.
Hello everyone,[...]
This RFC implements packed ring support for virtio driver.
The code was tested with DPDK vhost (testpmd/vhost-PMD) implemented
by Jens at http://dpdk.org/ml/archives/dev/2018-January/089417.html
Minor changes are needed for the vhost code, e.g. to kick the guest.
TODO:
- Refinements and bug fixes;
- Split into small patches;
- Test indirect descriptor support;
- Test/fix event suppression support;
- Test devices other than net;
RFC v1 -> RFC v2:
- Add indirect descriptor support - compile test only;
- Add event suppression supprt - compile test only;
- Move vring_packed_init() out of uapi (Jason, MST);
- Merge two loops into one in virtqueue_add_packed() (Jason);
- Split vring_unmap_one() for packed ring and split ring (Jason);
- Avoid using '%' operator (Jason);
- Rename free_head -> next_avail_idx (Jason);
- Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
- Some other refinements and bug fixes;
Thanks!
Signed-off-by: Tiwei Bie <tiwei.bie@xxxxxxxxx>
---
drivers/virtio/virtio_ring.c | 1094 +++++++++++++++++++++++++++++-------
include/linux/virtio_ring.h | 8 +-
include/uapi/linux/virtio_config.h | 12 +-
include/uapi/linux/virtio_ring.h | 61 ++
4 files changed, 980 insertions(+), 195 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 71458f493cf8..0515dca34d77 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -58,14 +58,15 @@
+It looks to me spec does not force to keep VRING_DESC_F_INDIRECT here. So we
+ if (vq->indirect) {
+ u32 len;
+
+ desc = vq->desc_state[head].indir_desc;
+ /* Free the indirect table, if any, now that it's unmapped. */
+ if (!desc)
+ goto out;
+
+ len = virtio32_to_cpu(vq->vq.vdev,
+ vq->vring_packed.desc[head].len);
+
+ BUG_ON(!(vq->vring_packed.desc[head].flags &
+ cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
can safely remove this BUG_ON() here.
+ BUG_ON(len == 0 || len % sizeof(struct vring_packed_desc));Len could be ignored for used descriptor according to the spec, so we need
remove this BUG_ON() too.
And I think something related to this in the spec isn't very
clear currently.
In the spec, there are below words:
https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L272
"""
In descriptors with VIRTQ_DESC_F_INDIRECT set VIRTQ_DESC_F_WRITE
is reserved and is ignored by the device.
"""
So when device writes back an used descriptor in this case,
device may not set the VIRTQ_DESC_F_WRITE flag as the flag
is reserved and should be ignored.
https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L170
"""
Element Length is reserved for used descriptors without the
VIRTQ_DESC_F_WRITE flag, and is ignored by drivers.
"""
And this is the way how driver ignores the `len` in an used
descriptor.
https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L241
"""
To increase ring capacity the driver can store a (read-only
by the device) table of indirect descriptors anywhere in memory,
and insert a descriptor in the main virtqueue (with \field{Flags}
bit VIRTQ_DESC_F_INDIRECT on) that refers to a buffer element
containing this indirect descriptor table;
"""
So the indirect descriptors in the table are read-only by
the device. And the only descriptor which is writeable by
the device is the descriptor in the main virtqueue (with
Flags bit VIRTQ_DESC_F_INDIRECT on). So if we ignore the
`len` in this descriptor, we won't be able to get the
length of the data written by the device.
So I think the `len` in this descriptor will carry the
length of the data written by the device (if the buffers
are writable to the device) even if the VIRTQ_DESC_F_WRITE
isn't set by the device. How do you think?
The reason is we don't touch descriptor ring in the case of split, soI don't think we have to track used_wrap_counter in
BUG_ON()s may help there.
+This looks interesting, spec said:
+ for (j = 0; j < len / sizeof(struct vring_packed_desc); j++)
+ vring_unmap_one_packed(vq, &desc[j]);
+
+ kfree(desc);
+ vq->desc_state[head].indir_desc = NULL;
+ } else if (ctx) {
+ *ctx = vq->desc_state[head].indir_desc;
+ }
+
+out:
+ return vq->desc_state[head].num;
+}
+
+static inline bool more_used_split(const struct vring_virtqueue *vq)
{
return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
}
+static inline bool more_used_packed(const struct vring_virtqueue *vq)
+{
+ u16 last_used, flags;
+ bool avail, used;
+
+ if (vq->vq.num_free == vq->vring_packed.num)
+ return false;
+
+ last_used = vq->last_used_idx;
+ flags = virtio16_to_cpu(vq->vq.vdev,
+ vq->vring_packed.desc[last_used].flags);
+ avail = flags & VRING_DESC_F_AVAIL(1);
+ used = flags & VRING_DESC_F_USED(1);
+
+ return avail == used;
+}
"
Thus VIRTQ_DESC_F_AVAIL and VIRTQ_DESC_F_USED bits are different for an
available descriptor and
equal for a used descriptor.
Note that this observation is mostly useful for sanity-checking as these are
necessary but not sufficient
conditions - for example, all descriptors are zero-initialized. To detect
used and available descriptors it is
possible for drivers and devices to keep track of the last observed value of
VIRTQ_DESC_F_USED/VIRTQ_-
DESC_F_AVAIL. Other techniques to detect
VIRTQ_DESC_F_AVAIL/VIRTQ_DESC_F_USED bit changes
might also be possible.
"
So it looks to me it was not sufficient, looking at the example codes in
spec, do we need to track last seen used_wrap_counter here?
driver. There was a discussion on this:
https://lists.oasis-open.org/archives/virtio-dev/201802/msg00177.html
And after that, below sentence was added (it's also
in the above words you quoted):
"""
Other techniques to detect
VIRTQ_DESC_F_AVAIL/VIRTQ_DESC_F_USED bit changes
might also be possible.
"""
Best regards,
Tiwei Bie
Thanks