Re: [PATCH RFC 2/2] virtio_ring: support packed ring

From: Jason Wang
Date: Fri Mar 16 2018 - 07:37:03 EST




On 2018å03æ16æ 18:04, Tiwei Bie wrote:
On Fri, Mar 16, 2018 at 04:34:28PM +0800, Jason Wang wrote:
On 2018å03æ16æ 15:40, Tiwei Bie wrote:
On Fri, Mar 16, 2018 at 02:44:12PM +0800, Jason Wang wrote:
On 2018å03æ16æ 14:10, Tiwei Bie wrote:
On Fri, Mar 16, 2018 at 12:03:25PM +0800, Jason Wang wrote:
On 2018å02æ23æ 19:18, Tiwei Bie wrote:
Signed-off-by: Tiwei Bie <tiwei.bie@xxxxxxxxx>
---
drivers/virtio/virtio_ring.c | 699 +++++++++++++++++++++++++++++++++++++------
include/linux/virtio_ring.h | 8 +-
2 files changed, 618 insertions(+), 89 deletions(-)


[...]

+ }
+ }
+ for (; n < (out_sgs + in_sgs); n++) {
+ for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+ dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
+ if (vring_mapping_error(vq, addr))
+ goto unmap_release;
+
+ flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
+ VRING_DESC_F_WRITE |
+ VRING_DESC_F_AVAIL(vq->wrap_counter) |
+ VRING_DESC_F_USED(!vq->wrap_counter));
+ if (!indirect && i == head)
+ head_flags = flags;
+ else
+ desc[i].flags = flags;
+
+ desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
+ desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
+ desc[i].id = cpu_to_virtio32(_vq->vdev, head);
+ prev = i;
+ i++;
+ if (!indirect && i >= vq->vring_packed.num) {
+ i = 0;
+ vq->wrap_counter ^= 1;
+ }
+ }
+ }
+ /* Last one doesn't continue. */
+ if (!indirect && (head + 1) % vq->vring_packed.num == i)
+ head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
I can't get the why we need this here.
If only one desc is used, we will need to clear the
VRING_DESC_F_NEXT flag from the head_flags.
Yes, I meant why following desc[prev].flags won't work for this?
Because the update of desc[head].flags (in above case,
prev == head) has been delayed. The flags is saved in
head_flags.
Ok, but let's try to avoid modular here e.g tracking the number of sgs in a
counter.

And I see lots of duplication in the above two loops, I believe we can unify
them with a a single loop. the only difference is dma direction and write
flag.
The above implementation for packed ring is basically
an mirror of the existing implementation in split ring
as I want to keep the coding style consistent. Below
is the corresponding code in split ring:

static inline int virtqueue_add(struct virtqueue *_vq,
struct scatterlist *sgs[],
unsigned int total_sg,
unsigned int out_sgs,
unsigned int in_sgs,
void *data,
void *ctx,
gfp_t gfp)
{
......

for (n = 0; n < out_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
if (vring_mapping_error(vq, addr))
goto unmap_release;

desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT);
desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
prev = i;
i = virtio16_to_cpu(_vq->vdev, desc[i].next);
}
}
for (; n < (out_sgs + in_sgs); n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
if (vring_mapping_error(vq, addr))
goto unmap_release;

desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT | VRING_DESC_F_WRITE);
desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
prev = i;
i = virtio16_to_cpu(_vq->vdev, desc[i].next);
}
}

......
}

There's no need for such consistency especially consider it's a new kind of ring. Anyway, you can stick to it.

[...]

+ } else
+ vq->free_head = i;
ID is only valid in the last descriptor in the list, so head + 1 should be
ok too?
I don't really get your point. The vq->free_head stores
the index of the next avail desc.
I think I get your idea now, free_head has two meanings:

- next avail index
- buffer id
In my design, free_head is just the index of the next
avail desc.

Driver can set anything to buffer ID.
Then you need another method to track id to context e.g hashing.
I keep the context in desc_state[desc_idx]. So there is
no extra method needed to track the context.

Well, it works for this patch but my reply was for "set anything to buffer ID". The size of desc_state is limited, so in fact you can't use a value greater than vq.num.




[...]

@@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
if (!queue) {
/* Try to get a single page. You are my only hope! */
- queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
+ queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
+ packed),
&dma_addr, GFP_KERNEL|__GFP_ZERO);
}
if (!queue)
return NULL;
- queue_size_in_bytes = vring_size(num, vring_align);
- vring_init(&vring, num, queue, vring_align);
+ queue_size_in_bytes = __vring_size(num, vring_align, packed);
+ if (packed)
+ vring_packed_init(&vring.vring_packed, num, queue, vring_align);
+ else
+ vring_init(&vring.vring_split, num, queue, vring_align);
Let's rename vring_init to vring_init_split() like other helpers?
The vring_init() is a public API in include/uapi/linux/virtio_ring.h.
I don't think we can rename it.
I see, then this need more thoughts to unify the API.
My thought is to keep the old API as is, and introduce
new types and helpers for packed ring.
I admit it's not a fault of this patch. But we'd better think of this in the
future, consider we may have new kinds of ring.

More details can be found in this patch:
https://lkml.org/lkml/2018/2/23/243
(PS. The type which has bit fields is just for reference,
and will be changed in next version.)

Do you have any other suggestions?
No.
Hmm.. Sorry, I didn't describe my question well.
I mean do you have any suggestions about the API
design for packed ring in uapi header? Currently
I introduced below two new helpers:

static inline void vring_packed_init(struct vring_packed *vr, unsigned int num,
void *p, unsigned long align);
static inline unsigned vring_packed_size(unsigned int num, unsigned long align);

When new rings are introduced in the future, above
helpers can't be reused. Maybe we should make the
helpers be able to determine the ring type?

Let's wait for Michael's comment here. Generally, I fail to understand why vring_init() become a part of uapi. Git grep shows the only use cases are virtio_test/vringh_test.

Thanks


Best regards,
Tiwei Bie

Thanks

Best regards,
Tiwei Bie

- vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
- notify, callback, name);
+ vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
+ context, notify, callback, name);
if (!vq) {
vring_free_queue(vdev, queue_size_in_bytes, queue,
dma_addr);
[...]