Re: [RFC v4 3/5] virtio_ring: add packed ring support

From: Tiwei Bie
Date: Fri May 18 2018 - 06:33:13 EST


On Thu, May 17, 2018 at 08:01:52PM +0800, Jason Wang wrote:
> On 2018å05æ16æ 22:33, Tiwei Bie wrote:
> > On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote:
> > > On 2018å05æ16æ 21:45, Tiwei Bie wrote:
> > > > On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:
> > > > > On 2018å05æ16æ 20:39, Tiwei Bie wrote:
> > > > > > On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:
> > > > > > > On 2018å05æ16æ 16:37, Tiwei Bie wrote:
> > [...]
> > > > > > > > +static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
> > > > > > > > + unsigned int id, void **ctx)
> > > > > > > > +{
> > > > > > > > + struct vring_packed_desc *desc;
> > > > > > > > + unsigned int i, j;
> > > > > > > > +
> > > > > > > > + /* Clear data ptr. */
> > > > > > > > + vq->desc_state[id].data = NULL;
> > > > > > > > +
> > > > > > > > + i = head;
> > > > > > > > +
> > > > > > > > + for (j = 0; j < vq->desc_state[id].num; j++) {
> > > > > > > > + desc = &vq->vring_packed.desc[i];
> > > > > > > > + vring_unmap_one_packed(vq, desc);
> > > > > > > As mentioned in previous discussion, this probably won't work for the case
> > > > > > > of out of order completion since it depends on the information in the
> > > > > > > descriptor ring. We probably need to extend ctx to record such information.
> > > > > > Above code doesn't depend on the information in the descriptor
> > > > > > ring. The vq->desc_state[] is the extended ctx.
> > > > > >
> > > > > > Best regards,
> > > > > > Tiwei Bie
> > > > > Yes, but desc is a pointer to descriptor ring I think so
> > > > > vring_unmap_one_packed() still depends on the content of descriptor ring?
> > > > >
> > > > I got your point now. I think it makes sense to reserve
> > > > the bits of the addr field. Driver shouldn't try to get
> > > > addrs from the descriptors when cleanup the descriptors
> > > > no matter whether we support out-of-order or not.
> > > Maybe I was wrong, but I remember spec mentioned something like this.
> > You're right. Spec mentioned this. I was just repeating
> > the spec to emphasize that it does make sense. :)
> >
> > > > But combining it with the out-of-order support, it will
> > > > mean that the driver still needs to maintain a desc/ctx
> > > > list that is very similar to the desc ring in the split
> > > > ring. I'm not quite sure whether it's something we want.
> > > > If it is true, I'll do it. So do you think we also want
> > > > to maintain such a desc/ctx list for packed ring?
> > > To make it work for OOO backends I think we need something like this
> > > (hardware NIC drivers are usually have something like this).
> > Which hardware NIC drivers have this?
>
> It's quite common I think, e.g driver track e.g dma addr and page frag
> somewhere. e.g the ring->rx_info in mlx4 driver.

It seems that I had a misunderstanding on your
previous comments. I know it's quite common for
drivers to track e.g. DMA addrs somewhere (and
I think one reason behind this is that they want
to reuse the bits of addr field). But tracking
addrs somewhere doesn't means supporting OOO.
I thought you were saying it's quite common for
hardware NIC drivers to support OOO (i.e. NICs
will return the descriptors OOO):

I'm not familiar with mlx4, maybe I'm wrong.
I just had a quick glance. And I found below
comments in mlx4_en_process_rx_cq():

```
/* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
* descriptor offset can be deduced from the CQE index instead of
* reading 'cqe->index' */
index = cq->mcq.cons_index & ring->size_mask;
cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor;
```

It seems that although they have a completion
queue, they are still using the ring in order.

I guess maybe storage device may want OOO.

Best regards,
Tiwei Bie

>
> Thanks
>
> >
> > > Not for the patch, but it looks like having a OUT_OF_ORDER feature bit is
> > > much more simpler to be started with.
> > +1
> >
> > Best regards,
> > Tiwei Bie
>