Re: [PATCH 1/3] ptr_ring: batch ring zeroing

From: Michael S. Tsirkin
Date: Fri Apr 14 2017 - 17:00:30 EST


On Fri, Apr 14, 2017 at 03:52:23PM +0800, Jason Wang wrote:
>
>
> On 2017年04月12日 16:03, Jason Wang wrote:
> >
> >
> > On 2017年04月07日 13:49, Michael S. Tsirkin wrote:
> > > A known weakness in the ptr_ring design is that it does not handle
> > > the situation well when the ring is almost full: as entries are
> > > consumed they are immediately used again by the producer, so consumer
> > > and producer are writing to a shared cache line.
> > >
> > > To fix this, add batching to consume calls: as entries are
> > > consumed do not write NULL into the ring until we get
> > > a multiple (in current implementation 2x) of cache lines
> > > away from the producer. At that point, write them all out.
> > >
> > > We do the write out in the reverse order to keep
> > > producer from sharing cache with consumer for as long
> > > as possible.
> > >
> > > Writeout also triggers when ring wraps around - there's
> > > no special reason to do this but it helps keep the code
> > > a bit simpler.
> > >
> > > > What should we do if getting away from producer by 2 cache lines
> > > > would mean we are keeping the ring more than half empty?
> > > > Maybe we should reduce the batching in this case;
> > > > the current patch simply reduces the batching.
> > >
> > > Notes:
> > > - it is no longer true that a call to consume guarantees
> > > that the following call to produce will succeed.
> > > No users seem to assume that.
> > > > - batching can also in theory reduce the signalling rate:
> > > > users that would previously send interrupts to the producer
> > > > to wake it up after consuming each entry would now only
> > > > need to do this once in a batch.
> > > Doing this would be easy by returning a flag to the caller.
> > > No users seem to do signalling on consume yet so this was not
> > > implemented yet.
> > >
> > > Signed-off-by: Michael S. Tsirkin<mst@xxxxxxxxxx>
> > > ---
> > >
> > > Jason, I am curious whether the following gives you some of
> > > the performance boost that you see with vhost batching
> > > patches. Is vhost batching on top still helpful?
> >
> > The patch looks good to me, will have a test for vhost batching patches.
> >
> > Thanks
>
> Still helpful:
>
> before this patch: 1.84Mpps
> with this patch: 2.00Mpps
> with batch dequeuing: 2.30Mpps
>
> Acked-by: Jason Wang <jasowang@xxxxxxxxxx>
>
> Thanks
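
For anyone following along, the batched zeroing described in the quoted
patch can be sketched roughly as below. This is a single-threaded
userspace illustration, not the kernel code: the names sketch_ring,
sketch_produce and sketch_consume, and the fixed RING_SIZE and RING_BATCH
values, are assumptions made up for this example (the patch sizes the
batch from the cache line size), and all locking and memory barriers are
left out.

```c
#include <stddef.h>

#define RING_SIZE  16	/* assumed ring size, for illustration only */
#define RING_BATCH 4	/* assumed batch; the patch derives it from cache line size */

/* Hypothetical userspace stand-in for a pointer ring. */
struct sketch_ring {
	void *queue[RING_SIZE];
	int producer;		/* next slot the producer writes */
	int consumer_head;	/* next slot the consumer reads */
	int consumer_tail;	/* oldest consumed slot not yet zeroed */
};

/* A slot is free only once it holds NULL, so produce fails otherwise. */
static int sketch_produce(struct sketch_ring *r, void *ptr)
{
	if (r->queue[r->producer])
		return -1;
	r->queue[r->producer] = ptr;
	if (++r->producer >= RING_SIZE)
		r->producer = 0;
	return 0;
}

/* Consume one entry, deferring the NULL writes until a batch is complete. */
static void *sketch_consume(struct sketch_ring *r)
{
	void *ptr = r->queue[r->consumer_head];
	int head;

	if (!ptr)
		return NULL;

	head = r->consumer_head++;

	/* Write out when a full batch is pending or the ring wraps around. */
	if (r->consumer_head - r->consumer_tail >= RING_BATCH ||
	    r->consumer_head >= RING_SIZE) {
		/* Zero in reverse order, so the slot the producer is
		 * waiting on is the last one to be released. */
		while (head >= r->consumer_tail)
			r->queue[head--] = NULL;
		r->consumer_tail = r->consumer_head;
	}
	if (r->consumer_head >= RING_SIZE) {
		r->consumer_head = 0;
		r->consumer_tail = 0;
	}
	return ptr;
}
```

With the ring full, a single sketch_consume() leaves the slot non-NULL,
so the next sketch_produce() still fails. It only succeeds after the
consumer has completed a batch, which is the first point under "Notes"
in the quoted text.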

Fascinating. How do we explain the gain with batch dequeue?
Is it just the lock overhead? Can you pls try to replace
the lock with a simple non-fair atomic and see what happens?
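
To be concrete about what I mean by a simple non-fair atomic, something
along the lines of the test-and-set lock below would do. This is only a
userspace sketch using C11 atomics: the tas_lock names are made up for
illustration and are not an existing kernel API, and a kernel version
would presumably use the kernel's own atomic helpers and cpu_relax() in
the spin loop instead.

```c
#include <stdatomic.h>

/* Hypothetical test-and-set lock: no queueing, so no fairness at all. */
struct tas_lock {
	atomic_flag busy;
};

#define TAS_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Returns 1 if the lock was taken, 0 if somebody else holds it. */
static int tas_lock_try(struct tas_lock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->busy,
						  memory_order_acquire);
}

/* Spin until the flag is ours; whoever wins the race gets the lock. */
static void tas_lock_acquire(struct tas_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->busy,
						 memory_order_acquire))
		;	/* busy-wait */
}

/* Release pairs with the acquire ordering above. */
static void tas_lock_release(struct tas_lock *l)
{
	atomic_flag_clear_explicit(&l->busy, memory_order_release);
}
```

If the numbers with something like this in place of the lock get close
to the batch dequeue numbers, that would suggest the gain is mostly lock
overhead.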

--
MST