On Fri, Apr 14, 2017 at 03:52:23PM +0800, Jason Wang wrote:
Fascinating. How do we explain the gain with batch dequeue?
On 2017/04/12 16:03, Jason Wang wrote:
Still helpful:
On 2017/04/07 13:49, Michael S. Tsirkin wrote:
A known weakness in the ptr_ring design is that it does not handle well the
situation when ring is almost full: as entries are consumed they are
immediately used again by the producer, so consumer and producer are
writing to a shared cache line.
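For reference, the pre-patch discard path boils down to roughly the following (a simplified sketch, not the kernel source verbatim; locking and memory barriers omitted):

/* Pre-patch behaviour (simplified): each consumed slot is NULLed
 * immediately.  When the ring is nearly full, the producer refills
 * that same slot right away, so producer and consumer keep writing
 * the same cache line from different CPUs and it bounces between them.
 */
static void *consume_one(void **queue, int size, int *consumer)
{
	void *ptr = queue[*consumer];

	if (ptr) {
		queue[(*consumer)++] = NULL;	/* cache line shared with producer */
		if (*consumer >= size)
			*consumer = 0;
	}
	return ptr;
}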
To fix this, add batching to consume calls: as entries are
consumed, do not write NULL into the ring until we get
a multiple (2x in the current implementation) of cache lines
away from the producer. At that point, write them all out.
We do the write out in the reverse order to keep
producer from sharing cache with consumer for as long
as possible.
Writeout is also triggered when the ring wraps around; there's
no special reason to do this but it helps keep the code
a bit simpler.
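A simplified sketch of the scheme (illustrative names, not the patch verbatim; the real code is in include/linux/ptr_ring.h, and locking/barriers are omitted here). The single consumer index is split into a consumer_head that advances on every consume and a consumer_tail marking the first slot not yet written back as NULL:

struct ring_sketch {
	int consumer_head;	/* next entry to consume */
	int consumer_tail;	/* first entry not yet NULLed */
	int batch;		/* entries to accumulate before write-out */
	int size;
	void **queue;
};

static void discard_one(struct ring_sketch *r)
{
	int head = r->consumer_head++;

	if (r->consumer_head - r->consumer_tail >= r->batch ||
	    r->consumer_head >= r->size) {
		/* Zero out in reverse order: the slot nearest the
		 * producer is NULLed last, keeping the producer off
		 * our cache lines for as long as possible.
		 */
		while (head >= r->consumer_tail)
			r->queue[head--] = NULL;
		r->consumer_tail = r->consumer_head;
	}
	/* Wrap around: the check above already forced a write-out
	 * when consumer_head reached r->size, so just reset.
	 */
	if (r->consumer_head >= r->size) {
		r->consumer_head = 0;
		r->consumer_tail = 0;
	}
}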
What should we do if getting 2 cache lines away from the producer
would mean keeping the ring more than half empty?
Maybe we should reduce the batching in this case;
the current patch simply reduces the batching.
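At init time that clamp could look like this (a sketch; SMP_CACHE_BYTES is the kernel's cache-line size, and a batch of 1 effectively disables batching on small rings):

/* Aim for two cache lines worth of void * entries, but fall back
 * to no batching when that would keep more than half the ring empty.
 */
r->batch = SMP_CACHE_BYTES * 2 / sizeof(*r->queue);
if (r->batch > r->size / 2 || !r->batch)
	r->batch = 1;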
Notes:
- it is no longer true that a call to consume guarantees
that the following call to produce will succeed.
No users seem to assume that.
- batching can also in theory reduce the signalling rate:
users that would previously send interrupts to the producer
to wake it up after consuming each entry would now only
need to do this once in a batch.
Doing this would be easy by returning a flag to the caller.
No users seem to do signalling on consume yet, so this is not
implemented for now.
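If someone did want signalling on consume, one hypothetical shape (not part of the patch; it reuses discard_one() and the ring struct sketched above) would be to report whether this call flushed a batch:

/* Hypothetical: report via *flushed whether this consume wrote a
 * batch of NULLs back to the ring; that would be the natural point
 * to signal the producer instead of signalling per entry.
 */
static void *consume_one_flagged(struct ring_sketch *r, bool *flushed)
{
	void *ptr = r->queue[r->consumer_head];

	if (ptr)
		discard_one(r);
	/* consumer_head == consumer_tail only right after a write-out */
	*flushed = ptr && r->consumer_head == r->consumer_tail;
	return ptr;
}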
Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
---
Jason, I am curious whether the following gives you some of
the performance boost that you see with vhost batching
patches. Is vhost batching on top still helpful?
Thanks
The patch looks good to me; I will run a test with the vhost batching patches.
before this patch: 1.84Mpps
with this patch: 2.00Mpps
with batch dequeuing: 2.30Mpps
Acked-by: Jason Wang <jasowang@xxxxxxxxxx>
Thanks
Is it just the lock overhead?
Can you please try to replace
the lock with a simple non-fair atomic and see what happens?
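For concreteness, one reading of "a simple non-fair atomic" is a test-and-set style lock along these lines (an assumption about what is meant, not code from the patch; uses kernel atomics from <linux/atomic.h> and cpu_relax() from <asm/processor.h>):

/* Unfair test-and-set lock: whichever CPU wins the cmpxchg takes
 * the lock immediately, with no FIFO ordering among waiters
 * (unlike the kernel's queued/ticket spinlocks).
 */
static inline void unfair_lock(atomic_t *l)
{
	while (atomic_cmpxchg_acquire(l, 0, 1) != 0)
		cpu_relax();
}

static inline void unfair_unlock(atomic_t *l)
{
	atomic_set_release(l, 0);
}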