On Wed, Sep 27, 2017 at 08:27:37AM +0800, Jason Wang wrote:
> On 2017年09月26日 21:45, Michael S. Tsirkin wrote:
> > On Fri, Sep 22, 2017 at 04:02:30PM +0800, Jason Wang wrote:
> > > Hi:
> > >
> > > This series tries to implement basic tx batched processing. This is
> > > done by prefetching descriptor indices and updating the used ring in
> > > a batch. This is intended to speed up used ring updates and improve
> > > cache utilization.
> > Interesting, thanks for the patches. So IIUC most of the gain is really
> > overcoming some of the shortcomings of virtio 1.0 wrt cache utilization?
>
> Yes.
>
> Actually, it looks like batching in 1.1 is not as easy as in 1.0.
>
> In 1.0, we could do something like:
>
> batch update the used ring by copy_to_user()
> smp_wmb()
> update used_idx
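To make the comparison concrete, here is a minimal user-space sketch of that
1.0-style batched used update: all elements are filled in first, then a single
barrier and a single index store publish the whole batch. The struct layout
only mirrors the split used ring, and the helper name and ring size are made
up for illustration; this is not the actual vhost code.

#include <stdint.h>

/* Mirrors vring_used_elem: id of the descriptor chain head plus length. */
struct used_elem {
	uint32_t id;
	uint32_t len;
};

/* Mirrors the 1.0 split used ring (256 entries picked for simplicity). */
struct used_ring {
	uint16_t flags;
	uint16_t idx;
	struct used_elem ring[256];
};

/* User-space stand-in for the kernel's store-store barrier. */
#define smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)

static void add_used_batch(struct used_ring *used,
			   const struct used_elem *heads, uint16_t n)
{
	uint16_t idx = used->idx;
	uint16_t i;

	/* 1) Fill in all completed elements (copy_to_user() in vhost proper). */
	for (i = 0; i < n; i++)
		used->ring[(uint16_t)(idx + i) % 256] = heads[i];

	/* 2) One barrier for the whole batch: elements before index. */
	smp_wmb();

	/* 3) One index store publishes everything at once. */
	used->idx = idx + n;
}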
> In 1.1, we need more memory barriers, and we can't benefit from the fast
> copy helpers?
>
> for () {
>     update desc.addr
>     smp_wmb()
>     update desc.flag
> }
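And a matching sketch of that per-descriptor publish loop. The 1.1 descriptor
layout below is purely illustrative (the spec was still a draft at this point);
the point is that the barrier now sits inside the loop, once per descriptor,
rather than once per batch as in the 1.0 sketch above.

#include <stdint.h>

/* Illustrative 1.1-style in-ring descriptor; ownership travels in flags. */
struct desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};

#define smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)

/* Again assuming a 256-entry ring for simplicity. */
static void publish_descs(struct desc *ring, const struct desc *bufs,
			  uint16_t head, uint16_t n, uint16_t owner_flags)
{
	uint16_t i;

	for (i = 0; i < n; i++) {
		struct desc *d = &ring[(uint16_t)(head + i) % 256];

		d->addr = bufs[i].addr;	/* update desc.addr ... */
		d->len  = bufs[i].len;
		d->id   = bufs[i].id;
		smp_wmb();		/* order the payload before ownership */
		d->flags = owner_flags;	/* ... then flip desc.flags to hand it over */
	}
}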
Yes, but smp_wmb() is a NOP on e.g. x86. We can switch to other types of
barriers as well.

We do need to do the updates in order, so we might need new APIs for that
to avoid re-doing the translation all the time.
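A purely hypothetical shape such an API could take, reusing the illustrative
struct desc from the sketch above: translate the guest descriptor area once,
then do the in-order updates through the cached mapping, with a release store
doing the per-descriptor publish instead of a standalone barrier. None of
these names exist in vhost today; this is only a sketch of the idea.

struct vq;			/* opaque, stands in for the vhost virtqueue */

struct desc_window {
	struct desc *base;	/* host mapping of the guest descriptor area */
	uint16_t head;		/* first slot covered by this window */
	uint16_t num;		/* ring size, used for wrapping */
};

/* Imagined setup step: one guest->host translation for the whole batch. */
int vq_map_desc_window(struct vq *vq, uint16_t head, uint16_t n,
		       struct desc_window *w);

/* Imagined ordered publish: the release store keeps this descriptor's
 * fields, and all earlier descriptors, visible before its flags. */
static inline void vq_publish_desc(struct desc_window *w, uint16_t i,
				   const struct desc *d, uint16_t flags)
{
	struct desc *slot = &w->base[(uint16_t)(w->head + i) % w->num];

	slot->addr = d->addr;
	slot->len  = d->len;
	slot->id   = d->id;
	__atomic_store_n(&slot->flags, flags, __ATOMIC_RELEASE);
}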
In 1.0 the last update is always a cache miss. You need batching to get
fewer misses. In 1.1 you don't have that, so fundamentally there is less
need for batching. But batching does not always work. The DPDK guys (who
batch things aggressively) already tried 1.1 and saw performance gains,
so we do not need to argue theoretically.
For sure we might need to change vring_used_elem.

> > Which is fair enough (1.0 is already deployed) but I would like to avoid
> > making 1.1 support harder, and this patchset does this unfortunately,
>
> I think the new APIs do not expose more internal data structures of virtio
> than before? (vq->heads has already been used by vhost_net for years.)
> Considering the layout is re-designed completely, I don't see an easy
> method to reuse the current 1.0 API for 1.1.

Current API just says you get buffers, then you use them. It is not tied
to an actual separate used ring.
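For reference, the 1.0 element type behind vq->heads and the split used ring,
i.e. what any 1.1 rework of vring_used_elem would have to replace:

/* As in include/uapi/linux/virtio_ring.h; __virtio32 is a 32-bit field
 * whose byte order depends on legacy vs. modern (1.0) negotiation. */
struct vring_used_elem {
	__virtio32 id;	/* index of the head of the used descriptor chain */
	__virtio32 len;	/* total bytes written to that chain */
};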
> > see comments on individual patches. I'm sure it can be addressed though.
> >
> > > Test shows about ~22% improvement in tx pps.
> > Is this with or without tx napi in guest?
>
> MoonGen is used in the guest for better numbers.

Not sure I understand. Did you set napi_tx to true or false?

> Thanks
> > > Please review.
> > >
> > > Jason Wang (5):
> > >   vhost: split out ring head fetching logic
> > >   vhost: introduce helper to prefetch desc index
> > >   vhost: introduce vhost_add_used_idx()
> > >   vhost_net: rename VHOST_RX_BATCH to VHOST_NET_BATCH
> > >   vhost_net: basic tx virtqueue batched processing
> > >
> > >  drivers/vhost/net.c   | 221 ++++++++++++++++++++++++++++----------------------
> > >  drivers/vhost/vhost.c | 165 +++++++++++++++++++++++++++++++------
> > >  drivers/vhost/vhost.h |   9 ++
> > >  3 files changed, 270 insertions(+), 125 deletions(-)
> > >
> > > --
> > > 2.7.4