On Fri, Sep 22, 2017 at 04:02:35PM +0800, Jason Wang wrote:
> This patch implements basic batched processing of tx virtqueue by
> prefetching desc indices and updating used ring in a batch. For
> non-zerocopy case, vq->heads is used for storing the prefetched
> indices and updating used ring. It is also a requirement for doing
> more batching on top. For zerocopy case and for simplicity, batched
> processing is simply disabled by only fetching and processing one
> descriptor at a time; this could be optimized in the future.
>
> XDP_DROP (without touching skb) on tun (with MoonGen in guest) with
> zerocopy disabled:
>
> Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz:
> Before: 3.20Mpps
> After: 3.90Mpps (+22%)
>
> No differences were seen with zerocopy enabled.
>
> Signed-off-by: Jason Wang <jasowang@xxxxxxxxxx>

So where is the speedup coming from? I'd guess the ring is
hot in cache, it's faster to access it in one go, then
pass many packets to net stack. Is that right?
Another possibility is better code cache locality.
So how about refactoring this patchset as follows:
1. use existing APIs: first get the packets, then
transmit them all, then mark them all as used
2. add new APIs and move the loop into vhost core
for more speedups
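
To make the ordering in option 1 concrete, here is a rough userspace sketch of "get them all, transmit them all, then use them all". It is only an illustration: toy_vq, toy_get_batch, toy_xmit, toy_add_used_batch, toy_handle_tx, BATCH and RING_SIZE are all made-up names, not the vhost API, and the ring is a plain array with no memory barriers or guest notification.

```c
#include <stddef.h>

#define BATCH     64   /* hypothetical batch size */
#define RING_SIZE 256  /* hypothetical ring size, power of two */

/* Toy stand-in for a virtqueue; not the kernel's struct vhost_virtqueue. */
struct toy_vq {
    unsigned avail[RING_SIZE];  /* descriptor heads posted by the guest */
    unsigned avail_idx;         /* producer index, advanced by the guest */
    unsigned last_avail;        /* consumer index, advanced by the host */
    unsigned used[RING_SIZE];   /* heads handed back to the guest */
    unsigned used_idx;          /* published used index */
    unsigned used_publishes;    /* how many times used_idx was written */
};

/* Step 1: prefetch up to max descriptor heads while the ring is hot. */
static size_t toy_get_batch(struct toy_vq *vq, unsigned *heads, size_t max)
{
    size_t n = 0;

    while (n < max && vq->last_avail != vq->avail_idx) {
        heads[n++] = vq->avail[vq->last_avail % RING_SIZE];
        vq->last_avail++;
    }
    return n;
}

/* Step 2: stand-in for handing one packet to the net stack. */
static void toy_xmit(unsigned head, unsigned long *sent)
{
    (void)head;
    (*sent)++;
}

/* Step 3: return the whole batch and publish used_idx once. */
static void toy_add_used_batch(struct toy_vq *vq, const unsigned *heads,
                               size_t n)
{
    for (size_t i = 0; i < n; i++)
        vq->used[(vq->used_idx + i) % RING_SIZE] = heads[i];
    vq->used_idx += (unsigned)n;
    vq->used_publishes++;
}

/* The tx loop: get a batch, transmit it all, then mark it all used. */
static unsigned long toy_handle_tx(struct toy_vq *vq)
{
    unsigned heads[BATCH];
    unsigned long sent = 0;
    size_t n;

    while ((n = toy_get_batch(vq, heads, BATCH)) > 0) {
        for (size_t i = 0; i < n; i++)
            toy_xmit(heads[i], &sent);
        toy_add_used_batch(vq, heads, n);
    }
    return sent;
}
```

With 100 descriptors queued, toy_handle_tx() drains them in two batches (64, then 36) and writes used_idx twice instead of 100 times. Whether that, or the cache effects above, accounts for the measured gain is exactly the open question.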