[rfc net-next v6 0/3] Multiqueue virtio-net

From: Jason Wang
Date: Tue Oct 30 2012 - 06:11:03 EST


Hi all:

This series is an updated version of the multiqueue virtio-net driver, based on
Krishna Kumar's work, which lets virtio-net use multiple rx/tx queues for packet
reception and transmission. Please review and comment.

Changes from v5:
- Align the implementation with the RFC spec update v4
- Switch between single queue mode and multiqueue mode without a reset
- Remove the 256 queue limit
- Use helpers to do the mapping between virtqueues and tx/rx queues (see the
sketch after this list)
- Use combined channels instead of separate rx/tx queues when configuring the
queue number
- Address other coding style comments from Michael
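
As a rough illustration of the virtqueue <-> tx/rx queue mapping, a minimal
sketch is below, assuming queue pair i owns virtqueue 2*i for rx and 2*i+1 for
tx; the real layout is defined by the spec update (see Michael's virtio-spec
link below), so treat the indices here as an illustration only:

#include <stdio.h>

/*
 * Illustrative mapping between tx/rx queue numbers and virtqueue indices,
 * assuming queue pair i owns virtqueues 2*i (rx) and 2*i + 1 (tx).
 * A sketch only, not the exact code from the patches.
 */
static int rxq2vq(int rxq) { return rxq * 2; }
static int txq2vq(int txq) { return txq * 2 + 1; }
static int vq2rxq(int vq)  { return vq / 2; }
static int vq2txq(int vq)  { return (vq - 1) / 2; }

int main(void)
{
	int i;

	for (i = 0; i < 4; i++)
		printf("queue pair %d -> rx vq %d (back: %d), tx vq %d (back: %d)\n",
		       i, rxq2vq(i), vq2rxq(rxq2vq(i)),
		       txq2vq(i), vq2txq(txq2vq(i)));
	return 0;
}

With combined channels exposed through ethtool, the queue count in the guest
would then be changed with something like "ethtool -L eth0 combined 4" (eth0
being a placeholder for the guest interface).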

Reference:
- A prototype implementation of qemu-kvm support can be found at
git://github.com/jasowang/qemu-kvm-mq.git
- V5 can be found at http://lwn.net/Articles/505388/
- V4 can be found at https://lkml.org/lkml/2012/6/25/120
- V2 can be found at http://lwn.net/Articles/467283/
- Michael's virtio-spec: http://www.spinics.net/lists/netdev/msg209986.html

Perf Numbers:

- The pktgen test shows that the receiving capability of multiqueue virtio-net
is dramatically improved.
- The netperf results show that latency is greatly improved. Throughput is kept
or improved when transferring large packets, but there is a regression with
small packet (<1500) transmission/reception. According to the statistics, TCP
tends to batch less when mq is enabled, which means many more but smaller
packets are sent/received, leading to much higher cpu utilization and degraded
throughput. In the future, either TCP tuning or an automatic switch between mq
and sq is needed.

Test environment:
- Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, 8 cores 2 numa nodes
- Two directly connected 82599 NICs
- Host/Guest kernel: net-next with the mq virtio-net patches and mq tuntap
patches

Pktgen test:
- The local host generates 64 byte UDP packets to the guest.
- average of 20 runs

#q #vcpu kpps +improvement
1q 1vcpu: 264kpps +0%
2q 2vcpu: 451kpps +70%
3q 3vcpu: 661kpps +150%
4q 4vcpu: 941kpps +250%

Netperf Local VM to VM test:
- VM1 and its vcpu/vhost thread in numa node 0
- VM2 and its vcpu/vhost thread in numa node 1
- a script is used to launch netperf in demo mode and post-process the output
to measure the aggregate result with the help of timestamps
- average of 3 runs

TCP_RR:
size/session/+lat%/+normalize%
1/ 1/ 0%/ 0%
1/ 10/ +52%/ +6%
1/ 20/ +27%/ +5%
64/ 1/ 0%/ 0%
64/ 10/ +45%/ +4%
64/ 20/ +28%/ +7%
256/ 1/ -1%/ 0%
256/ 10/ +38%/ +2%
256/ 20/ +27%/ +6%
TCP_CRR:
size/session/+lat%/+normalize%
1/ 1/ -7%/ -12%
1/ 10/ +34%/ +3%
1/ 20/ +3%/ -8%
64/ 1/ -7%/ -3%
64/ 10/ +32%/ +1%
64/ 20/ +4%/ -7%
256/ 1/ -6%/ -18%
256/ 10/ +33%/ 0%
256/ 20/ +4%/ -8%
STREAM:
size/session/+thu%/+normalize%
1/ 1/ -3%/ 0%
1/ 2/ -1%/ 0%
1/ 4/ -2%/ 0%
64/ 1/ 0%/ +1%
64/ 2/ -6%/ -6%
64/ 4/ -8%/ -14%
256/ 1/ 0%/ 0%
256/ 2/ -48%/ -52%
256/ 4/ -50%/ -55%
512/ 1/ +4%/ +5%
512/ 2/ -29%/ -33%
512/ 4/ -37%/ -49%
1024/ 1/ +6%/ +7%
1024/ 2/ -46%/ -51%
1024/ 4/ -15%/ -17%
4096/ 1/ +1%/ +1%
4096/ 2/ +16%/ -2%
4096/ 4/ +31%/ -10%
16384/ 1/ 0%/ 0%
16384/ 2/ +16%/ +9%
16384/ 4/ +17%/ -9%

Netperf test between an external host and the guest over 10GbE (ixgbe):
- VM thread and vhost threads were pinned to numa node 0
- a script is used to launch netperf in demo mode and post-process the output
to measure the aggregate result with the help of timestamps
- average of 3 runs

TCP_RR:
size/session/+lat%/+normalize%
1/ 1/ 0%/ +6%
1/ 10/ +41%/ +2%
1/ 20/ +10%/ -3%
64/ 1/ 0%/ -10%
64/ 10/ +39%/ +1%
64/ 20/ +22%/ +2%
256/ 1/ 0%/ +2%
256/ 10/ +26%/ -17%
256/ 20/ +24%/ +10%
TCP_CRR:
size/session/+lat%/+normalize%
1/ 1/ -3%/ -3%
1/ 10/ +34%/ -3%
1/ 20/ 0%/ -15%
64/ 1/ -3%/ -3%
64/ 10/ +34%/ -3%
64/ 20/ -1%/ -16%
256/ 1/ -1%/ -3%
256/ 10/ +38%/ -2%
256/ 20/ -2%/ -17%
TCP_STREAM: (guest receiving)
size/session/+thu%/+normalize%
1/ 1/ +1%/ +14%
1/ 2/ 0%/ +4%
1/ 4/ -2%/ -24%
64/ 1/ -6%/ +1%
64/ 2/ +1%/ +1%
64/ 4/ -1%/ -11%
256/ 1/ +3%/ +4%
256/ 2/ 0%/ -1%
256/ 4/ 0%/ -15%
512/ 1/ +4%/ 0%
512/ 2/ -10%/ -12%
512/ 4/ 0%/ -11%
1024/ 1/ -5%/ 0%
1024/ 2/ -11%/ -16%
1024/ 4/ +3%/ -11%
4096/ 1/ +27%/ +6%
4096/ 2/ 0%/ -12%
4096/ 4/ 0%/ -20%
16384/ 1/ 0%/ -2%
16384/ 2/ 0%/ -9%
16384/ 4/ +10%/ -2%
TCP_MAERTS: (guest sending)
size/session/+thu%/+normalize%
1/ 1/ -1%/ 0%
1/ 2/ 0%/ 0%
1/ 4/ -5%/ 0%
64/ 1/ 0%/ 0%
64/ 2/ -7%/ -8%
64/ 4/ -7%/ -8%
256/ 1/ 0%/ 0%
256/ 2/ -28%/ -28%
256/ 4/ -28%/ -29%
512/ 1/ 0%/ 0%
512/ 2/ -15%/ -13%
512/ 4/ -53%/ -59%
1024/ 1/ +4%/ +13%
1024/ 2/ -7%/ -18%
1024/ 4/ +1%/ -18%
4096/ 1/ +2%/ 0%
4096/ 2/ +3%/ -19%
4096/ 4/ -1%/ -19%
16384/ 1/ -3%/ -1%
16384/ 2/ 0%/ -12%
16384/ 4/ 0%/ -10%

Jason Wang (2):
virtio_net: multiqueue support
virtio-net: change the number of queues through ethtool

Krishna Kumar (1):
virtio_net: Introduce VIRTIO_NET_F_MULTIQUEUE

drivers/net/virtio_net.c | 790 ++++++++++++++++++++++++++++-----------
include/uapi/linux/virtio_net.h | 19 +
2 files changed, 594 insertions(+), 215 deletions(-)
