Re: [PATCH v2 0/8] vsock/virtio: optimizations to increase the throughput

From: Jason Wang
Date: Mon May 13 2019 - 05:35:21 EST



On 2019/5/10 8:58 PM, Stefano Garzarella wrote:
While I was testing this new series (v2), I discovered huge memory usage
and a memory leak in the guest's virtio-vsock driver when I sent
1-byte packets to the guest.

These issues have been present since the introduction of the virtio-vsock
driver. I added patches 1 and 2 to this series to fix them, in order
to track the performance trends more accurately.

v1: https://patchwork.kernel.org/cover/10885431/

v2:
- Add patch 1 to limit the memory usage
- Add patch 2 to avoid memory leak during the socket release
- Add patch 3 to fix locking of fwd_cnt and buf_alloc
- Patch 4: fix 'free_space' type (u32 instead of s64) [Stefan]
- Patch 5: Avoid integer underflow of iov_len [Stefan]
- Patch 5: Fix packet capture in order to see the exact packets that are
delivered. [Stefan]
- Add patch 8 to make the RX buffer size tunable [Stefan] (see the sketch
just below)
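
As a reference for patch 8, one common way to expose such a knob is a
module parameter; this is just a sketch with made-up names, not
necessarily how the patch implements it:

  /* Hypothetical knob: size of the buffers queued on the RX virtqueue. */
  static unsigned int rx_buf_size = 65536;    /* 64 KiB, matching patch 7 */
  module_param(rx_buf_size, uint, 0444);
  MODULE_PARM_DESC(rx_buf_size, "Size in bytes of each virtio-vsock RX buffer");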

Below are the benchmarks, step by step. I used iperf3 [1] modified with
VSOCK support.
As Michael suggested on v1, I booted both host and guest with 'nosmap',
and I added a column with virtio-net + vhost performance.

A brief description of the patches:
- Patches 1+2: limit the memory usage with an extra copy and avoid a memory leak
- Patches 3+4: fix locking and reduce the number of credit update messages sent
to the transmitter (see the sketch after this list)
- Patches 5+6: allow the host to split packets on multiple buffers and use
VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
- Patches 7+8: increase RX buffer size to 64 KiB
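
To give an idea of the credit-update reduction, here is a minimal sketch
of the logic behind patches 3+4 (not the patches themselves): it assumes
the buf_alloc/fwd_cnt counters of struct virtio_vsock_sock, a
last_fwd_cnt snapshot taken when the last update was sent, and the
existing virtio_transport_send_credit_update() helper; the function name
is made up for illustration.

  /* Called after the receiver has dequeued data and advanced fwd_cnt.
   * Instead of sending a credit update on every dequeue, send one only
   * when the free space the peer still knows about drops below the
   * maximum packet size, so many small reads no longer generate a
   * storm of credit-update messages.
   */
  static void virtio_transport_maybe_send_credit_update(struct vsock_sock *vsk)
  {
          struct virtio_vsock_sock *vvs = vsk->trans;
          u32 free_space;

          /* Bytes consumed since the last update (fwd_cnt - last_fwd_cnt)
           * are not yet visible to the peer, so subtract them from our
           * buffer size to get the credit the peer believes it has left.
           */
          free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);

          if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
                  virtio_transport_send_credit_update(vsk,
                                                      VIRTIO_VSOCK_TYPE_STREAM,
                                                      NULL);
  }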

                        host -> guest [Gbps]
pkt_size  before opt    p 1+2    p 3+4    p 5+6    p 7+8  virtio-net + vhost
                                                                 TCP_NODELAY
      64       0.068    0.063    0.130    0.131    0.128     0.188     0.187
     256       0.274    0.236    0.392    0.338    0.282     0.749     0.654
     512       0.531    0.457    0.862    0.725    0.602     1.419     1.414
      1K       0.954    0.827    1.591    1.598    1.548     2.599     2.640
      2K       1.783    1.543    3.731    3.637    3.469     4.530     4.754
      4K       3.332    3.436    7.164    7.124    6.494     7.738     7.696
      8K       5.792    5.530   11.653   11.787   11.444    12.307    11.850
     16K       8.405    8.462   16.372   16.855   17.562    16.936    16.954
     32K      14.208   13.669   18.945   20.009   23.128    21.980    23.015
     64K      21.082   18.893   20.266   20.903   30.622    27.290    27.383
    128K      20.696   20.148   20.112   21.746   32.152    30.446    30.990
    256K      20.801   20.589   20.725   22.685   34.721    33.151    32.745
    512K      21.220   20.465   20.432   22.106   34.496    36.847    31.096

                        guest -> host [Gbps]
pkt_size  before opt    p 1+2    p 3+4    p 5+6    p 7+8  virtio-net + vhost
                                                                 TCP_NODELAY
      64       0.089    0.091    0.120    0.115    0.117     0.274     0.272
     256       0.352    0.354    0.452    0.445    0.451     1.085     1.136
     512       0.705    0.704    0.893    0.858    0.898     2.131     1.882
      1K       1.394    1.433    1.721    1.669    1.691     3.984     3.576
      2K       2.818    2.874    3.316    3.249    3.303     6.719     6.359
      4K       5.293    5.397    6.129    5.933    6.082    10.105     9.860
      8K       8.890    9.151   10.990   10.545   10.519    15.239    14.868
     16K      11.444   11.018   12.074   15.255   15.577    20.551    20.848
     32K      11.229   10.875   10.857   24.401   25.227    26.294    26.380
     64K      10.832   10.545   10.816   39.487   39.616    34.996    32.041
    128K      10.435   10.241   10.500   39.813   40.012    38.379    35.055
    256K      10.263    9.866    9.845   34.971   35.143    36.559    37.232
    512K      10.224   10.060   10.092   35.469   34.627    34.963    33.401

As Stefan suggested on v1, this time I also measured the efficiency,
defined as:
efficiency = Mbps / (%CPU_Host + %CPU_Guest)

The '%CPU_Guest' value is measured inside the VM. I know this is not the
most accurate approach, but it comes for free with iperf3 and can serve
as an indication.
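
For instance, combining the host -> guest tables above and below: with
patches 3+4 at 16K, 16.372 Gbps at an efficiency of 216.98 means the run
used about 16372 / 216.98 ~= 75% CPU in total between host and guest.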

          host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size  before opt    p 1+2    p 3+4    p 5+6    p 7+8  virtio-net + vhost
                                                                 TCP_NODELAY
      64        0.94     0.59     3.96     4.06     4.09      2.82      2.11
     256        2.62     2.50     6.45     6.09     5.81      9.64      8.73
     512        5.16     4.87    13.16    12.39    11.67     17.83     17.76
      1K        9.16     8.85    24.98    24.97    25.01     32.57     32.04
      2K       17.41    17.03    49.09    48.59    49.22     55.31     57.14
      4K       32.99    33.62    90.80    90.98    91.72     91.79     91.40
      8K       58.51    59.98   153.53   170.83   167.31    137.51    132.85
     16K       89.32    95.29   216.98   264.18   260.95    176.05    176.05
     32K      152.94   167.10   285.75   387.02   360.81    215.49    226.30
     64K      250.38   307.20   317.65   489.53   472.70    238.97    244.27
    128K      327.99   335.24   335.76   523.71   486.41    253.29    260.86
    256K      327.06   334.24   338.64   533.76   509.85    267.78    266.22
    512K      337.36   330.61   334.95   512.90   496.35    280.42    241.43

          guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size  before opt    p 1+2    p 3+4    p 5+6    p 7+8  virtio-net + vhost
                                                                 TCP_NODELAY
      64        0.90     0.91     1.37     1.32     1.35      2.15      2.13
     256        3.59     3.55     5.23     5.19     5.29      8.50      8.89
     512        7.19     7.08    10.21     9.95    10.38     16.74     14.71
      1K       14.15    14.34    19.85    19.06    19.33     31.44     28.11
      2K       28.44    29.09    37.78    37.18    37.49     53.07     50.63
      4K       55.37    57.60    71.02    69.27    70.97     81.56     79.32
      8K      105.58   100.45   111.95   124.68   123.61    120.85    118.66
     16K      141.63   138.24   137.67   187.41   190.20    160.43    163.00
     32K      147.56   143.09   138.48   296.41   301.04    214.64    223.94
     64K      144.81   143.27   138.49   433.98   462.26    298.86    269.71
    128K      150.14   147.99   146.85   511.36   514.29    350.17    298.09
    256K      156.69   152.25   148.69   542.19   549.97    326.42    333.32
    512K      157.29   153.35   152.22   546.52   533.24    315.55    302.27

[1] https://github.com/stefano-garzarella/iperf/


Hi:

Do you have any explanation for why vsock is better here? Is this because of the mergeable buffer? If yes, we need to test with mrg_rxbuf=off.
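
For example, with something like this on the QEMU command line (only the
mrg_rxbuf property matters here; the rest is a typical vhost-net setup):

  -netdev tap,id=net0,vhost=on \
  -device virtio-net-pci,netdev=net0,mrg_rxbuf=off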

Thanks



Stefano Garzarella (8):
vsock/virtio: limit the memory used per-socket
vsock/virtio: free packets during the socket release
vsock/virtio: fix locking for fwd_cnt and buf_alloc
vsock/virtio: reduce credit update messages
vhost/vsock: split packets to send using multiple buffers
vsock/virtio: change the maximum packet size allowed
vsock/virtio: increase RX buffer size to 64 KiB
vsock/virtio: make the RX buffer size tunable

drivers/vhost/vsock.c | 53 +++++++--
include/linux/virtio_vsock.h | 14 ++-
net/vmw_vsock/virtio_transport.c | 28 ++++-
net/vmw_vsock/virtio_transport_common.c | 144 ++++++++++++++++++------
4 files changed, 190 insertions(+), 49 deletions(-)