Re: Network performance regression in Linux kernel 6.6 for small socket size test cases

From: Eric Dumazet
Date: Wed Feb 28 2024 - 03:48:34 EST


On Wed, Feb 28, 2024 at 7:43 AM Abdul Anshad Azeez
<abdul-anshad.azeez@xxxxxxxxxxxx> wrote:
>
> During performance regression workload execution of the Linux
> kernel we observed up to 30% performance decrease in a specific networking
> workload on the 6.6 kernel compared to 6.5 (details below). The regression is
> reproducible in both Linux VMs running on ESXi and bare metal Linux.
>
> Workload details:
>
> Benchmark - Netperf TCP_STREAM
> Socket buffer size - 8K
> Message size - 256B
> MTU - 1500B
> Socket option - TCP_NODELAY
> # of STREAMs - 32
> Direction - Uni-Directional Receive
> Duration - 60 Seconds
> NIC - Mellanox Technologies ConnectX-6 Dx EN 100G
> Server Config - Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz & 512G Memory
>
> Bisect between 6.5 and 6.6 kernel concluded that this regression originated
> from the below commit:
>
> commit - dfa2f0483360d4d6f2324405464c9f281156bd87 (tcp: get rid of
> sysctl_tcp_adv_win_scale)
> Author - Eric Dumazet <edumazet@xxxxxxxxxx>
> Link -
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=
> dfa2f0483360d4d6f2324405464c9f281156bd87
>
> Performance data for (Linux VM on ESXi):
> Test case - TCP_STREAM_RECV Throughput in Gbps
> (for different socket buffer sizes and with constant message size - 256B):
>
> Socket buffer size - [LK6.5 vs LK6.6]
> 8K - [8.4 vs 5.9 Gbps]
> 16K - [13.4 vs 10.6 Gbps]
> 32K - [19.1 vs 16.3 Gbps]
> 64K - [19.6 vs 19.7 Gbps]
> Autotune - [19.7 vs 19.6 Gbps]
>
> From the above performance data, we can infer that:
> * Regression is specific to lower fixed socket buffer sizes (8K, 16K & 32K).
> * Increasing the socket buffer size gradually decreases the throughput impact.
> * Performance is equal for higher fixed socket size (64K) and Autotune socket
> tests.
>
> We would like to know if there are any opportunities for optimization in
> the test cases with small socket sizes.
>

Sure, I would suggest not setting small SO_RCVBUF values in 2024,
or you get what you ask for (going back to old TCP performance of year 2010 )

Back in 2018, we set tcp_rmem[1] to 131072 for a good reason.

commit a337531b942bd8a03e7052444d7e36972aac2d92
Author: Yuchung Cheng <ycheng@xxxxxxxxxx>
Date: Thu Sep 27 11:21:19 2018 -0700

tcp: up initial rmem to 128KB and SYN rwin to around 64KB


I can not enforce a minimum in SO_RCVBUF (other than the small one added in
commit eea86af6b1e18d6fa8dc959e3ddc0100f27aff9f ("net: sock: adapt
SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF"))
otherwise many test programs will break, expecting to set a low value.