Performance regressions in networking & storage benchmarks in Linux kernel 5.8

From: Abdul Anshad Azeez
Date: Tue Sep 22 2020 - 04:51:12 EST

As part of VMware's performance regression testing for upstream Linux
kernel releases, we compared Linux kernel 5.8 against 5.7. Our evaluation
revealed performance regressions of up to 60%, mostly in networking
latency/response-time benchmarks. Storage throughput & latency benchmarks
also regressed, by up to 8%.

After bisecting between kernel 5.7 and 5.8, we identified the root cause
to be an interrupt-related change in Thomas Gleixner's commit
633260fa143bbed05e65dc557a492667dfdc45bb ("x86/irq: Convey vector as
argument and not in ptregs"). To confirm this, we backed out the commit
from 5.8, reran our tests, and found that performance was similar to the
5.7 kernel.
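The bisection step described above can be sketched as a binary search over an ordered commit list. This is only an illustration of the idea, not the tooling used for this report: it assumes a linear history ordered oldest to newest, and `first_bad` and the `is_bad` callback (standing in for "build this commit, run the benchmark, compare against the 5.7 baseline") are hypothetical names. Real `git bisect` works on the full commit graph rather than a flat list.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumptions: commits are ordered oldest to newest, the last commit
    is known to be bad, and every commit after the first bad one is
    also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # first bad commit is at mid or earlier
        else:
            lo = mid + 1      # first bad commit is after mid
    return commits[lo]


# Hypothetical history of eight commits where the regression starts at "c5".
history = [f"c{i}" for i in range(8)]
print(first_bad(history, lambda c: int(c[1:]) >= 5))  # prints c5
```

Each call to `is_bad` corresponds to one build-and-benchmark cycle, so the number of cycles grows with the logarithm of the number of commits between the good and bad kernels.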

Impacted test cases:

- Netperf TCP_RR & TCP_CRR - Response time
- Ping - Response time
- Memcache - Response time
- Netperf TCP_STREAM small packets (8K socket & 256B message, TCP_NODELAY
  set) - Throughput & CPU utilization (CPU/Gbits)

- FIO:
    - 4K (rand|seq)_(read|write) local-NVMe MultiVM tests - Throughput & l

From our testing, the overall results indicate that the above-mentioned
commit introduced performance regressions in latency-sensitive
networking workloads. For storage, it affected both throughput and
latency workloads.

Also, since the Linux 5.9-rc4 kernel was released recently, we repeated
the same experiments on 5.9-rc4. We observed that all regressions were
fixed and that the performance numbers for 5.7 and 5.9-rc4 were similar.

To find the fix commit, we bisected again between 5.8 and 5.9-rc4 and
identified that the regressions were fixed by a commit from the same
author, Thomas Gleixner, which unbreaks the interrupt affinity setting:
e027fffff799cdd70400c5485b1a54f482255985 ("x86/irq: Unbreak interrupt
affinity setting").
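Since the fix concerns interrupt affinity, one way to sanity-check a test system is to read the effective CPU list for an IRQ from procfs. The sketch below is not part of the testing described here. It assumes the standard /proc/irq/<N>/smp_affinity_list file, which holds a comma-separated list of CPUs and CPU ranges. The helper names `parse_cpu_list` and `read_irq_affinity` are hypothetical.

```python
from typing import Set


def parse_cpu_list(text: str) -> Set[int]:
    """Parse a CPU list such as "0-3,8" into a set of CPU numbers."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_irq_affinity(irq: int) -> Set[int]:
    """Read the CPUs an IRQ may be delivered to (Linux only)."""
    with open(f"/proc/irq/{irq}/smp_affinity_list") as f:
        return parse_cpu_list(f.read())
```

Comparing the result of `read_irq_affinity` for a NIC or NVMe queue interrupt before and after writing a new affinity value would show whether the setting took effect on a given kernel.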

We believe these findings would be useful to the Linux community and wanted to
document the same.

Abdul Anshad Azeez
Performance Engineering
VMware, Inc.