OpenWrt / MIPS benchmark with MGLRU

From: Yu Zhao
Date: Wed Aug 31 2022 - 00:18:08 EST


TLDR
====
RAM utilization   Throughput (95% CI)   P99 Latency (95% CI)
------------------------------------------------------------
~90%              NS                    NS
~110%             +[12, 16]%            -[20, 22]%

Changes are for MGLRU relative to the baseline kernel.

Abbreviations
=============
CI: confidence interval
NS: no statistically significant difference
DUT: device under test
ATE: automatic test equipment

Rationale
=========
1. OpenWrt is the most popular distro for WiFi routers; many of its
   targets are big-endian [1].
2. 4 out of the top 5 bestselling WiFi routers in the US use MIPS [2];
   MIPS uses a software-managed TLB.
3. Memcached is the best memory benchmark available on OpenWrt;
   admittedly, this use case is very limited in the real world.

Hardware
========
DUT: Ubiquiti EdgeRouter (ER-8) [3]

DUT # cat /proc/cpuinfo
system type : UBNT_E200 (CN6120p1.1-800-NSP)
machine : Unknown
processor : 0
cpu model : Cavium Octeon II V0.1
BogoMIPS : 1600.00
wait instruction : yes
microsecond timers : yes
tlb_entries : 128
extra interrupt vector : yes
hardware watchpoint : yes, count: 2, address/irw mask: [0x0ffc, 0x0ffb]
isa : mips1 mips2 mips3 mips4 mips5 mips32r1 mips32r2 mips64r1 mips64r2
ASEs implemented :
Options implemented : tlb rixiex 4kex octeon_cache 32fpr prefetch mcheck ejtag llsc rixi lpa vtag_icache userlocal perf_cntr_intr_bit perf
shadow register sets : 1
kscratch registers : 3
package : 0
core : 0
VCED exceptions : not available
VCEI exceptions : not available

processor : 1
cpu model : Cavium Octeon II V0.1
BogoMIPS : 1600.00
wait instruction : yes
microsecond timers : yes
tlb_entries : 128
extra interrupt vector : yes
hardware watchpoint : yes, count: 2, address/irw mask: [0x0ffc, 0x0ffb]
isa : mips1 mips2 mips3 mips4 mips5 mips32r1 mips32r2 mips64r1 mips64r2
ASEs implemented :
Options implemented : tlb rixiex 4kex octeon_cache 32fpr prefetch mcheck ejtag llsc rixi lpa vtag_icache userlocal perf_cntr_intr_bit perf
shadow register sets : 1
kscratch registers : 3
package : 0
core : 1
VCED exceptions : not available
VCEI exceptions : not available
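The cpuinfo output above reports 128 TLB entries per core. To put the software-managed TLB from the Rationale in perspective, the sketch below estimates TLB reach. It assumes 4 KiB base pages and that each MIPS TLB entry maps an even/odd pair of pages; the page size is not stated in the report, and `tlb_reach_kib` is a hypothetical helper.

```bash
# Hypothetical helper: TLB reach in KiB.
# Assumptions: each MIPS TLB entry maps an even/odd page pair, and the
# base page size (in KiB) is passed in by the caller, e.g. 4.
tlb_reach_kib()
{
	local entries=$1 page_kib=$2
	echo $(( entries * 2 * page_kib ))
}

memtotal_kib=1991964			# MemTotal from /proc/meminfo on the DUT
reach_kib=$(tlb_reach_kib 128 4)	# tlb_entries : 128, 4 KiB pages assumed
echo "TLB reach: ${reach_kib} KiB"
echo "MemTotal is ~$(( memtotal_kib / reach_kib ))x the TLB reach"
```

Under these assumptions a dataset spanning most of RAM cannot be covered by the TLB, so refills, which MIPS handles in a kernel exception handler rather than a hardware page walker, are likely frequent.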

DUT # cat /proc/meminfo
MemTotal: 1991964 kB
MemFree: 1917304 kB
MemAvailable: 1896856 kB
Buffers: 4 kB
Cached: 33464 kB
SwapCached: 0 kB
Active: 1316 kB
Inactive: 33500 kB
Active(anon): 1316 kB
Inactive(anon): 33496 kB
Active(file): 0 kB
Inactive(file): 4 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 995324 kB
SwapFree: 995324 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 1360 kB
Mapped: 2688 kB
Shmem: 33464 kB
KReclaimable: 8244 kB
Slab: 19772 kB
SReclaimable: 8244 kB
SUnreclaim: 11528 kB
KernelStack: 1056 kB
PageTables: 336 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1991304 kB
Committed_AS: 38916 kB
VmallocTotal: 1069547512 kB
VmallocUsed: 4856 kB
VmallocChunk: 0 kB
Percpu: 272 kB

Software
========
DUT # cat /etc/openwrt_release
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='22.03.0-rc6'
DISTRIB_REVISION='r19590-042d558536'
DISTRIB_TARGET='octeon/generic'
DISTRIB_ARCH='mips64_octeonplus'
DISTRIB_DESCRIPTION='OpenWrt 22.03.0-rc6 r19590-042d558536'
DISTRIB_TAINTS='no-all no-ipv6'

DUT # uname -a
Linux OpenWrt 6.0.0-rc3+ #0 SMP Sun Jul 31 15:12:47 2022 mips64 GNU/Linux

DUT # cat /proc/swaps
Filename        Type        Size      Used    Priority
/dev/zram0      partition   995324    0       100

DUT # memcached -V
memcached 1.6.9

DUT # cat /etc/config/memcached
config memcached
	option user 'memcached'
	option maxconn '1024'
	option listen '0.0.0.0'
	option port '11211'
	option memory '6400'
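For readers more familiar with memcached's command line, the UCI options above correspond roughly to the flags printed below. `memcached_cmdline` is a hypothetical helper, and the mapping is an assumption about what OpenWrt's init script does rather than a copy of it. The point worth noting is that `option memory '6400'` (megabytes) is larger than MemTotal, so memcached's own memory limit does not cap the dataset before physical RAM runs out.

```bash
# Hypothetical helper: print a memcached command line equivalent to the
# UCI options (user, maxconn, listen, port, memory in MB).
memcached_cmdline()
{
	local user=$1 maxconn=$2 listen=$3 port=$4 memory_mb=$5
	echo "memcached -u $user -c $maxconn -l $listen -p $port -m $memory_mb"
}

memcached_cmdline memcached 1024 0.0.0.0 11211 6400

# Compare memcached's item memory limit with MemTotal from /proc/meminfo.
memory_kib=$(( 6400 * 1024 ))
memtotal_kib=1991964
if (( memory_kib > memtotal_kib )); then
	echo "memcached limit (${memory_kib} kB) exceeds MemTotal (${memtotal_kib} kB)"
fi
```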

ATE $ memtier_benchmark -v
memtier_benchmark 1.3.0
Copyright (C) 2011-2022 Redis Ltd.
This is free software. You may redistribute copies of it under the terms of
the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.

Procedure
=========
ATE $ cat run_benchmark_matrix.sh
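#!/bin/bash
# Usage: DUT_IP=<address of the DUT> ./run_benchmark_matrix.sh
: "${DUT_IP:?DUT_IP must be set}"

run_memtier_benchmark()
{
	local test_time=$1	# duration of the access phase, in seconds
	local key_max=$2	# number of keys (1000-byte values) in the dataset
	local kernel=$3		# "baseline" or "patched"

	# boot to kernel $kernel

	# populate dataset (write-only)
	memtier_benchmark/memtier_benchmark -s "$DUT_IP" -p 11211 \
		-P memcache_binary -n allkeys -c 1 --ratio 1:0 --pipeline 8 \
		--key-minimum=1 --key-maximum="$key_max" --key-pattern=P:P \
		-d 1000

	# access dataset using a Gaussian pattern (read-only)
	memtier_benchmark/memtier_benchmark -s "$DUT_IP" -p 11211 \
		-P memcache_binary --test-time "$test_time" -c 1 --ratio 0:1 \
		--pipeline 8 --key-minimum=1 --key-maximum="$key_max" \
		--key-pattern=G:G --randomize --distinct-client-seed

	# collect results
}

run_duration_secs=1200
mem_utils_90_110=(1600000 2000000)	# key counts for ~90% and ~110% RAM utilization
kernels=("baseline" "patched")

for mem_util in "${mem_utils_90_110[@]}"; do
	for kernel in "${kernels[@]}"; do
		run_memtier_benchmark "$run_duration_secs" "$mem_util" "$kernel"
	done
done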
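The key counts in `mem_utils_90_110` can be related to MemTotal with simple arithmetic. The sketch below (`util_pct` is a hypothetical helper) counts only the 1000-byte values set by `-d 1000` and ignores keys, item headers and slab rounding, so it yields about 78% and 98% rather than the ~90% and ~110% in the TLDR; the difference is presumably that per-item overhead.

```bash
# Hypothetical helper: dataset size as an integer percentage of MemTotal.
# Counts only item_bytes per key; memcached's per-item overhead is ignored.
util_pct()
{
	local keys=$1 item_bytes=$2 memtotal_kib=$3
	echo $(( keys * item_bytes * 100 / (memtotal_kib * 1024) ))
}

memtotal_kib=1991964	# MemTotal from /proc/meminfo on the DUT
for keys in 1600000 2000000; do
	echo "${keys} keys: ~$(util_pct "$keys" 1000 "$memtotal_kib")% of MemTotal (values only)"
done
```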

Results
=======
Baseline 90% RAM utilization
------------------------------------------------------------
 Ops/sec   Avg lat   p50 lat   p99 lat   p99.9 lat    KB/sec
              (ms)      (ms)      (ms)        (ms)
------------------------------------------------------------
48550.71   0.65687   0.48700   2.84700     5.56700   1812.25
48600.55   0.65629   0.48700   2.86300     5.59900   1814.11
48562.37   0.65674   0.48700   2.84700     5.50300   1812.68
48556.66   0.65688   0.48700   2.84700     5.53500   1812.47
48619.50   0.65600   0.48700   2.87900     5.63100   1814.82
48579.74   0.65654   0.48700   2.84700     5.56700   1813.33
48593.25   0.65764   0.48700   2.86300     5.56700   1814.10
48535.52   0.65716   0.48700   2.86300     5.56700   1811.68
48587.24   0.65645   0.48700   2.83100     5.50300   1813.61
48541.92   0.65704   0.48700   2.81500     5.47100   1811.92

MGLRU 90% RAM utilization
------------------------------------------------------------
 Ops/sec   Avg lat   p50 lat   p99 lat   p99.9 lat    KB/sec
              (ms)      (ms)      (ms)        (ms)
------------------------------------------------------------
48622.38   0.65594   0.48700   2.81500     5.47100   1814.92
48537.74   0.65715   0.48700   2.84700     5.53500   1811.76
48586.82   0.65646   0.48700   2.84700     5.50300   1813.59
48552.44   0.65695   0.48700   2.83100     5.43900   1812.31
48557.35   0.65680   0.49500   2.83100     5.53500   1812.49
48625.48   0.65593   0.48700   2.81500     5.43900   1815.04
48655.75   0.65557   0.48700   2.84700     5.53500   1816.17
48625.67   0.65595   0.48700   2.84700     5.53500   1815.04
48622.22   0.65600   0.48700   2.84700     5.47100   1814.91
48617.10   0.65610   0.48700   2.84700     5.56700   1814.73

Baseline 110% RAM utilization
------------------------------------------------------------
 Ops/sec   Avg lat   p50 lat   p99 lat   p99.9 lat    KB/sec
              (ms)      (ms)      (ms)        (ms)
------------------------------------------------------------
19813.79   1.61245   0.63100  17.79100    31.74300    744.91
20328.29   1.57158   0.62300  17.27900    31.10300    764.25
20104.12   1.58913   0.62300  17.40700    31.10300    755.82
20342.03   1.57053   0.61500  17.27900    30.84700    764.77
19688.05   1.62268   0.62300  17.91900    31.35900    740.18
19607.31   1.62943   0.63900  17.91900    31.23100    737.15
19250.96   1.65963   0.65500  17.91900    31.10300    723.75
20182.79   1.58290   0.63100  17.40700    30.84700    758.78
20181.88   1.58299   0.63100  17.40700    30.84700    758.75
20615.90   1.54963   0.62300  17.02300    30.84700    775.06

MGLRU 110% RAM utilization
------------------------------------------------------------
 Ops/sec   Avg lat   p50 lat   p99 lat   p99.9 lat    KB/sec
              (ms)      (ms)      (ms)        (ms)
------------------------------------------------------------
22911.33   1.39405   0.61500  13.69500    28.79900    861.36
22339.08   1.42989   0.61500  14.07900    30.07900    839.85
23394.22   1.36521   0.59900  13.56700    29.05500    879.51
22521.48   1.41830   0.61500  13.88700    29.82300    846.70
22678.10   1.40818   0.61500  13.82300    29.69500    852.59
22344.50   1.42952   0.61500  14.07900    29.95100    840.05
23245.65   1.37406   0.60700  13.50300    28.92700    873.93
23140.17   1.38032   0.59900  13.69500    29.18300    869.96
23003.34   1.38856   0.61500  13.63100    29.05500    864.82
22937.52   1.39253   0.61500  13.69500    29.43900    862.35
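The report does not say how the 95% intervals in the TLDR were computed. One plausible method, sketched below under that caveat, is a pooled-variance two-sample t interval on the difference of means, expressed as a percentage of the baseline mean. `pct_change_ci` is a hypothetical helper; the critical value t = 2.101 assumes 10 runs per kernel (18 degrees of freedom), and the uncertainty of the baseline mean used as the denominator is ignored.

```bash
# Hypothetical helper: 95% CI for the relative change of patched vs.
# baseline, in percent of the baseline mean.
# Assumptions: pooled-variance two-sample t interval, 10 runs per kernel
# (t = 2.101 for 18 degrees of freedom); the denominator's own
# uncertainty is ignored.
pct_change_ci()
{
	# $1: baseline values, $2: patched values (whitespace-separated)
	awk -v a="$1" -v b="$2" 'BEGIN {
		na = split(a, x); nb = split(b, y)
		for (i = 1; i <= na; i++) ma += x[i] / na
		for (i = 1; i <= nb; i++) mb += y[i] / nb
		for (i = 1; i <= na; i++) ssa += (x[i] - ma) ^ 2
		for (i = 1; i <= nb; i++) ssb += (y[i] - mb) ^ 2
		sp = sqrt((ssa + ssb) / (na + nb - 2))	# pooled std. dev.
		se = sp * sqrt(1 / na + 1 / nb)		# std. error of the difference
		t = 2.101				# valid only for na = nb = 10
		d = mb - ma
		printf "%.1f %.1f\n", 100 * (d - t * se) / ma, 100 * (d + t * se) / ma
	}'
}

# Ops/sec from the two 110% RAM utilization tables
base="19813.79 20328.29 20104.12 20342.03 19688.05 19607.31 19250.96 20182.79 20181.88 20615.90"
mglru="22911.33 22339.08 23394.22 22521.48 22678.10 22344.50 23245.65 23140.17 23003.34 22937.52"
pct_change_ci "$base" "$mglru"
```

Applied to the Ops/sec columns of the two 110% tables, this should print an interval close to the +[12, 16]% in the TLDR. Other columns can be fed in the same way, but results need not match the report exactly if a different method was used.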

Flame graphs
------------
Baseline: https://drive.google.com/file/d/1-Ac4HMPAyZIqxtvKerUTqNNAgBLhpX9R
MGLRU: https://drive.google.com/file/d/1-9x0W2yIYeiRvXWiYRzL6niTqW7zCVPX

References
==========
[1] https://openwrt.org/docs/platforms/start
[2] https://www.amazon.com/bestsellers/pc/300189
[3] https://openwrt.org/toh/ubiquiti/edgerouter