[RFC PATCH 0/6] crypto: benchmark - add the crypto benchmark

From: Yang Shen
Date: Mon Sep 19 2022 - 08:08:09 EST


Add a crypto benchmark: a tool that helps users quickly measure the
performance of an algorithm registered with the crypto subsystem.

The tool uses the same API for all algorithms in order to unify their
processing. Each algorithm can perform its private operations in the
callbacks. Users therefore see one unified set of configuration
parameters, rather than a separate set for each algorithm.
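
To illustrate the idea of one common driver with per-algorithm callbacks,
here is a minimal userspace sketch. It is not taken from the patches: the
names bm_ops, bm_run, toy_ctx and the callback signatures are all
hypothetical, and the "algorithm" only counts bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback table; the real patch may look different. */
struct bm_ops {
	const char *name;
	int (*init)(void *ctx);                 /* private setup */
	int (*run_once)(void *ctx, size_t len); /* one request of len bytes */
	void (*exit)(void *ctx);                /* private teardown */
};

/* Toy backend state: counts the bytes and requests it was handed. */
struct toy_ctx {
	size_t bytes;
	unsigned long reqs;
};

static int toy_init(void *ctx)
{
	memset(ctx, 0, sizeof(struct toy_ctx));
	return 0;
}

static int toy_run_once(void *ctx, size_t len)
{
	struct toy_ctx *t = ctx;

	t->bytes += len;
	t->reqs++;
	return 0;
}

static void toy_exit(void *ctx)
{
	(void)ctx; /* nothing to release in the toy backend */
}

static const struct bm_ops toy_ops = {
	.name = "toy",
	.init = toy_init,
	.run_once = toy_run_once,
	.exit = toy_exit,
};

/* Common driver: the same loop serves every backend. */
static int bm_run(const struct bm_ops *ops, void *ctx, size_t len,
		  unsigned long loops)
{
	unsigned long i;
	int ret;

	assert(ops && ctx);
	ret = ops->init(ctx);
	if (ret)
		return ret;
	for (i = 0; i < loops && !ret; i++)
		ret = ops->run_once(ctx, len);
	ops->exit(ctx);
	return ret;
}
```

The driver only knows the callback table, so a new algorithm type would
plug in by supplying its own table while the user-visible parameters
(here len and loops) stay the same.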

This tool lets users test the performance of algorithms in specific
scenarios. At present, the following parameters are user-configurable:
block size, block number, thread number, NUMA binding, and request number
per tfm. These parameters help users approximate their business
scenarios.
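
As a rough sketch of how these parameters could be grouped, the struct
below uses the field names that appear in the "run set" lines later in
this letter. The struct itself, the field types, the mapping of 'loop' to
block number and the unit of 'time' (seconds) are assumptions, not the
layout used by the patches.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one benchmark run; types are assumed. */
struct bm_config {
	const char *algorithm;   /* e.g. "zlib-deflate" */
	const char *algtype;     /* e.g. "CRYPTO_COMPRESS" */
	unsigned int inputsize;  /* block size in bytes */
	unsigned int loop;       /* assumed: block number */
	unsigned long numamask;  /* NUMA binding mask */
	unsigned int optype;
	unsigned int reqnum;     /* requests per tfm */
	unsigned int threadnum;
	unsigned int time;       /* assumed: run time in seconds */
};

/* Format a config as a single "run set" line; returns snprintf()'s result. */
static int bm_format_run_set(const struct bm_config *c, char *buf, size_t len)
{
	assert(c && buf);
	return snprintf(buf, len,
			"run set: algorithm %s, algtype %s, inputsize %u, "
			"loop %u, numamask 0x%lx, optype %u, reqnum %u, "
			"threadnum %u, time %u.",
			c->algorithm, c->algtype, c->inputsize, c->loop,
			c->numamask, c->optype, c->reqnum, c->threadnum,
			c->time);
}
```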

For the RFC version, the compression benchmark test is supported.
I did some verification on Kunpeng920.

The first test case is for the zlib-deflate software algorithm.
The CPU frequency is 2.6 GHz. This case shows the influence of these
parameters.

The configuration is as follows:
run set: algorithm zlib-deflate, algtype CRYPTO_COMPRESS, inputsize 1024,
loop 1, numamask 0x0, optype 0, reqnum 1, threadnum 1, time 1.
The result is:
Crypto benchmark result:
throughput      pps             time
150 MB/s        150 kPP/s       1000 ms

Then change the block size:
run set: algorithm zlib-deflate, algtype CRYPTO_COMPRESS, inputsize 8192,
loop 1, numamask 0x0, optype 0, reqnum 1, threadnum 1, time 1.
Crypto benchmark result:
throughput      pps             time
473 MB/s        59 kPP/s        1005 ms

run set: algorithm zlib-deflate, algtype CRYPTO_COMPRESS, inputsize 65536,
loop 1, numamask 0x0, optype 0, reqnum 1, threadnum 1, time 1.
Crypto benchmark result:
throughput      pps             time
421 MB/s        6 kPP/s         1038 ms

This test shows that both the throughput and the pps are influenced by
the block size on this server. The throughput has a peak value, while the
pps decreases as the block size increases.
Because this is a software algorithm, the result scales linearly with the
thread number as long as it is less than the number of CPUs, and the
other parameters have little influence on performance.
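
The relationship between the two columns can be sketched as simple
arithmetic: throughput is the packet rate multiplied by the block size,
so pps falls as blocks get larger even while throughput rises. The helper
below is illustrative only. bm_rates and bm_result are hypothetical
names, and it assumes decimal units (1 MB = 10^6 bytes) and truncating
integer division, which may not match how the tool rounds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result pair, mirroring the columns in the tool's output. */
struct bm_result {
	uint64_t mb_per_s; /* assumed: 10^6 bytes per second */
	uint64_t kpps;     /* thousands of packets per second */
};

/* Derive both rates from a packet count, block size and elapsed time. */
static struct bm_result bm_rates(uint64_t packets, uint32_t block_size,
				 uint64_t elapsed_ms)
{
	struct bm_result r;
	uint64_t bytes = packets * block_size;

	assert(elapsed_ms > 0);
	/* bytes per ms equals kB/s; dividing by 1000 again gives MB/s */
	r.mb_per_s = bytes / elapsed_ms / 1000;
	/* packets per ms equals thousands of packets per second */
	r.kpps = packets / elapsed_ms;
	return r;
}
```

For example, 59000 packets of 8192 bytes in 1000 ms carry eight times the
data of 59000 packets of 1024 bytes, at the same packet rate.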

The second test case is for the zlib-deflate hardware implementation. The
parameters tested above have the same effect on hardware. Here I test the
'reqnum' parameter. The software algorithm is registered as a synchronous
implementation, so this parameter has no effect on software performance.

run set: algorithm zlib-deflate, algtype CRYPTO_COMPRESS, inputsize 8192,
loop 1, numamask 0x0, optype 0, reqnum 1, threadnum 1, time 1.
Crypto benchmark result:
throughput      pps             time
367 MB/s        46 kPP/s        941 ms

run set: algorithm zlib-deflate, algtype CRYPTO_COMPRESS, inputsize 8192,
loop 1, numamask 0x0, optype 0, reqnum 10, threadnum 1, time 1.
Crypto benchmark result:
throughput      pps             time
3507 MB/s       438 kPP/s       1003 ms

run set: algorithm zlib-deflate, algtype CRYPTO_COMPRESS, inputsize 8192,
loop 1, numamask 0x0, optype 0, reqnum 100, threadnum 1, time 1.
Crypto benchmark result:
throughput      pps             time
6318 MB/s       790 kPP/s       1093 ms

This shows that for asynchronous algorithms, the number of requests per
tfm also influences the throughput and pps, up to a peak value.
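
One way to reason about this behaviour is a toy saturation model:
throughput grows with the number of requests in flight until it reaches
some ceiling. The function below is only such a model, not what the tool
or the hardware does. bm_model_throughput is a hypothetical name, and the
ceiling passed in is simply the highest value measured above, not a known
hardware limit.

```c
#include <assert.h>

/*
 * Toy model: scale the single-request throughput by reqnum, then clamp
 * the result to an assumed peak. All values are in MB/s.
 */
static unsigned long bm_model_throughput(unsigned long single_req_mbps,
					 unsigned int reqnum,
					 unsigned long peak_mbps)
{
	unsigned long scaled;

	assert(reqnum > 0);
	scaled = single_req_mbps * reqnum;
	return scaled < peak_mbps ? scaled : peak_mbps;
}
```

With the measured 367 MB/s at reqnum 1 and a ceiling of 6318 MB/s, the
model gives 3670 MB/s at reqnum 10, close to the measured 3507 MB/s, and
stays at the ceiling for reqnum 100.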

So with this tool, we can quickly verify performance on different
platforms and get a reference for configuring business scenarios.

Yang Shen (6):
moduleparams: Add hexulong type parameter
crypto: benchmark - add a crypto benchmark tool
crypto: benchmark - support compression/decompression
crypto: benchmark - add help information
crypto: benchmark - add API documentation
MAINTAINERS: add crypto benchmark MAINTAINER

Documentation/crypto/benchmark.rst | 104 +++++
MAINTAINERS | 7 +
crypto/Kconfig | 2 +
crypto/Makefile | 5 +
crypto/benchmark/Kconfig | 11 +
crypto/benchmark/Makefile | 3 +
crypto/benchmark/benchmark.c | 599 +++++++++++++++++++++++++++++
crypto/benchmark/benchmark.h | 76 ++++
crypto/benchmark/bm_comp.c | 435 +++++++++++++++++++++
crypto/benchmark/bm_comp.h | 19 +
include/linux/moduleparam.h | 7 +-
kernel/params.c | 1 +
12 files changed, 1268 insertions(+), 1 deletion(-)
create mode 100644 Documentation/crypto/benchmark.rst
create mode 100644 crypto/benchmark/Kconfig
create mode 100644 crypto/benchmark/Makefile
create mode 100644 crypto/benchmark/benchmark.c
create mode 100644 crypto/benchmark/benchmark.h
create mode 100644 crypto/benchmark/bm_comp.c
create mode 100644 crypto/benchmark/bm_comp.h

--
2.24.0