Re: [percpu_ref] 2b0d3d3e4f: reaim.jobs_per_min -18.4% regression

From: Ming Lei
Date: Mon Jan 11 2021 - 05:00:49 EST


On Sun, Jan 10, 2021 at 10:32:47PM +0800, kernel test robot wrote:
>
> Greeting,
>
> FYI, we noticed a -18.4% regression of reaim.jobs_per_min due to commit:
>
>
> commit: 2b0d3d3e4fcfb19d10f9a82910b8f0f05c56ee3e ("percpu_ref: reduce memory footprint of percpu_ref in fast path")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
>
> in testcase: reaim
> on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
> with following parameters:
>
> runtime: 300s
> nr_task: 100%
> test: short
> cpufreq_governor: performance
> ucode: 0x5002f01
>
> test-description: REAIM is an updated and improved version of AIM 7 benchmark.
> test-url: https://sourceforge.net/projects/re-aim-7/
>
> In addition to that, the commit also has significant impact on the following tests:
>
> +------------------+---------------------------------------------------------------------------+
> | testcase: change | vm-scalability: vm-scalability.throughput -2.8% regression |
> | test machine | 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory |
> | test parameters | cpufreq_governor=performance |
> | | runtime=300s |
> | | test=lru-file-mmap-read-rand |
> | | ucode=0x5003003 |
> +------------------+---------------------------------------------------------------------------+
> | testcase: change | will-it-scale: will-it-scale.per_process_ops 14.5% improvement |
> | test machine | 144 threads Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz with 512G memory |
> | test parameters | cpufreq_governor=performance |
> | | mode=process |
> | | nr_task=50% |
> | | test=page_fault2 |
> | | ucode=0x16 |
> +------------------+---------------------------------------------------------------------------+
> | testcase: change | will-it-scale: will-it-scale.per_process_ops -13.0% regression |
> | test machine | 104 threads Skylake with 192G memory |
> | test parameters | cpufreq_governor=performance |
> | | mode=process |
> | | nr_task=50% |
> | | test=malloc1 |
> | | ucode=0x2006906 |
> +------------------+---------------------------------------------------------------------------+
> | testcase: change | vm-scalability: vm-scalability.throughput -2.3% regression |
> | test machine | 96 threads Intel(R) Xeon(R) CPU @ 2.30GHz with 128G memory |
> | test parameters | cpufreq_governor=performance |
> | | runtime=300s |
> | | test=lru-file-mmap-read-rand |
> | | ucode=0x5002f01 |
> +------------------+---------------------------------------------------------------------------+
> | testcase: change | fio-basic: fio.read_iops -4.8% regression |
> | test machine | 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory |
> | test parameters | bs=4k |
> | | cpufreq_governor=performance |
> | | disk=2pmem |
> | | fs=xfs |
> | | ioengine=libaio |
> | | nr_task=50% |
> | | runtime=200s |
> | | rw=randread |
> | | test_size=200G |
> | | time_based=tb |
> | | ucode=0x5002f01 |
> +------------------+---------------------------------------------------------------------------+
> | testcase: change | stress-ng: stress-ng.stackmmap.ops_per_sec -45.4% regression |
> | test machine | 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 256G memory |
> | test parameters | class=memory |
> | | cpufreq_governor=performance |
> | | disk=1HDD |
> | | nr_threads=100% |
> | | testtime=10s |
> | | ucode=0x5002f01 |
> +------------------+---------------------------------------------------------------------------+

I just ran a quick test of the last two (fio and stress-ng) on 2b0d3d3e4fcf
("percpu_ref: reduce memory footprint of percpu_ref in fast path") and
cf785af19319 ("block: warn if !__GFP_DIRECT_RECLAIM in bio_crypt_set_ctx()").

I don't see any difference between the two kernels (fio on null_blk with
224 hw queues, and 'stress-ng --stackmmap-ops') on a 224-core, dual-socket
system.

BTW, this patch itself doesn't touch fast-path code, so it isn't supposed
to affect performance.

Can you double-check whether the test itself is good?

Note: cf785af19319 is 2b0d3d3e4fcf^
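
In case it helps to confirm that the two kernels compared are adjacent
commits, here is a minimal sketch. The helper name is_parent is made up
for illustration and is not something from the tree; it only relies on
'git rev-parse' and the '^' (first parent) suffix.

```shell
# is_parent PARENT CHILD
# Hypothetical helper: succeeds if PARENT is the first parent of CHILD.
is_parent() {
    parent=$(git rev-parse --verify --quiet "$1^{commit}") || return 1
    child_parent=$(git rev-parse --verify --quiet "$2^") || return 1
    [ "$parent" = "$child_parent" ]
}

# Usage, run inside a Linux kernel checkout:
#   is_parent cf785af19319 2b0d3d3e4fcf && echo "adjacent commits"
```

If the check passes, the only change between the two test kernels is the
percpu_ref patch itself.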



Thanks,
Ming