Re: [PATCH v3 2/2] kernel/fork: group allocation/free of per-cpu counters for mm struct

From: kernel test robot
Date: Wed Sep 06 2023 - 04:26:47 EST




Hello,

kernel test robot noticed an 8.2% improvement (a -8.2% change; lower is better) in phoronix-test-suite.osbench.LaunchPrograms.us_per_event on:


commit: 9d32938c115580bfff128a926d704199d2f33ba3 ("[PATCH v3 2/2] kernel/fork: group allocation/free of per-cpu counters for mm struct")
url: https://github.com/intel-lab-lkp/linux/commits/Mateusz-Guzik/pcpcntr-add-group-allocation-free/20230823-130803
base: https://git.kernel.org/cgit/linux/kernel/git/dennis/percpu.git for-next
patch link: https://lore.kernel.org/all/20230823050609.2228718-3-mjguzik@xxxxxxxxx/
patch subject: [PATCH v3 2/2] kernel/fork: group allocation/free of per-cpu counters for mm struct

testcase: phoronix-test-suite
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
parameters:

test: osbench-1.0.2
option_a: Launch Programs
cpufreq_governor: performance






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230906/202309061504.7e645826-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/Launch Programs/debian-x86_64-phoronix/lkp-csl-2sp7/osbench-1.0.2/phoronix-test-suite

commit:
1db50472c8 ("pcpcntr: add group allocation/free")
9d32938c11 ("kernel/fork: group allocation/free of per-cpu counters for mm struct")

1db50472c8bc1d34 9d32938c115580bfff128a926d7
---------------- ---------------------------
        %stddev   %change         %stddev
           \         |               \
      3.00         +33.3%       4.00       vmstat.procs.r
     14111          +5.7%      14918       vmstat.system.cs
      2114          +1.1%       2136       turbostat.Bzy_MHz
      1.67           +0.2       1.83       turbostat.C1E%
    121.98          +5.1%     128.24       turbostat.PkgWatt
     98.05          -8.2%      90.02       phoronix-test-suite.osbench.LaunchPrograms.us_per_event
     16246 ± 4%     +6.1%      17243       phoronix-test-suite.time.involuntary_context_switches
   9791476          +9.2%   10689455       phoronix-test-suite.time.minor_page_faults
    311.33          +9.3%     340.33       phoronix-test-suite.time.percent_of_cpu_this_job_got
     83.40 ± 2%     +9.2%      91.07 ± 2%  phoronix-test-suite.time.system_time
    151333          +8.6%     164355       phoronix-test-suite.time.voluntary_context_switches
      3225          -5.5%       3046 ± 5%  proc-vmstat.nr_page_table_pages
   9150454          +8.0%    9884178       proc-vmstat.numa_hit
   9088660          +8.7%    9882518       proc-vmstat.numa_local
   9971116          +8.3%   10802925       proc-vmstat.pgalloc_normal
  10202032          +8.8%   11099649       proc-vmstat.pgfault
   9845338          +8.4%   10676360       proc-vmstat.pgfree
    207049         +10.3%     228380 ± 8%  proc-vmstat.pgreuse
 1.947e+09          +5.0%  2.045e+09       perf-stat.i.branch-instructions
  52304206          +4.4%   54610501       perf-stat.i.branch-misses
      9.06 ± 2%      +0.5       9.52       perf-stat.i.cache-miss-rate%
  19663522 ± 3%    +10.0%   21634645       perf-stat.i.cache-misses
 1.658e+08          +3.6%  1.717e+08       perf-stat.i.cache-references
     14769          +6.2%      15691       perf-stat.i.context-switches
 1.338e+10          +6.2%   1.42e+10       perf-stat.i.cpu-cycles
   3112873 ± 3%    -12.5%    2724690 ± 3%  perf-stat.i.dTLB-load-misses
 2.396e+09          +5.5%  2.528e+09       perf-stat.i.dTLB-loads
      0.11 ± 4%      -0.0       0.10 ± 2%  perf-stat.i.dTLB-store-miss-rate%
   1003394 ± 6%    -14.0%     862768 ± 5%  perf-stat.i.dTLB-store-misses
  1.25e+09          +6.0%  1.325e+09       perf-stat.i.dTLB-stores
     71.16           -1.3      69.88       perf-stat.i.iTLB-load-miss-rate%
   1872082          +8.2%    2025999       perf-stat.i.iTLB-loads
 9.606e+09          +5.4%  1.012e+10       perf-stat.i.instructions
     23.37 ± 5%    +30.6%      30.53 ± 4%  perf-stat.i.major-faults
      0.14          +6.2%       0.15       perf-stat.i.metric.GHz
     59.39          +5.4%      62.61       perf-stat.i.metric.M/sec
    249517         +10.0%     274572       perf-stat.i.minor-faults
   5081285          +6.0%    5385686 ± 4%  perf-stat.i.node-load-misses
    565117 ± 3%     +8.1%     610682 ± 3%  perf-stat.i.node-loads
    249541         +10.0%     274602       perf-stat.i.page-faults
     17.27          -1.7%      16.98       perf-stat.overall.MPKI
     11.85 ± 2%      +0.7      12.59       perf-stat.overall.cache-miss-rate%
      0.13 ± 2%      -0.0       0.11 ± 2%  perf-stat.overall.dTLB-load-miss-rate%
      0.08 ± 7%      -0.0       0.07 ± 4%  perf-stat.overall.dTLB-store-miss-rate%
     67.26           -1.1      66.12       perf-stat.overall.iTLB-load-miss-rate%
 1.895e+09          +5.0%   1.99e+09       perf-stat.ps.branch-instructions
  50921385          +4.4%   53146828       perf-stat.ps.branch-misses
  19140130 ± 3%    +10.0%   21047707       perf-stat.ps.cache-misses
 1.615e+08          +3.5%  1.672e+08       perf-stat.ps.cache-references
     14376          +6.2%      15266       perf-stat.ps.context-switches
 1.303e+10          +6.1%  1.383e+10       perf-stat.ps.cpu-cycles
   3033019 ± 3%    -12.5%    2654269 ± 3%  perf-stat.ps.dTLB-load-misses
 2.332e+09          +5.5%   2.46e+09       perf-stat.ps.dTLB-loads
    976773 ± 6%    -14.1%     839517 ± 5%  perf-stat.ps.dTLB-store-misses
 1.217e+09          +6.0%  1.289e+09       perf-stat.ps.dTLB-stores
   1822198          +8.2%    1971115       perf-stat.ps.iTLB-loads
 9.349e+09          +5.3%  9.846e+09       perf-stat.ps.instructions
     22.75 ± 5%    +30.5%      29.69 ± 4%  perf-stat.ps.major-faults
    242831         +10.0%     267074       perf-stat.ps.minor-faults
   4945101          +5.9%    5238638 ± 4%  perf-stat.ps.node-load-misses
    550029 ± 3%     +8.0%     594116 ± 3%  perf-stat.ps.node-loads
    242854         +10.0%     267104       perf-stat.ps.page-faults
 3.719e+11          +4.4%  3.883e+11       perf-stat.total.instructions
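
For readers checking the numbers: the %change column appears to be the relative
difference between the two commits' mean values, while rows whose metric is
itself a percentage (for example turbostat.C1E%) appear to show an absolute
difference in percentage points. A minimal Python sketch under those
assumptions is given here; the helper names are hypothetical and are not part
of lkp-tests. Because the table lists rounded means, a recomputed value may
differ from the reported one in the last digit.

```python
# Sketch only: recompute a few rows of the comparison table from the
# reported (rounded) means. Helper names are illustrative, not lkp-tests APIs.

def pct_change(base: float, patched: float) -> float:
    """Relative change from the base commit to the patched commit, in percent."""
    return (patched - base) / base * 100.0


def point_change(base: float, patched: float) -> float:
    """Absolute difference, assumed here for metrics that are already percentages."""
    return patched - base


# (metric, base mean, patched mean) copied from the table above.
relative_rows = [
    ("phoronix-test-suite.osbench.LaunchPrograms.us_per_event", 98.05, 90.02),
    ("vmstat.system.cs", 14111, 14918),
    ("perf-stat.ps.dTLB-load-misses", 3033019, 2654269),
]
point_rows = [
    ("turbostat.C1E%", 1.67, 1.83),
    ("perf-stat.i.iTLB-load-miss-rate%", 71.16, 69.88),
]

if __name__ == "__main__":
    for name, base, patched in relative_rows:
        print(f"{pct_change(base, patched):+.1f}%  {name}")
    for name, base, patched in point_rows:
        print(f"{point_change(base, patched):+.1f}   {name}")
```

Since us_per_event is a latency (microseconds per event), a negative relative
change on that row corresponds to the improvement reported at the top of this
mail.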




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki