[linus:master] [kernel/fork] 14ef95be6f: phoronix-test-suite.osbench.LaunchPrograms.us_per_event -8.2% improvement

From: kernel test robot
Date: Tue Sep 12 2023 - 09:53:00 EST




Hello,

kernel test robot noticed an 8.2% improvement (a -8.2% change; lower is better) in phoronix-test-suite.osbench.LaunchPrograms.us_per_event on:


commit: 14ef95be6f5558fb9e43aaf06ef9a1d6e0cae6c8 ("kernel/fork: group allocation/free of per-cpu counters for mm struct")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: phoronix-test-suite
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
parameters:

test: osbench-1.0.2
option_a: Launch Programs
cpufreq_governor: performance



Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230912/202309122106.b440c4c6-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/Launch Programs/debian-x86_64-phoronix/lkp-csl-2sp7/osbench-1.0.2/phoronix-test-suite

commit:
c439d5e8a0 ("pcpcntr: add group allocation/free")
14ef95be6f ("kernel/fork: group allocation/free of per-cpu counters for mm struct")

c439d5e8a0deb731 14ef95be6f5558fb9e43aaf06ef
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      5222 ± 19%     -22.2%       4060 ± 16%  numa-meminfo.node1.Active(anon)
      1.69            +0.1        1.80 ±  2%  turbostat.C1E%
     14072            +4.1%      14642        vmstat.system.cs
      1306 ± 19%     -22.3%       1014 ± 16%  numa-vmstat.node1.nr_active_anon
      1306 ± 19%     -22.3%       1014 ± 16%  numa-vmstat.node1.nr_zone_active_anon
     98.32            -8.2%      90.23        phoronix-test-suite.osbench.LaunchPrograms.us_per_event
   9835435            +8.4%   10659372        phoronix-test-suite.time.minor_page_faults
    314.33            +7.1%     336.67        phoronix-test-suite.time.percent_of_cpu_this_job_got
     83.25            +9.8%      91.44 ±  3%  phoronix-test-suite.time.system_time
    151162            +8.7%     164239        phoronix-test-suite.time.voluntary_context_switches
   9116125            +7.9%    9839611        proc-vmstat.numa_hit
   9115159            +7.7%    9818723        proc-vmstat.numa_local
      8183 ±  5%     -36.5%       5197 ± 70%  proc-vmstat.numa_pages_migrated
   9972768            +7.9%   10764494        proc-vmstat.pgalloc_normal
  10251204            +8.1%   11080823        proc-vmstat.pgfault
   9845664            +8.0%   10637337        proc-vmstat.pgfree
      8183 ±  5%     -36.5%       5197 ± 70%  proc-vmstat.pgmigrate_success
    207326            +7.0%     221825        proc-vmstat.pgreuse
      5.18 ± 13%      -0.9        4.32 ± 19%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      9.03 ± 23%      -2.8        6.21 ± 20%  perf-profile.children.cycles-pp.asm_exc_page_fault
      7.18 ± 27%      -2.5        4.70 ± 19%  perf-profile.children.cycles-pp.exc_page_fault
      7.02 ± 27%      -2.4        4.59 ± 17%  perf-profile.children.cycles-pp.do_user_addr_fault
      1.60 ± 11%      -1.1        0.47 ± 75%  perf-profile.children.cycles-pp.wp_page_copy
      0.78 ± 38%      -0.5        0.33 ± 34%  perf-profile.children.cycles-pp.__mmdrop
      0.49 ± 46%      -0.4        0.14 ±111%  perf-profile.children.cycles-pp.wake_up_new_task
      0.76 ± 29%      -0.5        0.21 ± 83%  perf-profile.self.cycles-pp.copy_mc_fragile
      0.10 ±101%      +0.2        0.29 ± 32%  perf-profile.self.cycles-pp.kmem_cache_free_bulk
  19578013 ±  3%      +7.1%   20966194 ±  2%  perf-stat.i.cache-misses
 1.648e+08            +3.3%  1.702e+08        perf-stat.i.cache-references
     14793            +3.9%      15372        perf-stat.i.context-switches
      3.13            +6.0%       3.32 ±  3%  perf-stat.i.cpi
  1.34e+10            +5.1%  1.408e+10        perf-stat.i.cpu-cycles
   2995824 ±  3%      -6.1%    2812561 ±  2%  perf-stat.i.dTLB-load-misses
 2.399e+09            +3.9%  2.493e+09        perf-stat.i.dTLB-loads
 1.255e+09            +3.9%  1.303e+09        perf-stat.i.dTLB-stores
   1908762            +3.4%    1973692        perf-stat.i.iTLB-loads
 9.658e+09            +2.8%  9.931e+09        perf-stat.i.instructions
      0.58            -3.8%       0.56        perf-stat.i.ipc
     22.17 ±  5%     +33.6%      29.63 ±  7%  perf-stat.i.major-faults
      0.14            +5.1%       0.15        perf-stat.i.metric.GHz
     59.57            +3.4%      61.62        perf-stat.i.metric.M/sec
    251853            +6.9%     269294 ±  2%  perf-stat.i.minor-faults
    549142 ±  3%      +9.6%     601791 ±  2%  perf-stat.i.node-loads
    833543            +6.0%     883652 ±  2%  perf-stat.i.node-stores
    251875            +6.9%     269324 ±  2%  perf-stat.i.page-faults
     11.87 ±  2%      +0.4       12.31        perf-stat.overall.cache-miss-rate%
      1.39            +2.2%       1.42        perf-stat.overall.cpi
      0.12 ±  3%      -0.0        0.11 ±  3%  perf-stat.overall.dTLB-load-miss-rate%
      2511            +1.5%       2549        perf-stat.overall.instructions-per-iTLB-miss
      0.72            -2.1%       0.71        perf-stat.overall.ipc
     81.09            -1.4       79.72        perf-stat.overall.node-store-miss-rate%
  19054112 ±  3%      +7.1%   20408724 ±  2%  perf-stat.ps.cache-misses
 1.605e+08            +3.4%  1.658e+08        perf-stat.ps.cache-references
     14397            +3.9%      14963        perf-stat.ps.context-switches
 1.305e+10            +5.1%  1.371e+10        perf-stat.ps.cpu-cycles
   2918655 ±  3%      -6.1%    2741385 ±  2%  perf-stat.ps.dTLB-load-misses
 2.334e+09            +4.0%  2.427e+09        perf-stat.ps.dTLB-loads
 1.221e+09            +3.9%  1.269e+09        perf-stat.ps.dTLB-stores
   1857710            +3.4%    1921204        perf-stat.ps.iTLB-loads
 9.399e+09            +2.8%  9.666e+09        perf-stat.ps.instructions
     21.57 ±  5%     +33.7%      28.83 ±  7%  perf-stat.ps.major-faults
    245071            +6.9%     262088 ±  2%  perf-stat.ps.minor-faults
    534454 ±  2%      +9.6%     585755 ±  2%  perf-stat.ps.node-loads
    811200            +6.0%     860097 ±  2%  perf-stat.ps.node-stores
    245093            +6.9%     262117 ±  2%  perf-stat.ps.page-faults
 3.723e+11            +4.2%  3.878e+11        perf-stat.total.instructions
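
For readers who want to sanity-check the %change column, here is a minimal
Python sketch. It is not part of the robot's tooling. It assumes %change is
the relative difference between the second commit's value and the first
commit's value, and that entries printed without a "%" sign (for example
turbostat.C1E% or the perf-profile rows) are absolute differences of metrics
that are already percentages. The helper name pct_change is hypothetical. The
table prints rounded averages, so a recomputed value may differ in the last
digit.

```python
def pct_change(base: float, patched: float) -> float:
    """Relative change of the patched value versus the base value, in percent."""
    return (patched - base) / base * 100.0


# (base, patched, reported %change), copied from rows of the table above.
rows = {
    "phoronix-test-suite.osbench.LaunchPrograms.us_per_event": (98.32, 90.23, -8.2),
    "phoronix-test-suite.time.minor_page_faults": (9835435, 10659372, 8.4),
    "phoronix-test-suite.time.system_time": (83.25, 91.44, 9.8),
    "vmstat.system.cs": (14072, 14642, 4.1),
}

for name, (base, patched, reported) in rows.items():
    computed = pct_change(base, patched)
    print(f"{name}: computed {computed:+.1f}%, reported {reported:+.1f}%")
```

Under these assumptions the recomputed figures agree with the reported ones
to one decimal place for the rows listed.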




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki