[linus:master] [crypto] 996f4dcbd2: stress-ng.sockfd.ops_per_sec 11.0% improvement

From: kernel test robot
Date: Mon May 27 2024 - 04:09:55 EST




Hello,

kernel test robot noticed an 11.0% improvement of stress-ng.sockfd.ops_per_sec on:


commit: 996f4dcbd231ec022f38a3c27e7fc45727e4e875 ("crypto: x86/aes-xts - wire up AESNI + AVX implementation")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:

nr_threads: 100%
testtime: 60s
test: sockfd
cpufreq_governor: performance
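
As a rough illustration of how the parameters above could translate into a stress-ng invocation, the sketch below builds a command line in Python. It assumes that "nr_threads: 100%" means one sockfd worker per hardware thread (224 on this machine); the helper name `sockfd_command` is hypothetical, and the job file in the linked reproduction materials remains the authoritative source for the exact options the robot used. The cpufreq governor setting is not handled here.

```python
import shlex


def sockfd_command(hw_threads: int, nr_threads_pct: int, testtime_s: int) -> list[str]:
    """Build an illustrative stress-ng argv for the sockfd stressor."""
    # Assumption: nr_threads is a percentage of the machine's hardware threads.
    workers = max(1, hw_threads * nr_threads_pct // 100)
    return [
        "stress-ng",
        "--sockfd", str(workers),        # number of sockfd stressor instances
        "--timeout", f"{testtime_s}s",   # run time per the "testtime" parameter
        "--metrics-brief",               # print bogo-ops summary at the end
    ]


cmd = sockfd_command(hw_threads=224, nr_threads_pct=100, testtime_s=60)
print(shlex.join(cmd))
```

Running this only prints the command; it does not start stress-ng.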






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240527/202405271558.f424aa27-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/sockfd/stress-ng/60s

commit:
d637168810 ("crypto: x86/aes-xts - add AES-XTS assembly macro for modern CPUs")
996f4dcbd2 ("crypto: x86/aes-xts - wire up AESNI + AVX implementation")

d6371688101223a3 996f4dcbd231ec022f38a3c27e7
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
    730290 ± 11%     +26.2%     921636 ± 13%  meminfo.Mapped
     24673 ±  2%      +8.1%      26682 ±  3%  perf-c2c.HITM.total
     61893            +3.6%      64151        vmstat.system.cs
      0.28 ±  8%     -11.6%       0.25 ±  9%  sched_debug.cfs_rq:/.h_nr_running.stddev
    196.71 ±  6%     -11.8%     173.53 ± 11%  sched_debug.cfs_rq:/.util_est.stddev
  46304617           +11.0%   51404735        stress-ng.sockfd.ops
    771591           +11.0%     856468        stress-ng.sockfd.ops_per_sec
   2336146            -3.1%    2263883        stress-ng.time.involuntary_context_switches
   1365039 ±  2%     +15.8%    1580362        stress-ng.time.voluntary_context_switches
    183309 ± 11%     +26.1%     231069 ± 13%  proc-vmstat.nr_mapped
   1843540            +2.4%    1888288        proc-vmstat.numa_hit
   1611479            +2.8%    1656095        proc-vmstat.numa_local
   2952001 ±  3%      +5.1%    3103307        proc-vmstat.pgalloc_normal
   2282989 ±  4%      +7.5%    2454018 ±  2%  proc-vmstat.pgfree
      0.42 ±  2%      +6.2%       0.44        perf-stat.i.MPKI
 1.487e+10            +1.8%  1.513e+10        perf-stat.i.branch-instructions
  25452853            +9.2%   27794083        perf-stat.i.cache-misses
  85628078            +8.2%   92619680        perf-stat.i.cache-references
     63603 ±  2%      +4.2%      66264        perf-stat.i.context-switches
     10.03            -1.7%       9.86        perf-stat.i.cpi
     26278            -9.1%      23887        perf-stat.i.cycles-between-cache-misses
  6.35e+10            +2.2%  6.488e+10        perf-stat.i.instructions
      0.10            +1.5%       0.10        perf-stat.i.ipc
      0.06 ± 46%    +140.8%       0.14 ± 40%  perf-stat.i.major-faults
      0.40            +7.8%       0.43        perf-stat.overall.MPKI
     10.18            -2.0%       9.97        perf-stat.overall.cpi
     25755            -9.1%      23420        perf-stat.overall.cycles-between-cache-misses
      0.10            +2.0%       0.10        perf-stat.overall.ipc
 1.423e+10            +1.3%  1.442e+10        perf-stat.ps.branch-instructions
  49502972        +8.1e+05%  3.998e+11 ±223%  perf-stat.ps.branch-misses
  23994964            +9.7%   26326983        perf-stat.ps.cache-misses
  82930036            +8.2%   89756764        perf-stat.ps.cache-references
     61357            +3.3%      63381        perf-stat.ps.context-switches
 6.072e+10            +1.8%   6.18e+10        perf-stat.ps.instructions
      0.04 ± 46%    +137.6%       0.11 ± 37%  perf-stat.ps.major-faults
 3.653e+12            +1.6%  3.712e+12        perf-stat.total.instructions
     48.81             -0.2      48.59        perf-profile.calltrace.cycles-pp.unix_inflight.unix_scm_to_skb.unix_stream_sendmsg.____sys_sendmsg.___sys_sendmsg
     48.45             -0.2      48.29        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.unix_notinflight.unix_stream_read_generic.unix_stream_recvmsg
     48.57             -0.2      48.42        perf-profile.calltrace.cycles-pp.unix_notinflight.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.____sys_recvmsg
     48.53             -0.2      48.38        perf-profile.calltrace.cycles-pp._raw_spin_lock.unix_notinflight.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
     49.02             -0.1      48.88        perf-profile.calltrace.cycles-pp.unix_stream_recvmsg.sock_recvmsg.____sys_recvmsg.___sys_recvmsg.__sys_recvmsg
     49.01             -0.1      48.87        perf-profile.calltrace.cycles-pp.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.____sys_recvmsg.___sys_recvmsg
     49.02             -0.1      48.88        perf-profile.calltrace.cycles-pp.sock_recvmsg.____sys_recvmsg.___sys_recvmsg.__sys_recvmsg.do_syscall_64
     49.04             -0.1      48.90        perf-profile.calltrace.cycles-pp.____sys_recvmsg.___sys_recvmsg.__sys_recvmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe
     49.11             -0.1      48.97        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvmsg.stress_sockfd
     49.14             -0.1      49.00        perf-profile.calltrace.cycles-pp.recvmsg.stress_sockfd
     49.08             -0.1      48.95        perf-profile.calltrace.cycles-pp.__sys_recvmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvmsg.stress_sockfd
     49.07             -0.1      48.94        perf-profile.calltrace.cycles-pp.___sys_recvmsg.__sys_recvmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvmsg
     49.11             -0.1      48.98        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.recvmsg.stress_sockfd
      0.56 ±  3%       +0.1       0.65 ±  3%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
      0.56 ±  3%       +0.1       0.65 ±  3%  perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
      0.58 ±  2%       +0.1       0.67 ±  3%  perf-profile.calltrace.cycles-pp.open64
      0.55 ±  3%       +0.1       0.64 ±  3%  perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
      0.56 ±  3%       +0.1       0.65 ±  3%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
      0.17 ±141%       +0.4       0.57 ±  3%  perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
      0.17 ±141%       +0.4       0.58 ±  4%  perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
     97.19             -0.4      96.83        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     97.49             -0.4      97.13        perf-profile.children.cycles-pp._raw_spin_lock
     48.81             -0.2      48.59        perf-profile.children.cycles-pp.unix_inflight
     48.57             -0.2      48.42        perf-profile.children.cycles-pp.unix_notinflight
     49.02             -0.1      48.88        perf-profile.children.cycles-pp.unix_stream_read_generic
     49.02             -0.1      48.88        perf-profile.children.cycles-pp.unix_stream_recvmsg
     49.03             -0.1      48.89        perf-profile.children.cycles-pp.sock_recvmsg
     49.04             -0.1      48.90        perf-profile.children.cycles-pp.____sys_recvmsg
     49.09             -0.1      48.95        perf-profile.children.cycles-pp.__sys_recvmsg
     49.07             -0.1      48.94        perf-profile.children.cycles-pp.___sys_recvmsg
     49.15             -0.1      49.02        perf-profile.children.cycles-pp.recvmsg
      0.09             +0.0       0.10        perf-profile.children.cycles-pp.__memcg_slab_free_hook
      0.06             +0.0       0.07        perf-profile.children.cycles-pp.sock_alloc_send_pskb
      0.12 ±  3%       +0.0       0.14 ±  2%  perf-profile.children.cycles-pp.alloc_empty_file
      0.07 ±  8%       +0.0       0.09 ± 11%  perf-profile.children.cycles-pp.dput
      0.07 ± 11%       +0.0       0.09 ± 12%  perf-profile.children.cycles-pp.lockref_put_return
      0.12 ±  3%       +0.0       0.15 ±  7%  perf-profile.children.cycles-pp.__fput
      0.22 ±  2%       +0.0       0.24 ±  5%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.17 ±  2%       +0.0       0.20 ±  6%  perf-profile.children.cycles-pp.task_work_run
      0.18 ±  3%       +0.0       0.21 ±  6%  perf-profile.children.cycles-pp.syscall
      0.17 ±  6%       +0.0       0.21 ±  7%  perf-profile.children.cycles-pp.do_dentry_open
      0.02 ±141%       +0.0       0.06 ± 28%  perf-profile.children.cycles-pp.generic_perform_write
      0.09 ±  5%       +0.0       0.14 ± 37%  perf-profile.children.cycles-pp.cmd_record
      0.09 ±  5%       +0.0       0.14 ± 37%  perf-profile.children.cycles-pp.record__mmap_read_evlist
      0.26 ±  5%       +0.0       0.31 ±  6%  perf-profile.children.cycles-pp.do_open
      0.08 ±  8%       +0.0       0.13 ± 34%  perf-profile.children.cycles-pp.perf_mmap__push
      0.09 ±  5%       +0.1       0.14 ± 35%  perf-profile.children.cycles-pp.main
      0.09 ±  5%       +0.1       0.14 ± 35%  perf-profile.children.cycles-pp.run_builtin
      0.50 ±  3%       +0.1       0.58 ±  3%  perf-profile.children.cycles-pp.do_filp_open
      0.49 ±  3%       +0.1       0.57 ±  4%  perf-profile.children.cycles-pp.path_openat
      0.56 ±  3%       +0.1       0.64 ±  3%  perf-profile.children.cycles-pp.do_sys_openat2
      0.56 ±  3%       +0.1       0.65 ±  3%  perf-profile.children.cycles-pp.__x64_sys_openat
      0.59 ±  2%       +0.1       0.69 ±  3%  perf-profile.children.cycles-pp.open64
     96.75             -0.3      96.40        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.07 ± 11%       +0.0       0.09 ± 12%  perf-profile.self.cycles-pp.lockref_put_return
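
For readers who want to sanity-check the %change column, the following minimal sketch recomputes it for a few rows copied from the table above. It assumes the column is the relative difference between the parent commit (left value) and the tested commit (right value), rounded to one decimal place; the helper name `pct_change` is illustrative and not part of any LKP tool.

```python
def pct_change(base: float, patched: float) -> float:
    """Relative change of the patched value versus the base value, in percent."""
    return (patched - base) / base * 100.0


# (value at d637168810, value at 996f4dcbd2), copied from the report
rows = {
    "stress-ng.sockfd.ops": (46304617, 51404735),
    "stress-ng.sockfd.ops_per_sec": (771591, 856468),
    "stress-ng.time.voluntary_context_switches": (1365039, 1580362),
    "stress-ng.time.involuntary_context_switches": (2336146, 2263883),
    "perf-stat.i.cycles-between-cache-misses": (26278, 23887),
}

changes = {name: round(pct_change(base, patched), 1)
           for name, (base, patched) in rows.items()}

for name, change in changes.items():
    print(f"{change:+6.1f}%  {name}")
```

The perf-profile rows appear to follow a different convention: their middle column is the absolute difference in percentage points (for example 48.59 - 48.81 = -0.22, shown as -0.2), so the relative formula above does not apply to them.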




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki