[linus:master] [sock_diag] f44e64990b: stress-ng.sockdiag.ops_per_sec 147.0% improvement

From: kernel test robot
Date: Mon Apr 01 2024 - 11:25:41 EST

Hello,

kernel test robot noticed a 147.0% improvement in stress-ng.sockdiag.ops_per_sec on:


commit: f44e64990beb41167bd7c313d90bcf7e290c3582 ("sock_diag: remove sock_diag_mutex")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:

nr_threads: 100%
testtime: 60s
test: sockdiag
cpufreq_governor: performance


Details are below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240401/202404012326.d995728e-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/sockdiag/stress-ng/60s

commit:
86e8921df0 ("sock_diag: allow concurrent operation in sock_diag_rcv_msg()")
f44e64990b ("sock_diag: remove sock_diag_mutex")

86e8921df05c6e94 (parent)  f44e64990beb41167bd7c313d90 (tested commit)
----------------           ---------------------------
value ± %stddev   %change  value ± %stddev   metric
6805 ± 37% +630.7% 49725 ±137% numa-meminfo.node0.Active
6767 ± 37% +634.2% 49687 ±138% numa-meminfo.node0.Active(anon)
7690 +906.4% 77394 ± 48% vmstat.system.cs
420471 +6.2% 446552 vmstat.system.in
2.491e+08 +147.0% 6.154e+08 ± 40% stress-ng.sockdiag.ops
4152375 +147.0% 10257279 ± 40% stress-ng.sockdiag.ops_per_sec
86849 +350.0% 390836 ± 28% stress-ng.time.involuntary_context_switches
0.55 -0.3 0.30 ± 20% mpstat.cpu.all.irq%
0.10 ± 3% +0.0 0.15 ± 22% mpstat.cpu.all.soft%
0.46 +0.1 0.54 ± 2% mpstat.cpu.all.usr%
46.33 ± 12% -84.2% 7.33 ± 84% mpstat.max_utilization.seconds
2234616 ± 2% +136.2% 5279086 ± 37% numa-numastat.node0.local_node
2378097 +124.6% 5342166 ± 36% numa-numastat.node0.numa_hit
2678667 ± 2% +108.5% 5584120 ± 35% numa-numastat.node1.local_node
2768310 ± 3% +107.9% 5755443 ± 34% numa-numastat.node1.numa_hit
1211899 +13.3% 1372481 ± 2% meminfo.Inactive
1211695 +13.3% 1372284 ± 2% meminfo.Inactive(anon)
540274 +25.9% 680362 ± 7% meminfo.Mapped
449208 +8.6% 487827 ± 7% meminfo.SUnreclaim
862353 +23.0% 1060355 ± 3% meminfo.Shmem
161.00 ± 21% +579.5% 1094 ± 64% perf-c2c.DRAM.local
1480 ± 15% +661.4% 11271 ± 57% perf-c2c.DRAM.remote
1391 ± 14% +1182.4% 17843 ± 65% perf-c2c.HITM.local
585.00 ± 10% +1199.8% 7604 ± 59% perf-c2c.HITM.remote
1976 ± 13% +1187.6% 25447 ± 63% perf-c2c.HITM.total
965151 ± 3% -47.0% 511917 ± 6% sched_debug.cpu.avg_idle.avg
225203 ± 48% -84.8% 34261 ±130% sched_debug.cpu.avg_idle.min
1759 ± 6% +542.3% 11302 ± 45% sched_debug.cpu.nr_switches.avg
899.42 +738.6% 7542 ± 42% sched_debug.cpu.nr_switches.min
-30.17 +221.8% -97.08 sched_debug.cpu.nr_uninterruptible.min
1739 ± 37% +612.9% 12403 ±138% numa-vmstat.node0.nr_active_anon
1739 ± 37% +612.9% 12403 ±138% numa-vmstat.node0.nr_zone_active_anon
2377796 +124.5% 5337172 ± 36% numa-vmstat.node0.numa_hit
2234316 ± 2% +136.0% 5274091 ± 37% numa-vmstat.node0.numa_local
2767474 ± 3% +107.8% 5750481 ± 34% numa-vmstat.node1.numa_hit
2677832 ± 2% +108.3% 5579160 ± 35% numa-vmstat.node1.numa_local
980143 +5.0% 1028901 proc-vmstat.nr_file_pages
303091 +13.2% 342957 ± 2% proc-vmstat.nr_inactive_anon
40864 +1.6% 41510 proc-vmstat.nr_kernel_stack
135507 +25.7% 170340 ± 7% proc-vmstat.nr_mapped
215970 +22.6% 264729 ± 3% proc-vmstat.nr_shmem
41429 +7.8% 44664 ± 7% proc-vmstat.nr_slab_reclaimable
112306 +8.7% 122083 ± 7% proc-vmstat.nr_slab_unreclaimable
303091 +13.2% 342957 ± 2% proc-vmstat.nr_zone_inactive_anon
37590 ± 28% +51.2% 56819 ± 18% proc-vmstat.numa_hint_faults
5148855 +115.5% 11093970 ± 35% proc-vmstat.numa_hit
4915589 +120.9% 10859566 ± 36% proc-vmstat.numa_local
206083 ± 27% +58.4% 326447 ± 14% proc-vmstat.numa_pte_updates
32486467 +143.2% 79020889 ± 39% proc-vmstat.pgalloc_normal
759303 +16.3% 882814 ± 3% proc-vmstat.pgfault
32050628 +144.9% 78486695 ± 40% proc-vmstat.pgfree
0.13 ± 7% +536.1% 0.85 ± 21% perf-stat.i.MPKI
3.083e+10 -56.4% 1.344e+10 ± 2% perf-stat.i.branch-instructions
0.19 ± 3% +6870.7 6870 ±104% perf-stat.i.branch-miss-rate%
42989880 ± 2% +2e+06% 8.623e+11 ±101% perf-stat.i.branch-misses
16796111 ± 9% +189.6% 48642444 ± 25% perf-stat.i.cache-misses
68857289 ± 5% +196.5% 2.042e+08 ± 12% perf-stat.i.cache-references
7918 +929.7% 81533 ± 44% perf-stat.i.context-switches
3.94 +165.6% 10.46 perf-stat.i.cpi
39043 ± 10% -64.0% 14047 ± 19% perf-stat.i.cycles-between-cache-misses
1.541e+11 -62.6% 5.76e+10 ± 3% perf-stat.i.instructions
0.26 -61.2% 0.10 perf-stat.i.ipc
0.10 ± 92% +479.0% 0.56 ± 28% perf-stat.i.major-faults
12344 +20.7% 14898 ± 3% perf-stat.i.minor-faults
12345 +20.7% 14899 ± 3% perf-stat.i.page-faults
0.11 ± 9% +685.5% 0.84 ± 21% perf-stat.overall.MPKI
0.12 ± 2% +9674.7 9674 ±101% perf-stat.overall.branch-miss-rate%
4.00 +166.1% 10.63 perf-stat.overall.cpi
37756 ± 10% -65.1% 13184 ± 18% perf-stat.overall.cycles-between-cache-misses
0.25 -62.4% 0.09 perf-stat.overall.ipc
2.952e+10 -56.5% 1.284e+10 ± 2% perf-stat.ps.branch-instructions
35366132 ± 2% +3.6e+06% 1.256e+12 ±100% perf-stat.ps.branch-misses
15767609 ± 9% +194.8% 46490063 ± 26% perf-stat.ps.cache-misses
67236264 ± 4% +194.1% 1.977e+08 ± 12% perf-stat.ps.cache-references
7505 +941.8% 78193 ± 47% perf-stat.ps.context-switches
1.475e+11 -62.7% 5.497e+10 ± 3% perf-stat.ps.instructions
0.08 ± 88% +399.3% 0.41 ± 28% perf-stat.ps.major-faults
10427 ± 2% +19.6% 12474 ± 3% perf-stat.ps.minor-faults
10428 ± 2% +19.6% 12475 ± 3% perf-stat.ps.page-faults
8.86e+12 -62.6% 3.315e+12 ± 3% perf-stat.total.instructions
99.55 -99.6 0.00 perf-profile.calltrace.cycles-pp.sock_diag_rcv.netlink_unicast.netlink_sendmsg.____sys_sendmsg.___sys_sendmsg
99.10 -99.1 0.00 perf-profile.calltrace.cycles-pp.__mutex_lock.sock_diag_rcv.netlink_unicast.netlink_sendmsg.____sys_sendmsg
98.57 -98.6 0.00 perf-profile.calltrace.cycles-pp.osq_lock.__mutex_lock.sock_diag_rcv.netlink_unicast.netlink_sendmsg
99.57 -62.8 36.75 ±107% perf-profile.calltrace.cycles-pp.netlink_unicast.netlink_sendmsg.____sys_sendmsg.___sys_sendmsg.__sys_sendmsg
99.58 -62.8 36.82 ±107% perf-profile.calltrace.cycles-pp.netlink_sendmsg.____sys_sendmsg.___sys_sendmsg.__sys_sendmsg.do_syscall_64
99.58 -62.8 36.82 ±107% perf-profile.calltrace.cycles-pp.____sys_sendmsg.___sys_sendmsg.__sys_sendmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe
99.60 -62.8 36.84 ±107% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendmsg
99.60 -62.8 36.84 ±107% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.sendmsg
99.60 -62.8 36.84 ±107% perf-profile.calltrace.cycles-pp.sendmsg
99.59 -62.8 36.83 ±107% perf-profile.calltrace.cycles-pp.___sys_sendmsg.__sys_sendmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendmsg
99.59 -62.8 36.83 ±107% perf-profile.calltrace.cycles-pp.__sys_sendmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendmsg
0.00 +36.2 36.16 ±108% perf-profile.calltrace.cycles-pp._raw_spin_lock.unix_diag_dump.netlink_dump.__netlink_dump_start.unix_diag_handler_dump
0.00 +36.7 36.65 ±107% perf-profile.calltrace.cycles-pp.unix_diag_dump.netlink_dump.__netlink_dump_start.unix_diag_handler_dump.sock_diag_rcv_msg
0.00 +36.7 36.69 ±107% perf-profile.calltrace.cycles-pp.netlink_dump.__netlink_dump_start.unix_diag_handler_dump.sock_diag_rcv_msg.netlink_rcv_skb
0.00 +36.7 36.70 ±107% perf-profile.calltrace.cycles-pp.__netlink_dump_start.unix_diag_handler_dump.sock_diag_rcv_msg.netlink_rcv_skb.netlink_unicast
0.00 +36.7 36.70 ±107% perf-profile.calltrace.cycles-pp.unix_diag_handler_dump.sock_diag_rcv_msg.netlink_rcv_skb.netlink_unicast.netlink_sendmsg
0.00 +36.7 36.72 ±107% perf-profile.calltrace.cycles-pp.sock_diag_rcv_msg.netlink_rcv_skb.netlink_unicast.netlink_sendmsg.____sys_sendmsg
0.00 +36.7 36.72 ±107% perf-profile.calltrace.cycles-pp.netlink_rcv_skb.netlink_unicast.netlink_sendmsg.____sys_sendmsg.___sys_sendmsg
99.55 -99.6 0.00 perf-profile.children.cycles-pp.sock_diag_rcv
99.10 -99.1 0.00 perf-profile.children.cycles-pp.__mutex_lock
98.60 -98.6 0.00 perf-profile.children.cycles-pp.osq_lock
99.57 -62.8 36.75 ±107% perf-profile.children.cycles-pp.netlink_unicast
99.58 -62.8 36.82 ±107% perf-profile.children.cycles-pp.netlink_sendmsg
99.58 -62.8 36.82 ±107% perf-profile.children.cycles-pp.____sys_sendmsg
99.60 -62.8 36.85 ±107% perf-profile.children.cycles-pp.sendmsg
99.59 -62.8 36.83 ±107% perf-profile.children.cycles-pp.___sys_sendmsg
99.59 -62.8 36.83 ±107% perf-profile.children.cycles-pp.__sys_sendmsg
0.51 ± 2% -0.3 0.22 ± 27% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.50 ± 2% -0.3 0.21 ± 27% perf-profile.children.cycles-pp.hrtimer_interrupt
0.62 ± 2% -0.3 0.35 ± 18% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.64 ± 2% -0.3 0.37 ± 15% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.38 ± 3% -0.2 0.15 ± 22% perf-profile.children.cycles-pp.tick_nohz_highres_handler
0.39 ± 3% -0.2 0.17 ± 16% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.36 ± 4% -0.2 0.14 ± 21% perf-profile.children.cycles-pp.tick_sched_handle
0.36 ± 3% -0.2 0.14 ± 21% perf-profile.children.cycles-pp.update_process_times
0.31 ± 4% -0.2 0.12 ± 19% perf-profile.children.cycles-pp.scheduler_tick
0.24 ± 3% -0.2 0.08 ± 31% perf-profile.children.cycles-pp.task_tick_fair
0.17 ± 6% -0.0 0.12 ± 14% perf-profile.children.cycles-pp.main
0.17 ± 6% -0.0 0.12 ± 14% perf-profile.children.cycles-pp.run_builtin
0.17 ± 6% -0.0 0.13 ± 15% perf-profile.children.cycles-pp.cmd_record
0.17 ± 5% -0.0 0.12 ± 14% perf-profile.children.cycles-pp.record__mmap_read_evlist
0.16 ± 5% -0.0 0.12 ± 12% perf-profile.children.cycles-pp.perf_mmap__push
0.09 ± 5% -0.0 0.08 ± 10% perf-profile.children.cycles-pp.writen
0.09 ± 4% -0.0 0.08 ± 10% perf-profile.children.cycles-pp.write
0.08 ± 5% -0.0 0.07 ± 7% perf-profile.children.cycles-pp.ksys_write
0.07 ± 5% -0.0 0.06 ± 8% perf-profile.children.cycles-pp.shmem_file_write_iter
0.10 +0.0 0.13 ± 5% perf-profile.children.cycles-pp.irq_exit_rcu
0.09 ± 4% +0.0 0.13 ± 26% perf-profile.children.cycles-pp.rcu_core
0.10 ± 3% +0.0 0.14 ± 17% perf-profile.children.cycles-pp.__do_softirq
0.05 +0.1 0.12 ± 48% perf-profile.children.cycles-pp.__sys_recvmsg
0.06 ± 8% +0.1 0.14 ± 47% perf-profile.children.cycles-pp.recvmsg
0.00 +0.1 0.09 ± 48% perf-profile.children.cycles-pp.netlink_recvmsg
0.00 +0.1 0.09 ± 48% perf-profile.children.cycles-pp.sock_recvmsg
0.02 ± 99% +0.1 0.12 ± 47% perf-profile.children.cycles-pp.___sys_recvmsg
0.00 +0.1 0.10 ± 49% perf-profile.children.cycles-pp.____sys_recvmsg
0.07 +0.1 0.18 ± 49% perf-profile.children.cycles-pp.sk_diag_fill
0.00 +0.1 0.12 ± 62% perf-profile.children.cycles-pp._raw_read_lock
0.00 +0.2 0.18 ± 61% perf-profile.children.cycles-pp.sock_i_ino
0.00 +0.9 0.85 ± 60% perf-profile.children.cycles-pp.__wake_up
0.00 +1.1 1.07 ± 60% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.00 +23.6 23.58 ± 63% perf-profile.children.cycles-pp.netlink_create
0.00 +23.6 23.62 ± 63% perf-profile.children.cycles-pp.__sock_create
0.00 +23.7 23.65 ± 62% perf-profile.children.cycles-pp.__sys_socket
0.00 +23.7 23.65 ± 62% perf-profile.children.cycles-pp.__x64_sys_socket
0.00 +23.7 23.66 ± 62% perf-profile.children.cycles-pp.__socket
0.29 +35.9 36.21 ±108% perf-profile.children.cycles-pp._raw_spin_lock
0.42 +36.3 36.67 ±107% perf-profile.children.cycles-pp.unix_diag_dump
0.44 +36.3 36.70 ±107% perf-profile.children.cycles-pp.__netlink_dump_start
0.44 +36.3 36.70 ±107% perf-profile.children.cycles-pp.unix_diag_handler_dump
0.44 +36.3 36.72 ±107% perf-profile.children.cycles-pp.sock_diag_rcv_msg
0.44 +36.3 36.72 ±107% perf-profile.children.cycles-pp.netlink_rcv_skb
0.44 +36.3 36.73 ±107% perf-profile.children.cycles-pp.netlink_dump
0.00 +38.9 38.94 ± 63% perf-profile.children.cycles-pp.__sock_release
0.00 +38.9 38.94 ± 63% perf-profile.children.cycles-pp.netlink_release
0.00 +38.9 38.94 ± 63% perf-profile.children.cycles-pp.sock_close
0.00 +39.0 38.98 ± 63% perf-profile.children.cycles-pp.__fput
0.00 +39.0 38.99 ± 62% perf-profile.children.cycles-pp.__x64_sys_close
0.00 +39.0 39.01 ± 62% perf-profile.children.cycles-pp.__close
0.00 +94.8 94.81 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
98.02 -98.0 0.00 perf-profile.self.cycles-pp.osq_lock
0.06 ± 6% +0.1 0.15 ± 54% perf-profile.self.cycles-pp.unix_diag_dump
0.00 +0.1 0.11 ± 60% perf-profile.self.cycles-pp._raw_read_lock
0.00 +0.3 0.25 ± 51% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.26 ± 2% +2.6 2.82 ± 72% perf-profile.self.cycles-pp._raw_spin_lock
0.00 +94.7 94.65 ± 2% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
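
The change column above mixes two conventions: entries ending in "%" are
relative changes against the parent commit, while entries without "%" (the
mpstat and perf-profile rows) are absolute differences in percentage points.
The following is a minimal sketch of that arithmetic, not the robot's actual
implementation; the helper names pct_change and point_delta are hypothetical,
and the input values are copied from rows in the table.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change in percent, for rows whose change column ends in '%'."""
    return (new - base) / base * 100.0


def point_delta(base: float, new: float) -> float:
    """Absolute difference, for rows whose values are already percentages."""
    return new - base


# stress-ng.sockdiag.ops_per_sec: 4152375 -> 10257279
print(f"{pct_change(4152375, 10257279):+.1f}%")

# vmstat.system.cs: 7690 -> 77394
print(f"{pct_change(7690, 77394):+.1f}%")

# perf-profile.children.cycles-pp.netlink_unicast: 99.57 -> 36.75
print(f"{point_delta(99.57, 36.75):+.1f}")
```

The three printed values should match the change column of the corresponding
rows. Rows with a large %stddev on the tested commit (for example ± 40% on
ops_per_sec) vary widely between runs, so the reported change is best read as
an average rather than a precise figure.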

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki