[viro-vfs:work.d_revalidate] [dcache] 077ab1260a: will-it-scale.per_process_ops 1.9% improvement

From: kernel test robot
Date: Thu Jan 09 2025 - 22:15:15 EST

Hello,

kernel test robot noticed a 1.9% improvement of will-it-scale.per_process_ops on:


commit: 077ab1260a52068a62a5fb08fa2c5f1d0dcf2738 ("dcache: back inline names with a struct-wrapped array of unsigned long")
https://git.kernel.org/cgit/linux/kernel/git/viro/vfs.git work.d_revalidate

testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:

        nr_task: 100%
        mode: process
        test: poll2
        cpufreq_governor: performance

Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250110/202501101058.cd8beeba-lkp@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/poll2/will-it-scale

commit:
cf0cc84299 ("make sure that DNAME_INLINE_LEN is a multiple of word size")
077ab1260a ("dcache: back inline names with a struct-wrapped array of unsigned long")

cf0cc842995ca3da 077ab1260a52068a62a5fb08fa2
---------------- ---------------------------
         %stddev   %change           %stddev
             \        |                  \
    294.00 ± 10%    +15.2%      338.67 ±  5%  perf-c2c.DRAM.remote
    243.33 ±  9%    +13.7%      276.67 ±  6%  perf-c2c.HITM.remote
     21502 ±  5%   +413.7%      110453 ±117%  sched_debug.cfs_rq:/.load.max
      2543 ±  6%   +336.8%       11109 ±111%  sched_debug.cfs_rq:/.load.stddev
    274.83 ± 19%    +28.8%      353.86 ±  6%  sched_debug.cfs_rq:/.util_est.min
  24387540           +1.9%    24841387        will-it-scale.104.processes
    234495           +1.9%      238859        will-it-scale.per_process_ops
  24387540           +1.9%    24841387        will-it-scale.workload
      0.85 ± 11%    -20.5%        0.68 ± 10%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_noprof.do_sys_poll.__x64_sys_poll.do_syscall_64
      1.71 ± 11%    -20.6%        1.36 ± 10%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_noprof.do_sys_poll.__x64_sys_poll.do_syscall_64
     38.41 ±104%    -78.0%        8.46        perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      3676 ± 13%    -34.3%        2415 ± 21%  perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.85 ± 11%    -20.5%        0.68 ± 10%  perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_noprof.do_sys_poll.__x64_sys_poll.do_syscall_64
      3676 ± 13%    -34.3%        2415 ± 21%  perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
 4.591e+10           +1.9%   4.676e+10        perf-stat.i.branch-instructions
 1.367e+08           +1.9%   1.392e+08        perf-stat.i.branch-misses
      1.08           -1.9%        1.06        perf-stat.i.cpi
 2.584e+11           +1.9%   2.632e+11        perf-stat.i.instructions
      0.92           +1.9%        0.94        perf-stat.i.ipc
      1.08           -1.8%        1.06        perf-stat.overall.cpi
      0.93           +1.9%        0.94        perf-stat.overall.ipc
 4.575e+10           +1.9%    4.66e+10        perf-stat.ps.branch-instructions
 1.362e+08           +1.9%   1.388e+08        perf-stat.ps.branch-misses
 2.575e+11           +1.9%   2.623e+11        perf-stat.ps.instructions
 7.785e+13           +1.9%    7.93e+13        perf-stat.total.instructions
     59.17            -1.5       57.63        perf-profile.calltrace.cycles-pp.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
     71.18            -1.4       69.76        perf-profile.calltrace.cycles-pp.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
     70.73            -1.4       69.32        perf-profile.calltrace.cycles-pp.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
     72.76            -1.3       71.48        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
     76.80            -1.1       75.70        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__poll
     43.66            -1.1       42.61        perf-profile.calltrace.cycles-pp.fdget.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
     94.61            -0.2       94.40        perf-profile.calltrace.cycles-pp.__poll
      0.92            +0.0        0.94        perf-profile.calltrace.cycles-pp.kfree.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.66            +0.1        2.73        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__poll
      4.90            +0.2        5.10        perf-profile.calltrace.cycles-pp.testcase
      5.81            +0.2        6.04        perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.__poll
      1.98 ±  3%      +0.3        2.26 ±  3%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.__poll
      7.25            +0.3        7.56        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__poll
     59.29            -1.6       57.72        perf-profile.children.cycles-pp.do_poll
     71.24            -1.4       69.83        perf-profile.children.cycles-pp.__x64_sys_poll
     70.82            -1.4       69.41        perf-profile.children.cycles-pp.do_sys_poll
     72.83            -1.3       71.55        perf-profile.children.cycles-pp.do_syscall_64
     76.94            -1.1       75.84        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     43.57            -1.0       42.53        perf-profile.children.cycles-pp.fdget
     95.18            -0.2       94.97        perf-profile.children.cycles-pp.__poll
      1.16 ±  2%      +0.2        1.32 ±  3%  perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      3.50            +0.2        3.69        perf-profile.children.cycles-pp.entry_SYSCALL_64
      4.91            +0.2        5.12        perf-profile.children.cycles-pp.testcase
      6.22            +0.2        6.46        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      7.31            +0.3        7.62        perf-profile.children.cycles-pp.syscall_return_via_sysret
     42.16            -1.0       41.16        perf-profile.self.cycles-pp.fdget
     16.86            -0.6       16.30        perf-profile.self.cycles-pp.do_poll
      0.90            +0.0        0.93        perf-profile.self.cycles-pp.kfree
      0.32 ±  2%      +0.0        0.36 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      1.20 ±  3%      +0.1        1.32 ±  2%  perf-profile.self.cycles-pp.__poll
      0.76 ±  2%      +0.1        0.89 ±  4%  perf-profile.self.cycles-pp.do_syscall_64
      4.88            +0.1        5.00        perf-profile.self.cycles-pp.do_sys_poll
      3.10            +0.2        3.28        perf-profile.self.cycles-pp.entry_SYSCALL_64
      4.18            +0.2        4.37        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      4.73            +0.2        4.94        perf-profile.self.cycles-pp.testcase
      6.16            +0.2        6.40        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      7.30            +0.3        7.62        perf-profile.self.cycles-pp.syscall_return_via_sysret
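
A note on reading the change column above: for count-like metrics (will-it-scale, perf-c2c, perf-stat) the reported value appears to be the relative difference between the two commits, while the perf-profile rows appear to report a plain difference in percentage points of cycles. The sketch below checks that interpretation against a few rows copied from the table. The helper names pct_change and point_delta are hypothetical and are not part of lkp-tests; the robot works from unrounded samples, so recomputing from the rounded numbers shown here may be off by a digit on some rows.

```python
def pct_change(base: float, patched: float) -> float:
    """Relative change of `patched` over `base`, in percent."""
    return (patched - base) / base * 100.0


def point_delta(base: float, patched: float) -> float:
    """Absolute difference in percentage points (assumed perf-profile convention)."""
    return patched - base


if __name__ == "__main__":
    # (metric, base commit cf0cc84299, patched commit 077ab1260a), taken from the table
    relative_rows = [
        ("will-it-scale.per_process_ops", 234495, 238859),
        ("will-it-scale.workload", 24387540, 24841387),
        ("perf-c2c.DRAM.remote", 294.00, 338.67),
    ]
    for name, base, patched in relative_rows:
        print(f"{name}: {pct_change(base, patched):+.1f}%")

    # perf-profile rows are shares of cycles, so the delta is in points, not percent
    base, patched = 59.17, 57.63  # calltrace share of do_poll
    print(f"perf-profile.calltrace do_poll: {point_delta(base, patched):+.1f}")
```

Read this way, the headline +1.9% is simply (238859 - 234495) / 234495, and the perf-stat instruction counts move by the same fraction.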

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki