[linus:master] [remap_range] dfad37051a: stress-ng.file-ioctl.ops_per_sec -11.2% regression

From: kernel test robot
Date: Wed Jan 31 2024 - 09:14:57 EST




Hello,

kernel test robot noticed a -11.2% regression of stress-ng.file-ioctl.ops_per_sec on:


commit: dfad37051ade6ac0d404ef4913f3bd01954ee51c ("remap_range: move permission hooks out of do_clone_file_range()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

nr_threads: 10%
disk: 1HDD
testtime: 60s
fs: btrfs
test: file-ioctl
cpufreq_governor: performance




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202401312229.eddeb9a6-oliver.sang@xxxxxxxxx


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240131/202401312229.eddeb9a6-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/1HDD/btrfs/x86_64-rhel-8.3/10%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp8/file-ioctl/stress-ng/60s

commit:
d53471ba6f ("splice: remove permission hook from iter_file_splice_write()")
dfad37051a ("remap_range: move permission hooks out of do_clone_file_range()")

d53471ba6f7ae97a    dfad37051ade6ac0d404ef4913f
----------------    ---------------------------
value [± %stddev]   %change   value [± %stddev]   metric
        \              |              \
2.57 -0.3 2.27 mpstat.cpu.all.usr%
7.40 +3.4% 7.65 iostat.cpu.system
2.50 -11.5% 2.22 iostat.cpu.user
95739218 -11.2% 84990543 ± 2% stress-ng.file-ioctl.ops
1595650 -11.2% 1416506 ± 2% stress-ng.file-ioctl.ops_per_sec
267.41 +4.2% 278.66 stress-ng.time.system_time
90.19 -12.5% 78.96 stress-ng.time.user_time
0.12 ± 9% +37.6% 0.16 ± 3% perf-stat.i.MPKI
5.619e+09 -4.9% 5.346e+09 perf-stat.i.branch-instructions
25.26 ± 12% +5.4 30.67 ± 2% perf-stat.i.cache-miss-rate%
3226271 ± 8% +32.3% 4268159 ± 2% perf-stat.i.cache-misses
13880671 ± 2% +7.6% 14934433 perf-stat.i.cache-references
0.83 +3.9% 0.86 perf-stat.i.cpi
7405 ± 8% -26.1% 5473 ± 2% perf-stat.i.cycles-between-cache-misses
5.186e+09 -6.0% 4.873e+09 perf-stat.i.dTLB-stores
2.807e+10 -3.9% 2.696e+10 perf-stat.i.instructions
1.21 -3.7% 1.17 perf-stat.i.ipc
257.16 +12.9% 290.46 perf-stat.i.metric.K/sec
290.80 -4.2% 278.45 perf-stat.i.metric.M/sec
1580051 ± 11% +38.0% 2180479 ± 5% perf-stat.i.node-load-misses
228848 ± 22% +116.2% 494834 ± 27% perf-stat.i.node-loads
0.11 ± 9% +37.7% 0.16 ± 3% perf-stat.overall.MPKI
23.29 ± 11% +5.3 28.58 ± 2% perf-stat.overall.cache-miss-rate%
0.82 +3.9% 0.86 perf-stat.overall.cpi
7231 ± 8% -25.1% 5416 ± 2% perf-stat.overall.cycles-between-cache-misses
1.21 -3.7% 1.17 perf-stat.overall.ipc
5.524e+09 -4.8% 5.257e+09 perf-stat.ps.branch-instructions
3170718 ± 8% +32.4% 4196610 ± 2% perf-stat.ps.cache-misses
13646445 ± 2% +7.6% 14686495 ± 2% perf-stat.ps.cache-references
5.099e+09 -6.0% 4.792e+09 perf-stat.ps.dTLB-stores
2.759e+10 -3.9% 2.651e+10 perf-stat.ps.instructions
1553350 ± 11% +38.1% 2144498 ± 5% perf-stat.ps.node-load-misses
224907 ± 22% +116.2% 486304 ± 27% perf-stat.ps.node-loads
1.668e+12 -3.4% 1.611e+12 ± 2% perf-stat.total.instructions
5.57 ± 3% -0.7 4.85 ± 2% perf-profile.calltrace.cycles-pp.__fget_light.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl
0.89 ± 23% -0.4 0.45 ± 44% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl
2.30 ± 2% -0.3 2.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
1.69 ± 3% -0.3 1.39 ± 4% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64
1.99 ± 2% -0.3 1.72 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.16 ± 3% -0.2 1.00 ± 3% perf-profile.calltrace.cycles-pp.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.60 ± 4% -0.2 0.44 ± 45% perf-profile.calltrace.cycles-pp.__fget_light.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +1.5 1.52 ± 2% perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_clone_file_range.ioctl_file_clone.do_vfs_ioctl.__x64_sys_ioctl
0.00 +6.9 6.94 ± 6% perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_clone_file_range.ioctl_file_clone.do_vfs_ioctl
0.00 +7.4 7.41 ± 6% perf-profile.calltrace.cycles-pp.security_file_permission.vfs_clone_file_range.ioctl_file_clone.do_vfs_ioctl.__x64_sys_ioctl
21.11 +7.4 28.53 perf-profile.calltrace.cycles-pp.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl
3.18 ± 2% +8.7 11.87 ± 3% perf-profile.calltrace.cycles-pp.ioctl_file_clone.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.46 ± 9% +8.9 10.36 ± 4% perf-profile.calltrace.cycles-pp.vfs_clone_file_range.ioctl_file_clone.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64
10.70 -1.3 9.39 ± 3% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
11.31 -1.1 10.24 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64
7.87 ± 3% -1.0 6.90 perf-profile.children.cycles-pp.__fget_light
5.13 -0.7 4.46 ± 2% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.89 -0.4 0.46 ± 5% perf-profile.children.cycles-pp.do_clone_file_range
3.45 ± 2% -0.4 3.10 perf-profile.children.cycles-pp.llseek
1.80 ± 4% -0.3 1.49 ± 3% perf-profile.children.cycles-pp.stress_file_ioctl
1.83 -0.2 1.63 ± 4% perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
1.53 ± 3% -0.2 1.34 ± 4% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
2.32 ± 3% -0.2 2.13 perf-profile.children.cycles-pp.syscall_return_via_sysret
1.58 ± 2% -0.2 1.40 perf-profile.children.cycles-pp.memdup_user
1.81 -0.2 1.62 perf-profile.children.cycles-pp.__get_user_4
1.26 ± 3% -0.2 1.08 ± 3% perf-profile.children.cycles-pp.__x64_sys_fcntl
1.32 ± 2% -0.2 1.14 ± 2% perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
2.06 ± 2% -0.2 1.90 ± 3% perf-profile.children.cycles-pp.syscall_enter_from_user_mode
1.12 ± 3% -0.1 0.99 ± 2% perf-profile.children.cycles-pp.security_file_ioctl
0.84 ± 3% -0.1 0.73 ± 3% perf-profile.children.cycles-pp.ksys_lseek
0.29 ± 4% -0.1 0.18 ± 4% perf-profile.children.cycles-pp.generic_file_rw_checks
0.76 ± 3% -0.1 0.68 perf-profile.children.cycles-pp.amd_clear_divider
0.84 ± 3% -0.1 0.75 ± 3% perf-profile.children.cycles-pp.__put_user_4
0.86 ± 4% -0.1 0.78 ± 3% perf-profile.children.cycles-pp._raw_spin_lock
0.53 ± 3% -0.1 0.46 ± 4% perf-profile.children.cycles-pp.__fdget_pos
0.19 ± 11% -0.1 0.12 ± 10% perf-profile.children.cycles-pp.stress_mwc8
0.54 ± 5% -0.1 0.48 ± 6% perf-profile.children.cycles-pp.__check_object_size
0.73 ± 2% -0.1 0.67 ± 5% perf-profile.children.cycles-pp.__fdget
0.49 ± 2% -0.1 0.43 ± 3% perf-profile.children.cycles-pp.__kmalloc_node_track_caller
0.51 ± 4% -0.1 0.45 ± 5% perf-profile.children.cycles-pp.ioctl@plt
0.58 ± 3% -0.0 0.54 ± 4% perf-profile.children.cycles-pp.__get_user_2
0.38 ± 3% -0.0 0.33 ± 4% perf-profile.children.cycles-pp.__kmem_cache_alloc_node
0.44 ± 3% -0.0 0.40 ± 3% perf-profile.children.cycles-pp.__libc_fcntl64
0.24 ± 6% -0.0 0.20 ± 7% perf-profile.children.cycles-pp.do_fcntl
0.48 ± 3% -0.0 0.44 ± 2% perf-profile.children.cycles-pp.set_close_on_exec
0.16 ± 8% -0.0 0.14 ± 8% perf-profile.children.cycles-pp.__check_heap_object
0.00 +0.2 0.25 ± 4% perf-profile.children.cycles-pp.fsnotify_perm
0.57 +0.6 1.15 ± 3% perf-profile.children.cycles-pp.aa_file_perm
85.52 +1.4 86.91 perf-profile.children.cycles-pp.ioctl
0.00 +1.6 1.55 perf-profile.children.cycles-pp.__fsnotify_parent
62.60 +4.0 66.55 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
59.77 +4.3 64.05 perf-profile.children.cycles-pp.do_syscall_64
47.98 +5.7 53.66 perf-profile.children.cycles-pp.__x64_sys_ioctl
21.64 +7.3 28.98 perf-profile.children.cycles-pp.do_vfs_ioctl
8.29 ± 4% +7.4 15.74 ± 6% perf-profile.children.cycles-pp.apparmor_file_permission
8.78 ± 4% +7.9 16.64 ± 5% perf-profile.children.cycles-pp.security_file_permission
3.30 ± 2% +8.7 11.96 ± 3% perf-profile.children.cycles-pp.ioctl_file_clone
1.68 +8.9 10.55 ± 3% perf-profile.children.cycles-pp.vfs_clone_file_range
10.33 -1.3 9.02 ± 3% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
11.15 -1.2 9.92 ± 2% perf-profile.self.cycles-pp.ioctl
7.55 ± 3% -0.9 6.61 perf-profile.self.cycles-pp.__fget_light
3.16 ± 4% -0.5 2.69 ± 2% perf-profile.self.cycles-pp.do_vfs_ioctl
2.95 ± 2% -0.4 2.55 ± 2% perf-profile.self.cycles-pp.__x64_sys_ioctl
3.32 -0.4 2.93 ± 2% perf-profile.self.cycles-pp.do_syscall_64
3.08 ± 2% -0.4 2.72 ± 3% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
3.13 -0.4 2.78 ± 2% perf-profile.self.cycles-pp.entry_SYSCALL_64
2.39 ± 2% -0.3 2.10 ± 2% perf-profile.self.cycles-pp.ioctl_preallocate
0.57 ± 2% -0.3 0.31 ± 9% perf-profile.self.cycles-pp.do_clone_file_range
2.02 ± 2% -0.3 1.77 ± 3% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
1.54 ± 4% -0.2 1.29 ± 3% perf-profile.self.cycles-pp.stress_file_ioctl
1.83 -0.2 1.62 ± 4% perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
2.32 ± 3% -0.2 2.13 perf-profile.self.cycles-pp.syscall_return_via_sysret
1.77 -0.2 1.58 perf-profile.self.cycles-pp.__get_user_4
1.28 ± 2% -0.2 1.11 ± 4% perf-profile.self.cycles-pp.exit_to_user_mode_prepare
1.76 ± 2% -0.1 1.62 ± 3% perf-profile.self.cycles-pp.syscall_enter_from_user_mode
0.25 ± 6% -0.1 0.12 ± 8% perf-profile.self.cycles-pp.generic_file_rw_checks
0.48 ± 2% -0.1 0.38 ± 4% perf-profile.self.cycles-pp.ioctl_file_clone
0.79 ± 3% -0.1 0.70 ± 2% perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
0.81 ± 3% -0.1 0.73 ± 4% perf-profile.self.cycles-pp.__put_user_4
0.81 ± 5% -0.1 0.73 ± 3% perf-profile.self.cycles-pp._raw_spin_lock
0.52 ± 4% -0.1 0.44 ± 3% perf-profile.self.cycles-pp.amd_clear_divider
0.17 ± 11% -0.1 0.12 ± 10% perf-profile.self.cycles-pp.stress_mwc8
0.57 ± 3% -0.0 0.52 ± 4% perf-profile.self.cycles-pp.__get_user_2
0.42 ± 4% -0.0 0.38 ± 3% perf-profile.self.cycles-pp.__libc_fcntl64
0.30 ± 3% -0.0 0.26 ± 5% perf-profile.self.cycles-pp.__x64_sys_fcntl
0.22 ± 5% -0.0 0.18 ± 6% perf-profile.self.cycles-pp.do_fcntl
0.28 ± 3% -0.0 0.24 ± 2% perf-profile.self.cycles-pp.__kmem_cache_alloc_node
0.00 +0.2 0.22 ± 4% perf-profile.self.cycles-pp.fsnotify_perm
0.49 ± 3% +0.4 0.92 ± 2% perf-profile.self.cycles-pp.security_file_permission
0.46 ± 2% +0.5 0.96 ± 2% perf-profile.self.cycles-pp.aa_file_perm
0.00 +1.5 1.52 ± 2% perf-profile.self.cycles-pp.__fsnotify_parent
7.75 ± 4% +6.8 14.58 ± 7% perf-profile.self.cycles-pp.apparmor_file_permission
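
As a reading aid for the table above, the change column can be re-derived from the two value columns. This is a minimal sketch, assuming that rows shown with a "%" suffix are relative changes against the parent commit, and that rows without one (the perf-profile and other percentage-valued metrics) are plain differences in percentage points. The function names are illustrative and are not part of the lkp-tests tooling.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


def point_change(base: float, new: float) -> float:
    """Absolute difference, used for metrics that are already percentages."""
    return new - base


# (metric, value on d53471ba6f, value on dfad37051a), copied from the table.
relative_rows = [
    ("stress-ng.file-ioctl.ops_per_sec", 1595650, 1416506),
    ("stress-ng.time.system_time", 267.41, 278.66),
    ("stress-ng.time.user_time", 90.19, 78.96),
]
point_rows = [
    ("perf-profile.children.cycles-pp.vfs_clone_file_range", 1.68, 10.55),
    ("perf-profile.children.cycles-pp.security_file_permission", 8.78, 16.64),
]

for name, base, new in relative_rows:
    print(f"{pct_change(base, new):+.1f}%  {name}")
for name, base, new in point_rows:
    print(f"{point_change(base, new):+.1f}   {name}")
```

Under these assumptions the sketch gives -11.2%, +4.2% and -12.5% for the three stress-ng rows, and +8.9 and +7.9 for the two perf-profile rows, matching the table.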




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki