Re: [PATCH] [v2] filemap: Move prefaulting out of hot write path
From: kernel test robot
Date: Mon Mar 10 2025 - 04:46:04 EST
Hello,
kernel test robot noticed a 3.6% improvement of will-it-scale.per_thread_ops on:
commit: 391ab5826c820c58d180534a7a727ff5668d4d61 ("[PATCH] [v2] filemap: Move prefaulting out of hot write path")
url: https://github.com/intel-lab-lkp/linux/commits/Dave-Hansen/filemap-Move-prefaulting-out-of-hot-write-path/20250301-043921
base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/all/20250228203722.CAEB63AC@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
patch subject: [PATCH] [v2] filemap: Move prefaulting out of hot write path
testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:
nr_task: 100%
mode: thread
test: pwrite1
cpufreq_governor: performance
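For readers unfamiliar with the workload, the sketch below approximates what a pwrite1-style microbenchmark does: each task repeatedly overwrites the same range of a private temporary file, so nearly all time is spent in the syscall entry/exit and kernel write path. This is an illustrative Python sketch, not the will-it-scale source. The 4096-byte buffer, the bounded iteration count, the function name and the use of the default temporary directory are all assumptions made for the example.

```python
import os
import tempfile

BUF_LEN = 4096          # assumed write size, not taken from this report
ITERATIONS = 10_000     # assumed bound; a real benchmark would loop until stopped


def pwrite_loop(iterations: int = ITERATIONS, buf_len: int = BUF_LEN) -> int:
    """Overwrite offset 0 of an unlinked temp file; return completed writes."""
    buf = bytes(buf_len)
    fd, path = tempfile.mkstemp()
    os.unlink(path)     # the file stays usable through fd until it is closed
    done = 0
    try:
        for _ in range(iterations):
            # Every write targets the same file range starting at offset 0.
            if os.pwrite(fd, buf, 0) != buf_len:
                raise OSError("short write")
            done += 1
    finally:
        os.close(fd)
    return done


if __name__ == "__main__":
    print(pwrite_loop())
```

With mode "thread" and nr_task 100%, one such loop would run per hardware thread (104 on this machine), and the per-task iteration rate is what per_thread_ops reports.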
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250310/202503101621.e0858506-lkp@xxxxxxxxx
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/pwrite1/will-it-scale
commit:
3dec9c0e67 ("foo")
391ab5826c ("filemap: Move prefaulting out of hot write path")
3dec9c0e67aaf496 391ab5826c820c58d180534a7a7
---------------- ---------------------------
base ± %stddev  %change  patched ± %stddev  metric
182266 ± 3% +9.4% 199333 ± 4% meminfo.DirectMap4k
765.67 ± 8% -22.1% 596.83 ± 9% perf-sched.wait_and_delay.count.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
17510 ± 6% +19.3% 20889 ± 8% sched_debug.cpu.nr_switches.max
3219 ± 5% +11.2% 3578 ± 2% sched_debug.cpu.nr_switches.stddev
54561715 +3.6% 56543708 will-it-scale.104.threads
524631 +3.6% 543689 will-it-scale.per_thread_ops
54561715 +3.6% 56543708 will-it-scale.workload
1.752e+10 -1.2% 1.731e+10 perf-stat.i.branch-instructions
1.59 +0.0 1.63 perf-stat.i.branch-miss-rate%
3.25 +1.8% 3.31 perf-stat.i.cpi
8.828e+10 -1.5% 8.699e+10 perf-stat.i.instructions
0.31 -1.8% 0.30 perf-stat.i.ipc
1.58 +0.0 1.62 perf-stat.overall.branch-miss-rate%
3.25 +1.8% 3.31 perf-stat.overall.cpi
0.31 -1.7% 0.30 perf-stat.overall.ipc
487316 -4.8% 464012 perf-stat.overall.path-length
1.746e+10 -1.2% 1.725e+10 perf-stat.ps.branch-instructions
8.798e+10 -1.5% 8.67e+10 perf-stat.ps.instructions
2.659e+13 -1.3% 2.624e+13 perf-stat.total.instructions
34.45 -5.9 28.57 perf-profile.calltrace.cycles-pp.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64
48.17 -5.3 42.87 ± 2% perf-profile.calltrace.cycles-pp.vfs_write.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
42.56 -5.3 37.29 ± 2% perf-profile.calltrace.cycles-pp.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe
51.38 -4.9 46.46 perf-profile.calltrace.cycles-pp.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
54.26 -4.5 49.76 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
62.52 -3.9 58.65 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_pwrite
13.30 ± 2% -2.3 10.96 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
10.17 ± 2% -2.0 8.18 perf-profile.calltrace.cycles-pp.rep_movs_alternative.copy_page_from_iter_atomic.generic_perform_write.shmem_file_write_iter.vfs_write
0.59 ± 4% -0.3 0.26 ±100% perf-profile.calltrace.cycles-pp.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
0.88 ± 3% -0.1 0.74 ± 4% perf-profile.calltrace.cycles-pp.folio_mark_accessed.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
0.67 ± 2% -0.1 0.56 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
99.51 +0.1 99.56 perf-profile.calltrace.cycles-pp.__libc_pwrite
0.64 ± 5% +0.1 0.71 ± 2% perf-profile.calltrace.cycles-pp.fput.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
0.93 ± 2% +0.1 1.02 perf-profile.calltrace.cycles-pp.folio_unlock.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write
0.72 +0.1 0.82 ± 2% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64_mg.current_time.inode_needs_update_time.file_update_time.shmem_file_write_iter
1.20 +0.2 1.38 ± 2% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
0.84 ± 2% +0.2 1.04 ± 2% perf-profile.calltrace.cycles-pp.noop_dirty_folio.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write
1.48 +0.2 1.72 ± 2% perf-profile.calltrace.cycles-pp.current_time.inode_needs_update_time.file_update_time.shmem_file_write_iter.vfs_write
1.00 ± 2% +0.3 1.26 ± 3% perf-profile.calltrace.cycles-pp.folio_mark_dirty.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write
1.99 ± 3% +0.3 2.27 ± 4% perf-profile.calltrace.cycles-pp.fdget.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
6.69 +0.3 7.02 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__libc_pwrite
2.24 +0.4 2.60 ± 3% perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
2.78 ± 2% +0.4 3.21 ± 2% perf-profile.calltrace.cycles-pp.file_update_time.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64
4.34 +0.6 4.92 perf-profile.calltrace.cycles-pp.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
12.01 +0.9 12.92 perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.__libc_pwrite
2.89 ± 6% +1.4 4.26 ± 6% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.__libc_pwrite
15.09 +1.5 16.61 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_pwrite
34.58 -5.9 28.70 perf-profile.children.cycles-pp.generic_perform_write
48.24 -5.3 42.94 ± 2% perf-profile.children.cycles-pp.vfs_write
42.95 -5.3 37.66 ± 2% perf-profile.children.cycles-pp.shmem_file_write_iter
51.54 -4.9 46.63 perf-profile.children.cycles-pp.__x64_sys_pwrite64
54.37 -4.5 49.86 perf-profile.children.cycles-pp.do_syscall_64
62.76 -3.9 58.90 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
10.62 ± 2% -2.3 8.30 perf-profile.children.cycles-pp.rep_movs_alternative
13.47 ± 2% -2.3 11.16 perf-profile.children.cycles-pp.copy_page_from_iter_atomic
0.90 ± 3% -0.1 0.76 ± 4% perf-profile.children.cycles-pp.folio_mark_accessed
0.69 ± 2% -0.1 0.60 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
0.29 -0.0 0.26 ± 2% perf-profile.children.cycles-pp.testcase
0.31 ± 3% -0.0 0.29 ± 2% perf-profile.children.cycles-pp.update_process_times
0.50 -0.0 0.48 perf-profile.children.cycles-pp.rcu_all_qs
99.67 +0.0 99.70 perf-profile.children.cycles-pp.__libc_pwrite
0.64 ± 5% +0.1 0.71 ± 2% perf-profile.children.cycles-pp.fput
0.43 ± 3% +0.1 0.50 ± 2% perf-profile.children.cycles-pp.folio_mapping
0.94 ± 2% +0.1 1.02 perf-profile.children.cycles-pp.folio_unlock
0.74 ± 2% +0.1 0.85 ± 2% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64_mg
1.23 +0.2 1.41 ± 2% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.89 +0.2 1.10 ± 2% perf-profile.children.cycles-pp.noop_dirty_folio
1.54 +0.2 1.79 ± 2% perf-profile.children.cycles-pp.current_time
1.99 ± 3% +0.3 2.27 ± 4% perf-profile.children.cycles-pp.fdget
1.08 ± 3% +0.3 1.36 ± 3% perf-profile.children.cycles-pp.folio_mark_dirty
2.30 +0.4 2.66 ± 3% perf-profile.children.cycles-pp.inode_needs_update_time
2.86 ± 2% +0.4 3.30 ± 2% perf-profile.children.cycles-pp.file_update_time
4.58 +0.6 5.17 perf-profile.children.cycles-pp.shmem_write_end
1.73 ± 5% +0.7 2.43 ± 5% perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
12.88 +1.0 13.84 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
6.99 +1.0 7.95 perf-profile.children.cycles-pp.entry_SYSCALL_64
15.22 +1.5 16.75 perf-profile.children.cycles-pp.syscall_return_via_sysret
10.43 ± 2% -2.3 8.08 perf-profile.self.cycles-pp.rep_movs_alternative
3.26 -0.4 2.86 perf-profile.self.cycles-pp.generic_perform_write
2.02 -0.2 1.78 ± 2% perf-profile.self.cycles-pp.shmem_get_folio_gfp
0.87 ± 3% -0.1 0.74 ± 4% perf-profile.self.cycles-pp.folio_mark_accessed
0.53 ± 2% -0.1 0.43 ± 2% perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
0.25 ± 2% -0.0 0.22 ± 2% perf-profile.self.cycles-pp.testcase
0.54 +0.0 0.59 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.79 ± 4% +0.0 0.84 ± 2% perf-profile.self.cycles-pp.__x64_sys_pwrite64
0.51 ± 2% +0.1 0.58 ± 2% perf-profile.self.cycles-pp.fput
0.38 ± 3% +0.1 0.45 ± 2% perf-profile.self.cycles-pp.folio_mapping
0.74 ± 2% +0.1 0.82 perf-profile.self.cycles-pp.folio_unlock
0.73 ± 4% +0.1 0.82 ± 4% perf-profile.self.cycles-pp.inode_needs_update_time
0.54 ± 6% +0.1 0.64 ± 3% perf-profile.self.cycles-pp.file_update_time
0.72 ± 2% +0.1 0.82 ± 2% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64_mg
0.79 ± 2% +0.1 0.94 ± 3% perf-profile.self.cycles-pp.current_time
0.98 ± 2% +0.2 1.14 ± 2% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.85 +0.2 1.04 ± 2% perf-profile.self.cycles-pp.noop_dirty_folio
0.65 ± 3% +0.2 0.85 ± 4% perf-profile.self.cycles-pp.folio_mark_dirty
1.11 ± 6% +0.2 1.32 ± 2% perf-profile.self.cycles-pp.do_syscall_64
1.98 ± 3% +0.3 2.26 ± 4% perf-profile.self.cycles-pp.fdget
2.14 ± 2% +0.4 2.55 ± 3% perf-profile.self.cycles-pp.__libc_pwrite
1.63 +0.6 2.23 ± 3% perf-profile.self.cycles-pp.shmem_write_begin
8.54 +0.7 9.20 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
6.09 +0.9 7.03 ± 2% perf-profile.self.cycles-pp.entry_SYSCALL_64
12.75 +1.0 13.70 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
15.20 +1.5 16.72 perf-profile.self.cycles-pp.syscall_return_via_sysret
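The headline figures in the table above can be cross-checked with a little arithmetic. The sketch below recomputes the %change for will-it-scale.per_thread_ops and approximates perf-stat.overall.path-length as total instructions divided by workload operations. That definition of path-length is an assumption inferred from the numbers, not something stated in this report, and the helper name is illustrative.

```python
def pct_change(base: float, patched: float) -> float:
    """Relative change from base to patched, in percent."""
    return (patched - base) / base * 100.0


# Values copied from the table (base commit, patched commit).
per_thread_ops = (524631, 543689)
workload = (54561715, 56543708)
total_instructions = (2.659e13, 2.624e13)

ops_change = pct_change(*per_thread_ops)

# Assumption: path-length ~= total instructions / workload operations.
path_len = [ins / ops for ins, ops in zip(total_instructions, workload)]
path_change = pct_change(*path_len)

print(f"per_thread_ops change: {ops_change:+.1f}%")
print(f"approx path-length change: {path_change:+.1f}%")
```

Under that assumption the recomputed values land close to the reported +3.6% and -4.8%: throughput rises while fewer instructions are retired per operation, even though cycles per instruction go up slightly.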
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki