[linus:master] [filemap] 9aac777aaf: phoronix-test-suite.iozone.1MB.512MB.WritePerformance.mb_s -14.0% regression

From: kernel test robot
Date: Wed Jul 24 2024 - 10:41:35 EST




Hello,

kernel test robot noticed a -14.0% regression of phoronix-test-suite.iozone.1MB.512MB.WritePerformance.mb_s on:


commit: 9aac777aaf9459786bc8463e6cbfc7e7e1abd1f9 ("filemap: Convert generic_perform_write() to support large folios")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: phoronix-test-suite
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
parameters:

	test: iozone-1.9.6
	option_a: 1MB
	option_b: 512MB
	option_c: Write Performance
	cpufreq_governor: performance




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202407242232.9109947e-oliver.sang@xxxxxxxxx


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240724/202407242232.9109947e-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/option_c/rootfs/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/1MB/512MB/Write Performance/debian-12-x86_64-phoronix/lkp-csl-2sp7/iozone-1.9.6/phoronix-test-suite

commit:
146a99aefe ("xprtrdma: removed asm-generic headers from verbs.c")
9aac777aaf ("filemap: Convert generic_perform_write() to support large folios")

146a99aefe4a45f6 9aac777aaf9459786bc8463e6cb
---------------- ---------------------------
         %stddev    %change          %stddev
             \         |                \
      3043           -14.0%       2618        phoronix-test-suite.iozone.1MB.512MB.WritePerformance.mb_s
      6003 ±  6%     +21.0%       7262 ± 21%  proc-vmstat.nr_active_anon
      6003 ±  6%     +21.0%       7262 ± 21%  proc-vmstat.nr_zone_active_anon
      0.62 ± 43%     +90.5%       1.19 ± 43%  sched_debug.cfs_rq:/system.slice/containerd.service.load_avg.avg
      0.62 ± 43%     +94.9%       1.21 ± 40%  sched_debug.cfs_rq:/system.slice/containerd.service.runnable_avg.avg
      0.59 ± 36%     +99.4%       1.19 ± 41%  sched_debug.cfs_rq:/system.slice/containerd.service.se->avg.runnable_avg.avg
      0.59 ± 36%     +99.4%       1.19 ± 41%  sched_debug.cfs_rq:/system.slice/containerd.service.se->avg.util_avg.avg
      0.62 ± 43%     +85.1%       1.15 ± 39%  sched_debug.cfs_rq:/system.slice/containerd.service.tg_load_avg_contrib.avg
      0.62 ± 43%     +94.9%       1.21 ± 40%  sched_debug.cfs_rq:/system.slice/containerd.service.util_avg.avg
     60.61             -2.1      58.48        perf-stat.i.iTLB-load-miss-rate%
    910966            -3.4%     879846        perf-stat.i.iTLB-load-misses
      5100 ±  2%      +4.8%       5346 ±  2%  perf-stat.i.instructions-per-iTLB-miss
     57.76 ±  2%       +3.0      60.79 ±  3%  perf-stat.i.node-load-miss-rate%
     38.99 ±  2%       +3.9      42.85 ±  4%  perf-stat.i.node-store-miss-rate%
     61.51             -2.1      59.37        perf-stat.overall.iTLB-load-miss-rate%
      4574            +3.3%       4727        perf-stat.overall.instructions-per-iTLB-miss
    885569            -3.3%     856059        perf-stat.ps.iTLB-load-misses
      0.02 ± 58%     -72.5%       0.01 ±119%  perf-sched.sch_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
      0.00 ±103%   +1162.5%       0.02 ±112%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.kthread.ret_from_fork.ret_from_fork_asm
      0.03 ± 75%     -87.1%       0.00 ±106%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.__flush_work.fsnotify_destroy_group
      0.10 ± 27%     -64.5%       0.03 ±105%  perf-sched.sch_delay.max.ms.do_task_dead.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart
      0.06 ±  4%     +89.3%       0.11 ± 27%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.do_epoll_pwait.part
      0.00 ±103%   +1487.5%       0.02 ±111%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.kthread.ret_from_fork.ret_from_fork_asm
      0.04 ± 79%     -90.8%       0.00 ±104%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.__flush_work.fsnotify_destroy_group
      3.89 ± 36%     -31.4%       2.66 ±  8%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.do_epoll_pwait.part
      1097 ± 14%     +28.8%       1413 ±  6%  perf-sched.wait_and_delay.count.futex_wait_queue.__futex_wait.futex_wait.do_futex
      0.02 ± 18%     -56.5%       0.01 ± 52%  perf-sched.wait_time.avg.ms.__cond_resched.mmput.do_task_stat.proc_single_show.seq_read_iter
      3.87 ± 37%     -31.6%       2.65 ±  8%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.do_epoll_pwait.part
    425.39           +13.0%     480.82        perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.filemap_update_page.filemap_get_pages
     15.00 ± 80%       -6.8       8.16 ±147%  perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
     15.00 ± 80%       -6.8       8.16 ±147%  perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode
     15.00 ± 80%       -6.8       8.16 ±147%  perf-profile.calltrace.cycles-pp.do_group_exit.get_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64
     15.00 ± 80%       -6.8       8.16 ±147%  perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
     15.00 ± 80%       -6.8       8.16 ±147%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.09 ±102%       -3.4       0.72 ±223%  perf-profile.calltrace.cycles-pp._compound_head.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range
      5.98 ± 87%       -3.0       2.96 ±176%  perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter.vfs_write
     15.00 ± 80%       -6.8       8.16 ±147%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      5.27 ± 61%       -4.1       1.15 ±223%  perf-profile.children.cycles-pp.sched_balance_newidle
      5.27 ± 61%       -4.1       1.15 ±223%  perf-profile.children.cycles-pp.sched_balance_rq
      4.09 ±102%       -3.4       0.72 ±223%  perf-profile.children.cycles-pp._compound_head
      5.98 ± 87%       -3.0       2.96 ±176%  perf-profile.children.cycles-pp.shmem_get_folio_gfp
      4.09 ±102%       -3.4       0.72 ±223%  perf-profile.self.cycles-pp._compound_head
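For readers checking the comparison table by hand, the change column can be reproduced from the two value columns. The sketch below assumes that entries with a trailing % are relative changes against the base commit (146a99aefe), and that entries without it (the miss-rate and perf-profile rows) are plain differences in percentage points. Both assumptions match the figures in the table, but the helper names are hypothetical and this is not the robot's own code.

```python
def percent_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent (assumed formula)."""
    return (new - base) / base * 100.0


def point_change(base_rate: float, new_rate: float) -> float:
    """Absolute difference for metrics that are already percentages."""
    return new_rate - base_rate


# Headline metric: iozone write throughput in MB/s, base commit vs. tested commit.
write_change = percent_change(3043, 2618)

# perf-stat.i.iTLB-load-misses is a raw count, so it is also a relative change.
itlb_miss_change = percent_change(910966, 879846)

# perf-stat.i.iTLB-load-miss-rate% is already a percentage, so the table
# appears to report the difference in percentage points.
itlb_rate_change = point_change(60.61, 58.48)

print(f"WritePerformance.mb_s: {write_change:+.1f}%")
print(f"iTLB-load-misses:      {itlb_miss_change:+.1f}%")
print(f"iTLB-load-miss-rate%:  {itlb_rate_change:+.1f}")
```

Rows with a large stddev on either side, such as the perf-profile entries, can move by several points from noise alone, so these helpers say nothing about significance.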




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki