[linus:master] [smb3] edfc6481fa: filebench.sum_operations/s 4194.8% improvement

From: kernel test robot
Date: Mon May 27 2024 - 21:15:17 EST

Hello,

kernel test robot noticed a 4194.8% improvement in filebench.sum_operations/s on:


commit: edfc6481faf896301cab940da776229fe39e9fc9 ("smb3: fix perf regression with cached writes with netfs conversion")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: filebench
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
parameters:

disk: 1HDD
fs: ext4
fs2: cifs
test: randomwrite.f
cpufreq_governor: performance


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240527/202405271633.b56b258d-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase:
gcc-13/performance/1HDD/cifs/ext4/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-icl-2sp6/randomwrite.f/filebench

commit:
14b1cd2534 ("cifs: Fix locking in cifs_strict_readv()")
edfc6481fa ("smb3: fix perf regression with cached writes with netfs conversion")

14b1cd25346b1d61  edfc6481faf896301cab940da77
----------------  ---------------------------
value ± %stddev   %change   value ± %stddev   metric
3814731 ± 93% -62.9% 1414791 ± 44% cpuidle..usage
91.23 ± 4% +6.5% 97.17 iostat.cpu.idle
1817 ± 25% -49.1% 925.83 ± 36% perf-c2c.DRAM.remote
207192 +418.2% 1073659 ± 20% meminfo.AnonHugePages
2604959 ± 5% +65.7% 4315389 ± 4% meminfo.Dirty
69239 ±139% +547.1% 448063 ± 51% numa-meminfo.node0.AnonHugePages
138049 ± 70% +353.2% 625629 ± 65% numa-meminfo.node1.AnonHugePages
33.79 ±139% +547.7% 218.82 ± 51% numa-vmstat.node0.nr_anon_transparent_hugepages
67.47 ± 70% +353.0% 305.60 ± 65% numa-vmstat.node1.nr_anon_transparent_hugepages
10799 ± 25% -35.4% 6972 ± 8% sched_debug.cfs_rq:/.load.avg
37988 ±120% +526.0% 237792 ± 59% sched_debug.cpu.avg_idle.min
4690 ±153% -92.0% 376.83 ± 24% sched_debug.cpu.nr_switches.min
69222 ± 3% -16.7% 57628 vmstat.io.bo
0.73 ± 12% -24.9% 0.55 ± 2% vmstat.procs.b
19540 ± 24% -55.2% 8762 ± 12% vmstat.system.in
0.58 ± 14% -0.2 0.41 mpstat.cpu.all.iowait%
0.05 ± 32% -0.0 0.02 ± 14% mpstat.cpu.all.irq%
0.05 ± 14% -0.0 0.02 ± 6% mpstat.cpu.all.soft%
2.00 +2391.7% 49.83 ± 27% mpstat.max_utilization.seconds
58.54 ± 7% -24.5% 44.17 ± 13% mpstat.max_utilization_pct
99.67 ±163% +4194.7% 4280 ± 7% filebench.sum_bytes_mb/s
765489 ±163% +4194.8% 32875866 ± 7% filebench.sum_operations
12757 ±163% +4194.8% 547887 ± 7% filebench.sum_operations/s
0.24 ± 41% -99.2% 0.00 filebench.sum_time_ms/op
12757 ±163% +4194.8% 547887 ± 7% filebench.sum_writes/s
241.17 ± 80% +321.8% 1017 ± 8% filebench.time.involuntary_context_switches
22.67 ± 23% +63.2% 37.00 filebench.time.percent_of_cpu_this_job_got
37.73 ± 23% +49.1% 56.25 filebench.time.system_time
1.997e+09 ± 45% -62.3% 7.533e+08 ± 26% perf-stat.i.branch-instructions
11.93 ± 23% +3.9 15.84 ± 7% perf-stat.i.cache-miss-rate%
1.589e+08 ± 5% -36.2% 1.013e+08 ± 6% perf-stat.i.cache-references
1227 ± 13% -23.7% 937.19 ± 7% perf-stat.i.cycles-between-cache-misses
9.86e+09 ± 45% -63.3% 3.621e+09 ± 27% perf-stat.i.instructions
4.84 ± 44% +96.1% 9.48 ± 18% perf-stat.overall.MPKI
830.79 ± 40% -62.4% 312.02 ± 34% perf-stat.overall.cycles-between-cache-misses
1.994e+09 ± 45% -62.2% 7.528e+08 ± 27% perf-stat.ps.branch-instructions
1.585e+08 ± 5% -36.3% 1.01e+08 ± 6% perf-stat.ps.cache-references
9.842e+09 ± 45% -63.2% 3.62e+09 ± 27% perf-stat.ps.instructions
1.637e+12 ± 45% -62.9% 6.073e+11 ± 27% perf-stat.total.instructions
101.22 +418.0% 524.27 ± 20% proc-vmstat.nr_anon_transparent_hugepages
2918550 ± 3% +421.9% 15232014 ± 9% proc-vmstat.nr_dirtied
650592 ± 5% +66.0% 1079880 ± 4% proc-vmstat.nr_dirty
23980 -2.1% 23472 proc-vmstat.nr_kernel_stack
17286 ± 6% -5.1% 16397 proc-vmstat.nr_mapped
79441 -2.5% 77426 proc-vmstat.nr_slab_unreclaimable
662082 ± 6% +66.5% 1102087 ± 5% proc-vmstat.nr_zone_write_pending
8719968 ± 21% -48.5% 4491902 ± 10% proc-vmstat.numa_hit
8.00 ± 20% +12912.5% 1041 ± 45% proc-vmstat.numa_huge_pte_updates
8584943 ± 21% -49.2% 4359325 ± 10% proc-vmstat.numa_local
11674686 ± 3% -16.0% 9806002 proc-vmstat.pgpgout
2.00 +51250.0% 1027 ± 56% proc-vmstat.thp_fault_alloc
4.19 ±100% -1.7 2.53 ±144% perf-profile.calltrace.cycles-pp.scsi_end_request.scsi_io_completion.blk_complete_reqs.handle_softirqs.irq_exit_rcu
4.19 ±100% -1.7 2.53 ±144% perf-profile.calltrace.cycles-pp.scsi_io_completion.blk_complete_reqs.handle_softirqs.irq_exit_rcu.common_interrupt
4.24 ±100% -1.7 2.58 ±145% perf-profile.calltrace.cycles-pp.irq_exit_rcu.common_interrupt.asm_common_interrupt.cpuidle_enter_state.cpuidle_enter
4.23 ±100% -1.6 2.58 ±145% perf-profile.calltrace.cycles-pp.handle_softirqs.irq_exit_rcu.common_interrupt.asm_common_interrupt.cpuidle_enter_state
4.20 ±100% -1.6 2.57 ±145% perf-profile.calltrace.cycles-pp.blk_complete_reqs.handle_softirqs.irq_exit_rcu.common_interrupt.asm_common_interrupt
0.50 ± 46% +0.3 0.78 ± 5% perf-profile.calltrace.cycles-pp.write
0.28 ±100% +0.4 0.67 ± 6% perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.31 ±100% +0.4 0.71 ± 8% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.31 ±100% +0.4 0.71 ± 8% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
0.19 ±141% +0.5 0.64 ± 6% perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
2.50 ± 14% +0.5 3.04 ± 8% perf-profile.calltrace.cycles-pp.read
2.66 ± 14% +0.6 3.28 ± 9% perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
2.33 ± 11% +0.6 2.98 ± 14% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
2.33 ± 11% +0.7 3.00 ± 14% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
0.22 ± 20% -0.2 0.06 ± 83% perf-profile.children.cycles-pp.native_apic_mem_eoi
0.23 ± 11% -0.1 0.15 ± 24% perf-profile.children.cycles-pp.getenv
0.03 ±141% +0.1 0.10 ± 29% perf-profile.children.cycles-pp.set_task_cpu
0.01 ±223% +0.1 0.08 ± 37% perf-profile.children.cycles-pp.__radix_tree_lookup
0.00 +0.1 0.10 ± 43% perf-profile.children.cycles-pp.kmalloc_trace
0.01 ±223% +0.1 0.12 ± 37% perf-profile.children.cycles-pp.free_pcppages_bulk
0.16 ± 33% +0.1 0.29 ± 29% perf-profile.children.cycles-pp.vm_area_alloc
0.10 ± 79% +0.1 0.24 ± 26% perf-profile.children.cycles-pp.leave_mm
0.24 ± 19% +0.2 0.41 ± 36% perf-profile.children.cycles-pp.strnlen_user
0.41 ± 22% +0.2 0.58 ± 19% perf-profile.children.cycles-pp.migration_cpu_stop
0.68 ± 12% +0.2 0.86 ± 6% perf-profile.children.cycles-pp.ksys_write
0.65 ± 15% +0.2 0.84 ± 6% perf-profile.children.cycles-pp.vfs_write
0.41 ± 22% +0.2 0.62 ± 19% perf-profile.children.cycles-pp.cpu_stopper_thread
0.79 ± 10% +0.2 1.00 ± 3% perf-profile.children.cycles-pp.write
0.47 ± 28% +0.2 0.70 ± 23% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.24 ± 35% +0.2 0.47 ± 20% perf-profile.children.cycles-pp.set_pte_range
0.43 ± 28% +0.2 0.67 ± 25% perf-profile.children.cycles-pp.d_alloc_parallel
0.58 ± 23% +0.3 0.87 ± 21% perf-profile.children.cycles-pp.__lookup_slow
0.98 ± 22% +0.3 1.27 ± 13% perf-profile.children.cycles-pp.copy_process
1.39 ± 9% +0.3 1.74 ± 13% perf-profile.children.cycles-pp.filemap_map_pages
1.49 ± 9% +0.4 1.91 ± 11% perf-profile.children.cycles-pp.do_read_fault
1.75 ± 10% +0.5 2.26 ± 8% perf-profile.children.cycles-pp.do_fault
2.66 ± 14% +0.6 3.28 ± 9% perf-profile.children.cycles-pp.smpboot_thread_fn
3.94 ± 16% +0.7 4.62 ± 8% perf-profile.children.cycles-pp.read
4.09 ± 4% +0.8 4.93 ± 8% perf-profile.children.cycles-pp.asm_exc_page_fault
3.08 ± 10% +0.9 3.96 ± 8% perf-profile.children.cycles-pp.__handle_mm_fault
3.22 ± 8% +1.0 4.18 ± 8% perf-profile.children.cycles-pp.handle_mm_fault
3.44 ± 6% +1.0 4.47 ± 9% perf-profile.children.cycles-pp.do_user_addr_fault
3.45 ± 5% +1.0 4.48 ± 9% perf-profile.children.cycles-pp.exc_page_fault
20.31 ± 9% +2.6 22.93 ± 6% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
20.26 ± 9% +2.6 22.88 ± 6% perf-profile.children.cycles-pp.do_syscall_64
0.21 ± 20% -0.2 0.06 ± 83% perf-profile.self.cycles-pp.native_apic_mem_eoi
0.12 ± 30% +0.1 0.18 ± 19% perf-profile.self.cycles-pp.newidle_balance
0.01 ±223% +0.1 0.08 ± 37% perf-profile.self.cycles-pp.__radix_tree_lookup
0.00 +0.1 0.09 ± 39% perf-profile.self.cycles-pp.kmalloc_trace
0.05 ±111% +0.1 0.17 ± 36% perf-profile.self.cycles-pp.leave_mm
0.23 ± 23% +0.2 0.39 ± 40% perf-profile.self.cycles-pp.strnlen_user
0.10 ± 53% +0.2 0.30 ± 59% perf-profile.self.cycles-pp.read_counters
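
The %change column can be recomputed from the two value columns. What follows is a minimal Python sketch, not part of the lkp tooling: ROW, parse_row and percent_change are hypothetical helpers written for the row layout of this report, and the formula (new - base) / base * 100 is an assumption that happens to match the rows shown above. Rows whose change carries no % sign (for example the mpstat.cpu.all.* and perf-profile rows) appear to report an absolute difference instead, so the sketch only recomputes rows that have one. The reported values are rounded averages, so small deviations from the printed %change are expected.

```python
import re

# One number as printed in the report, e.g. "12757", "0.24" or "1.997e+09".
NUM = r"\d[\d.]*(?:e[+-]?\d+)?"

# Assumed row layout: base value, optional "± N%" stddev, change,
# new value, optional "± N%" stddev, metric name.
ROW = re.compile(
    rf"^\s*(?P<base>{NUM})(?:\s*±\s*(?P<base_sd>\d+)%)?\s+"
    rf"(?P<change>[+-]\d[\d.]*%?)\s+"
    rf"(?P<new>{NUM})(?:\s*±\s*(?P<new_sd>\d+)%)?\s+"
    rf"(?P<metric>\S+)\s*$"
)


def parse_row(line):
    """Return the fields of one comparison row, or None if it is not a row."""
    m = ROW.match(line)
    if m is None:
        return None
    return {
        "metric": m["metric"],
        "base": float(m["base"]),
        "new": float(m["new"]),
        "change": m["change"],  # kept as text: may or may not end in "%"
    }


def percent_change(base, new):
    """Relative change from base to new, in percent (assumed formula)."""
    return (new - base) / base * 100.0


if __name__ == "__main__":
    sample = "12757 ±163% +4194.8% 547887 ± 7% filebench.sum_operations/s"
    row = parse_row(sample)
    if row and row["change"].endswith("%"):
        recomputed = percent_change(row["base"], row["new"])
        print(f'{row["metric"]}: reported {row["change"]}, '
              f"recomputed {recomputed:+.1f}%")
```

Keeping the change column as text lets a caller tell relative changes (with %) from absolute differences (without) before deciding whether to recompute.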

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki