[linus:master] [nvme] 63dfa10043: fsmark.files_per_sec 6.4% improvement

From: kernel test robot
Date: Fri Mar 15 2024 - 04:21:35 EST

Hello,

kernel test robot noticed a 6.4% improvement of fsmark.files_per_sec on:


commit: 63dfa1004322d596417f23da43cdc43cf6298c71 ("nvme: move NVME_QUIRK_DEALLOCATE_ZEROES out of nvme_config_discard")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: fsmark
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
parameters:

iterations: 8
disk: 1SSD
nr_threads: 16
fs: ext4
filesize: 8K
test_size: 75G
sync_method: fsyncBeforeClose
nr_directories: 16d
nr_files_per_directory: 256fpd
cpufreq_governor: performance

Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240315/202403151552.e3809b61-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs/iterations/kconfig/nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
gcc-12/performance/1SSD/8K/ext4/8/x86_64-rhel-8.3/16d/256fpd/16/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-csl-2sp3/75G/fsmark
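The two header lines above encode the job parameters positionally. The "/"-separated key names on the first line pair one-to-one with the "/"-separated values on the second. The sketch below recovers that mapping. It assumes that no key or value itself contains a "/", which holds for this particular header. The helper name parse_job_params is hypothetical and is not part of lkp-tests.

```python
# Minimal sketch: pair the "/"-separated LKP header keys with their values.
# Assumes no individual key or value contains "/" (true for this header).
# parse_job_params is a hypothetical helper, not an lkp-tests API.

KEYS = (
    "compiler/cpufreq_governor/disk/filesize/fs/iterations/kconfig/"
    "nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/"
    "tbox_group/test_size/testcase:"
)
VALUES = (
    "gcc-12/performance/1SSD/8K/ext4/8/x86_64-rhel-8.3/16d/256fpd/16/"
    "debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-csl-2sp3/75G/fsmark"
)


def parse_job_params(keys: str, values: str) -> dict:
    """Map each key to the value at the same position."""
    k = keys.rstrip(":").split("/")  # drop the trailing ":" on the key line
    v = values.split("/")
    if len(k) != len(v):
        raise ValueError(f"{len(k)} keys but {len(v)} values")
    return dict(zip(k, v))


params = parse_job_params(KEYS, VALUES)
for key, value in params.items():
    print(f"{key:>24}: {value}")
```

This reproduces the "parameters:" list near the top of the report. For example, nr_threads maps to 16 and sync_method maps to fsyncBeforeClose. It also identifies the test box group (lkp-csl-2sp3) and the kconfig (x86_64-rhel-8.3).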

commit:
152694c829 ("nvme: set max_hw_sectors unconditionally")
63dfa10043 ("nvme: move NVME_QUIRK_DEALLOCATE_ZEROES out of nvme_config_discard")

152694c82950a093 63dfa1004322d596417f23da43c
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
    492322 ±  8%      +15.1%        566574 ±  2%  meminfo.Active(anon)
    501325 ±  8%      +15.0%        576573 ±  2%  meminfo.Shmem
    458144 ± 18%      +22.6%        561659 ±  2%  numa-meminfo.node1.Active(anon)
    462634 ± 18%      +22.6%        567357 ±  2%  numa-meminfo.node1.Shmem
    114517 ± 18%      +22.6%        140395 ±  2%  numa-vmstat.node1.nr_active_anon
    115654 ± 18%      +22.6%        141838 ±  2%  numa-vmstat.node1.nr_shmem
    114517 ± 18%      +22.6%        140395 ±  2%  numa-vmstat.node1.nr_zone_active_anon
    396.50           +745.6%          3353 ±181%  vmstat.memory.buff
    201414             +6.0%        213473        vmstat.system.cs
     57760             +5.4%         60879        vmstat.system.in
     22022 ±  2%       +6.4%         23432        fsmark.files_per_sec
    502.56             -5.9%        472.94        fsmark.time.elapsed_time
    502.56             -5.9%        472.94        fsmark.time.elapsed_time.max
    243.62 ±  2%       +5.0%        255.75        fsmark.time.percent_of_cpu_this_job_got
    123079 ±  8%      +15.1%        141624 ±  2%  proc-vmstat.nr_active_anon
      8462             +2.1%          8637        proc-vmstat.nr_mapped
    125342 ±  8%      +15.0%        144138 ±  2%  proc-vmstat.nr_shmem
    123079 ±  8%      +15.1%        141624 ±  2%  proc-vmstat.nr_zone_active_anon
    140970 ±  7%      +14.1%        160889 ±  2%  proc-vmstat.pgactivate
 3.617e+08             -3.7%     3.483e+08        proc-vmstat.pgpgout
      2.10 ±  9%        -0.2          1.85 ±  3%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
      2.08 ±  9%        -0.2          1.84 ±  3%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
      0.99 ± 20%        +0.4          1.37 ± 11%  perf-profile.calltrace.cycles-pp.jbd2__journal_start.ext4_do_writepages.ext4_writepages.do_writepages.filemap_fdatawrite_wbc
      0.50 ± 60%        +0.4          0.89 ± 14%  perf-profile.calltrace.cycles-pp.add_transaction_credits.start_this_handle.jbd2__journal_start.ext4_do_writepages.ext4_writepages
      2.50 ± 10%        -0.3          2.20 ±  3%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      2.48 ± 10%        -0.3          2.19 ±  3%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.24 ±  6%        +0.0          0.27 ±  6%  perf-profile.children.cycles-pp.ext4_dirty_inode
      0.19 ± 11%        +0.1          0.24 ±  6%  perf-profile.children.cycles-pp.ext4_block_bitmap_csum_set
 1.107e+09             +6.6%      1.18e+09        perf-stat.i.branch-instructions
    202521             +6.1%        214902        perf-stat.i.context-switches
 1.322e+10 ±  2%       +6.7%      1.41e+10        perf-stat.i.cpu-cycles
  5.46e+09             +6.6%     5.818e+09        perf-stat.i.instructions
      2.11             +6.2%          2.24        perf-stat.i.metric.K/sec
 1.105e+09             +6.6%     1.178e+09        perf-stat.ps.branch-instructions
    202013             +6.1%        214333        perf-stat.ps.context-switches
 1.319e+10 ±  2%       +6.7%     1.407e+10        perf-stat.ps.cpu-cycles
 5.448e+09             +6.5%     5.805e+09        perf-stat.ps.instructions
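For most rows, the %change column is the relative change of the new commit's mean against the base commit's mean. The perf-profile rows are different: they appear to report an absolute difference in percentage points of cycles, so they do not follow this formula. The sketch below recomputes a few relative changes from the rounded means in the table. It also cross-checks the headline figure against the elapsed-time reduction. The function name pct_change is illustrative only.

```python
# Minimal sketch: recompute the %change column from the table's rounded means.
# Applies to relative-change rows only. The perf-profile rows appear to use
# absolute percentage-point deltas instead. pct_change is an illustrative name.


def pct_change(base: float, new: float) -> float:
    """Relative change of `new` versus `base`, in percent."""
    return (new - base) / base * 100.0


# (metric, base mean, new mean), copied from the comparison table.
ROWS = [
    ("fsmark.files_per_sec", 22022, 23432),
    ("fsmark.time.elapsed_time", 502.56, 472.94),
    ("vmstat.system.cs", 201414, 213473),
    ("perf-stat.i.instructions", 5.46e9, 5.818e9),
]

for name, base, new in ROWS:
    print(f"{name:<28} {pct_change(base, new):+.1f}%")

# test_size is fixed (75G), so the amount of work is assumed constant.
# Throughput should then rise roughly as elapsed time falls.
speedup = 502.56 / 472.94
print(f"{'elapsed-time speedup':<28} {(speedup - 1) * 100:+.1f}%")
```

The elapsed-time ratio works out to about +6.3%, close to the reported +6.4% files_per_sec gain. A small mismatch is expected because the two metrics are measured separately and the table shows rounded means.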

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki