[linus:master] [mm] f77171d241: vm-scalability.throughput 34.9% improvement
From: kernel test robot
Date: Sun Mar 31 2024 - 10:29:52 EST
Hello,
kernel test robot noticed a 34.9% improvement in vm-scalability.throughput on:
commit: f77171d241e379ea93448a53d58104191e02135c ("mm: allow non-hugetlb large folios to be batch processed")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
testcase: vm-scalability
test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
parameters:
runtime: 300s
test: truncate
cpufreq_governor: performance
Details are below.
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240331/202403312219.c62301c9-yujie.liu@xxxxxxxxx
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/truncate/vm-scalability
commit:
31b2ff82ae ("mm: handle large folios in free_unref_folios()")
f77171d241 ("mm: allow non-hugetlb large folios to be batch processed")
31b2ff82aefb33ce f77171d241e379ea93448a53d58
---------------- ---------------------------
%stddev %change %stddev
\ | \
7.397e+08 ± 6% +34.9% 9.978e+08 ± 3% vm-scalability.median
7.397e+08 ± 6% +34.9% 9.978e+08 ± 3% vm-scalability.throughput
193.12 ± 7% -16.4% 161.38 ± 3% vm-scalability.time.percent_of_cpu_this_job_got
84.58 ± 8% -16.5% 70.62 ± 3% vm-scalability.time.system_time
154795 ± 85% +168.7% 415963 ± 28% numa-meminfo.node0.Inactive(anon)
41174935 ± 36% -81.1% 7801569 ± 30% proc-vmstat.pgfree
38644 ± 85% +169.0% 103935 ± 28% numa-vmstat.node0.nr_inactive_anon
38644 ± 85% +169.0% 103937 ± 28% numa-vmstat.node0.nr_zone_inactive_anon
18.05 ± 12% -18.1 0.00 perf-profile.calltrace.cycles-pp.__folio_put_large.folios_put_refs.truncate_inode_pages_range.evict.do_unlinkat
18.02 ± 12% -18.0 0.00 perf-profile.calltrace.cycles-pp.__page_cache_release.__folio_put_large.folios_put_refs.truncate_inode_pages_range.evict
17.68 ± 12% -17.7 0.00 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.__folio_put_large.folios_put_refs.truncate_inode_pages_range
17.63 ± 12% -17.6 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.__folio_put_large.folios_put_refs
17.57 ± 12% -17.6 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.__folio_put_large
22.14 ± 12% -5.9 16.22 ± 8% perf-profile.calltrace.cycles-pp.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlinkat.do_syscall_64
22.15 ± 12% -5.9 16.23 ± 8% perf-profile.calltrace.cycles-pp.evict.do_unlinkat.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe
22.16 ± 12% -5.9 16.24 ± 8% perf-profile.calltrace.cycles-pp.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
22.16 ± 12% -5.9 16.24 ± 8% perf-profile.calltrace.cycles-pp.do_unlinkat.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
22.16 ± 12% -5.9 16.24 ± 8% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
22.16 ± 12% -5.9 16.24 ± 8% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.unlinkat
22.16 ± 12% -5.9 16.24 ± 8% perf-profile.calltrace.cycles-pp.unlinkat
21.78 ± 12% -5.7 16.05 ± 8% perf-profile.calltrace.cycles-pp.folios_put_refs.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlinkat
1.14 ± 9% +0.1 1.29 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_trylock.rebalance_domains.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt
1.98 ± 3% +0.2 2.17 ± 2% perf-profile.calltrace.cycles-pp.rebalance_domains.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
2.24 ± 3% +0.2 2.44 ± 4% perf-profile.calltrace.cycles-pp.memcpy_toio.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm
2.27 ± 3% +0.2 2.48 ± 4% perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit
2.27 ± 3% +0.2 2.48 ± 4% perf-profile.calltrace.cycles-pp.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb
2.27 ± 3% +0.2 2.48 ± 4% perf-profile.calltrace.cycles-pp.commit_tail.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty
2.27 ± 3% +0.2 2.48 ± 4% perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit.drm_atomic_commit
2.27 ± 3% +0.2 2.48 ± 4% perf-profile.calltrace.cycles-pp.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work.worker_thread
2.27 ± 3% +0.2 2.48 ± 4% perf-profile.calltrace.cycles-pp.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work
2.27 ± 3% +0.2 2.48 ± 4% perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work
2.27 ± 3% +0.2 2.48 ± 4% perf-profile.calltrace.cycles-pp.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail
2.27 ± 3% +0.2 2.48 ± 4% perf-profile.calltrace.cycles-pp.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail
2.34 ± 3% +0.2 2.56 ± 4% perf-profile.calltrace.cycles-pp.drm_fb_helper_damage_work.process_one_work.worker_thread.kthread.ret_from_fork
2.34 ± 3% +0.2 2.56 ± 4% perf-profile.calltrace.cycles-pp.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work.worker_thread.kthread
2.41 ± 3% +0.2 2.64 ± 4% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
2.38 ± 3% +0.2 2.61 ± 4% perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
2.84 ± 4% +0.2 3.09 ± 2% perf-profile.calltrace.cycles-pp.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt
6.56 ± 2% +0.6 7.18 ± 3% perf-profile.calltrace.cycles-pp.rep_movs_alternative._copy_to_iter.copy_page_to_iter.filemap_read.xfs_file_buffered_read
6.90 ± 2% +0.7 7.55 ± 3% perf-profile.calltrace.cycles-pp._copy_to_iter.copy_page_to_iter.filemap_read.xfs_file_buffered_read.xfs_file_read_iter
6.98 ± 2% +0.7 7.64 ± 3% perf-profile.calltrace.cycles-pp.copy_page_to_iter.filemap_read.xfs_file_buffered_read.xfs_file_read_iter.vfs_read
14.15 ± 3% +1.3 15.48 perf-profile.calltrace.cycles-pp.memset_orig.zero_user_segments.iomap_readpage_iter.iomap_readahead.read_pages
14.19 ± 3% +1.3 15.53 perf-profile.calltrace.cycles-pp.zero_user_segments.iomap_readpage_iter.iomap_readahead.read_pages.page_cache_ra_order
14.30 ± 3% +1.3 15.64 perf-profile.calltrace.cycles-pp.iomap_readpage_iter.iomap_readahead.read_pages.page_cache_ra_order.filemap_get_pages
14.36 ± 3% +1.4 15.72 perf-profile.calltrace.cycles-pp.iomap_readahead.read_pages.page_cache_ra_order.filemap_get_pages.filemap_read
14.37 ± 3% +1.4 15.73 perf-profile.calltrace.cycles-pp.read_pages.page_cache_ra_order.filemap_get_pages.filemap_read.xfs_file_buffered_read
14.81 ± 3% +1.4 16.22 perf-profile.calltrace.cycles-pp.page_cache_ra_order.filemap_get_pages.filemap_read.xfs_file_buffered_read.xfs_file_read_iter
14.86 ± 3% +1.4 16.28 perf-profile.calltrace.cycles-pp.filemap_get_pages.filemap_read.xfs_file_buffered_read.xfs_file_read_iter.vfs_read
21.90 ± 3% +2.1 23.98 perf-profile.calltrace.cycles-pp.filemap_read.xfs_file_buffered_read.xfs_file_read_iter.vfs_read.ksys_read
21.92 ± 3% +2.1 24.01 perf-profile.calltrace.cycles-pp.xfs_file_buffered_read.xfs_file_read_iter.vfs_read.ksys_read.do_syscall_64
21.94 ± 3% +2.1 24.02 perf-profile.calltrace.cycles-pp.xfs_file_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
22.08 ± 3% +2.1 24.18 perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
22.09 ± 3% +2.1 24.20 perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
22.11 ± 3% +2.1 24.22 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
22.15 ± 3% +2.1 24.27 perf-profile.calltrace.cycles-pp.read
22.11 ± 3% +2.1 24.23 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
45.34 ± 3% +4.1 49.45 ± 2% perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
45.76 ± 3% +4.1 49.89 ± 2% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
45.87 ± 3% +4.1 50.00 ± 2% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
66.58 ± 3% +5.8 72.37 perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
0.00 +15.2 15.18 ± 8% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.free_one_page.free_unref_folios.folios_put_refs
0.00 +15.3 15.26 ± 8% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.free_one_page.free_unref_folios.folios_put_refs.truncate_inode_pages_range
0.00 +15.4 15.40 ± 8% perf-profile.calltrace.cycles-pp.free_one_page.free_unref_folios.folios_put_refs.truncate_inode_pages_range.evict
0.00 +15.8 15.85 ± 8% perf-profile.calltrace.cycles-pp.free_unref_folios.folios_put_refs.truncate_inode_pages_range.evict.do_unlinkat
18.06 ± 12% -18.1 0.00 perf-profile.children.cycles-pp.__folio_put_large
18.09 ± 12% -17.9 0.16 ± 25% perf-profile.children.cycles-pp.__page_cache_release
17.78 ± 12% -17.7 0.04 ±151% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
22.15 ± 12% -5.9 16.23 ± 8% perf-profile.children.cycles-pp.evict
22.14 ± 12% -5.9 16.22 ± 8% perf-profile.children.cycles-pp.truncate_inode_pages_range
22.16 ± 12% -5.9 16.24 ± 8% perf-profile.children.cycles-pp.__x64_sys_unlinkat
22.16 ± 12% -5.9 16.24 ± 8% perf-profile.children.cycles-pp.do_unlinkat
22.16 ± 12% -5.9 16.24 ± 8% perf-profile.children.cycles-pp.unlinkat
21.85 ± 12% -5.8 16.07 ± 8% perf-profile.children.cycles-pp.folios_put_refs
0.26 ± 10% -0.2 0.07 ± 12% perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
0.25 ± 13% -0.2 0.06 ± 8% perf-profile.children.cycles-pp.delete_from_page_cache_batch
0.17 ± 6% -0.1 0.08 ± 8% perf-profile.children.cycles-pp.__mod_lruvec_state
0.16 ± 7% -0.1 0.08 ± 9% perf-profile.children.cycles-pp.__mod_node_page_state
0.07 ± 9% -0.0 0.03 ± 77% perf-profile.children.cycles-pp.begin_new_exec
0.14 ± 7% -0.0 0.11 ± 4% perf-profile.children.cycles-pp.__mmput
0.14 ± 6% -0.0 0.10 ± 4% perf-profile.children.cycles-pp.exit_mmap
0.07 ± 8% -0.0 0.03 ± 78% perf-profile.children.cycles-pp.folio_batch_move_lru
0.14 ± 5% -0.0 0.12 ± 5% perf-profile.children.cycles-pp.load_elf_binary
0.14 ± 3% -0.0 0.12 ± 5% perf-profile.children.cycles-pp.exec_binprm
0.14 ± 3% -0.0 0.12 ± 5% perf-profile.children.cycles-pp.search_binary_handler
0.17 ± 4% -0.0 0.14 ± 4% perf-profile.children.cycles-pp.bprm_execve
0.09 ± 7% +0.0 0.11 ± 9% perf-profile.children.cycles-pp.__filemap_add_folio
0.13 ± 8% +0.0 0.16 ± 7% perf-profile.children.cycles-pp.filemap_add_folio
0.32 ± 3% +0.0 0.35 perf-profile.children.cycles-pp.read_tsc
0.27 ± 3% +0.0 0.31 ± 4% perf-profile.children.cycles-pp.rcu_core
0.52 ± 3% +0.0 0.56 ± 3% perf-profile.children.cycles-pp.update_sg_lb_stats
0.35 ± 5% +0.0 0.39 ± 5% perf-profile.children.cycles-pp.run_rebalance_domains
0.00 +0.1 0.07 ± 9% perf-profile.children.cycles-pp.free_tail_page_prepare
1.20 ± 9% +0.2 1.35 ± 2% perf-profile.children.cycles-pp._raw_spin_trylock
2.12 ± 2% +0.2 2.32 ± 2% perf-profile.children.cycles-pp.rebalance_domains
2.27 ± 3% +0.2 2.48 ± 4% perf-profile.children.cycles-pp.drm_atomic_helper_commit_planes
2.27 ± 3% +0.2 2.48 ± 4% perf-profile.children.cycles-pp.ast_mode_config_helper_atomic_commit_tail
2.27 ± 3% +0.2 2.48 ± 4% perf-profile.children.cycles-pp.commit_tail
2.27 ± 3% +0.2 2.48 ± 4% perf-profile.children.cycles-pp.drm_atomic_helper_commit_tail_rpm
2.27 ± 3% +0.2 2.48 ± 4% perf-profile.children.cycles-pp.drm_atomic_helper_dirtyfb
2.27 ± 3% +0.2 2.48 ± 4% perf-profile.children.cycles-pp.drm_atomic_commit
2.27 ± 3% +0.2 2.48 ± 4% perf-profile.children.cycles-pp.drm_atomic_helper_commit
2.27 ± 3% +0.2 2.48 ± 4% perf-profile.children.cycles-pp.ast_primary_plane_helper_atomic_update
2.27 ± 3% +0.2 2.48 ± 4% perf-profile.children.cycles-pp.drm_fb_memcpy
2.27 ± 3% +0.2 2.48 ± 4% perf-profile.children.cycles-pp.memcpy_toio
2.34 ± 3% +0.2 2.56 ± 4% perf-profile.children.cycles-pp.drm_fb_helper_damage_work
2.34 ± 3% +0.2 2.56 ± 4% perf-profile.children.cycles-pp.drm_fbdev_generic_helper_fb_dirty
2.41 ± 3% +0.2 2.64 ± 4% perf-profile.children.cycles-pp.worker_thread
2.38 ± 3% +0.2 2.61 ± 4% perf-profile.children.cycles-pp.process_one_work
3.12 ± 4% +0.3 3.39 ± 2% perf-profile.children.cycles-pp.__do_softirq
3.50 ± 4% +0.3 3.83 ± 4% perf-profile.children.cycles-pp.irq_exit_rcu
0.00 +0.4 0.38 ± 7% perf-profile.children.cycles-pp.free_unref_page_prepare
6.59 ± 2% +0.6 7.21 ± 3% perf-profile.children.cycles-pp.rep_movs_alternative
6.94 ± 2% +0.7 7.60 ± 3% perf-profile.children.cycles-pp._copy_to_iter
6.99 ± 2% +0.7 7.65 ± 3% perf-profile.children.cycles-pp.copy_page_to_iter
14.17 ± 3% +1.3 15.51 perf-profile.children.cycles-pp.memset_orig
14.19 ± 3% +1.3 15.53 perf-profile.children.cycles-pp.zero_user_segments
14.30 ± 3% +1.3 15.64 perf-profile.children.cycles-pp.iomap_readpage_iter
14.36 ± 3% +1.4 15.72 perf-profile.children.cycles-pp.iomap_readahead
14.37 ± 3% +1.4 15.73 perf-profile.children.cycles-pp.read_pages
14.81 ± 3% +1.4 16.22 perf-profile.children.cycles-pp.page_cache_ra_order
14.86 ± 3% +1.4 16.28 perf-profile.children.cycles-pp.filemap_get_pages
21.90 ± 3% +2.1 23.99 perf-profile.children.cycles-pp.filemap_read
21.92 ± 3% +2.1 24.01 perf-profile.children.cycles-pp.xfs_file_buffered_read
21.94 ± 3% +2.1 24.03 perf-profile.children.cycles-pp.xfs_file_read_iter
22.09 ± 3% +2.1 24.20 perf-profile.children.cycles-pp.vfs_read
22.11 ± 3% +2.1 24.22 perf-profile.children.cycles-pp.ksys_read
22.18 ± 3% +2.1 24.30 perf-profile.children.cycles-pp.read
42.82 ± 3% +3.8 46.65 ± 2% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
45.46 ± 3% +4.1 49.57 ± 2% perf-profile.children.cycles-pp.acpi_safe_halt
45.57 ± 3% +4.1 49.68 ± 2% perf-profile.children.cycles-pp.acpi_idle_enter
45.99 ± 3% +4.1 50.12 ± 2% perf-profile.children.cycles-pp.cpuidle_enter_state
46.09 ± 3% +4.1 50.22 ± 2% perf-profile.children.cycles-pp.cpuidle_enter
1.12 ± 19% +14.3 15.41 ± 8% perf-profile.children.cycles-pp.free_one_page
0.00 +15.9 15.86 ± 8% perf-profile.children.cycles-pp.free_unref_folios
0.16 ± 7% -0.1 0.08 ± 9% perf-profile.self.cycles-pp.__mod_node_page_state
0.46 ± 3% -0.0 0.42 ± 2% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.31 ± 4% +0.0 0.35 ± 2% perf-profile.self.cycles-pp.read_tsc
0.40 ± 4% +0.0 0.44 ± 5% perf-profile.self.cycles-pp._copy_to_iter
0.38 ± 2% +0.0 0.43 ± 2% perf-profile.self.cycles-pp.menu_select
0.00 +0.1 0.05 ± 6% perf-profile.self.cycles-pp.free_tail_page_prepare
1.19 ± 9% +0.2 1.34 ± 2% perf-profile.self.cycles-pp._raw_spin_trylock
2.26 ± 3% +0.2 2.47 ± 4% perf-profile.self.cycles-pp.memcpy_toio
0.00 +0.3 0.33 ± 7% perf-profile.self.cycles-pp.free_unref_page_prepare
6.50 ± 2% +0.6 7.11 ± 3% perf-profile.self.cycles-pp.rep_movs_alternative
14.09 ± 3% +1.3 15.42 perf-profile.self.cycles-pp.memset_orig
26.19 ± 4% +2.1 28.29 ± 2% perf-profile.self.cycles-pp.acpi_safe_halt
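A note on reading the comparison table above: each row lists the value on the base commit (31b2ff82ae) with its %stddev, the change, the value on the tested commit (f77171d241) with its %stddev, and the metric name. Judging from the numbers, rows whose change ends in "%" (for example vm-scalability.throughput) give the relative change, while perf-profile rows give the absolute difference in percentage points of cycles. The Python sketch below recomputes the change column from a row under that assumption. The names ROW and parse_row are hypothetical helpers written for this illustration; they are not part of lkp-tests.

```python
import re

# Hypothetical row pattern for the flattened table in this report:
#   <base> [± N%] <change> <new> [± N%] <metric>
# The "± N%" parts are optional because rows with no measured
# deviation omit them.
ROW = re.compile(
    r"^\s*(?P<base>[\d.e+]+)(?:\s*±\s*(?P<base_sd>\d+)%)?\s+"
    r"(?P<change>[+-][\d.]+%?)\s+"
    r"(?P<new>[\d.e+]+)(?:\s*±\s*(?P<new_sd>\d+)%)?\s+"
    r"(?P<metric>\S.*)$"
)


def parse_row(line):
    """Parse one table row and recompute its change column.

    Assumption (inferred from the numbers, not from lkp-tests docs):
    a change ending in '%' is relative to the base value; otherwise it
    is the plain difference new - base.
    """
    m = ROW.match(line)
    if m is None:
        raise ValueError(f"not a comparison row: {line!r}")
    base = float(m["base"])
    new = float(m["new"])
    reported = m["change"]
    if reported.endswith("%"):
        kind = "relative"
        computed = (new - base) / base * 100.0
        reported_value = float(reported.rstrip("%"))
    else:
        kind = "absolute"
        computed = new - base
        reported_value = float(reported)
    return {
        "metric": m["metric"].strip(),
        "kind": kind,
        "base": base,
        "new": new,
        "reported": reported_value,
        "computed": computed,
    }


if __name__ == "__main__":
    rows = [
        "7.397e+08 ± 6% +34.9% 9.978e+08 ± 3% vm-scalability.throughput",
        "41174935 ± 36% -81.1% 7801569 ± 30% proc-vmstat.pgfree",
        "22.14 ± 12% -5.9 16.22 ± 8% perf-profile.children.cycles-pp.truncate_inode_pages_range",
        "0.00 +15.9 15.86 ± 8% perf-profile.children.cycles-pp.free_unref_folios",
    ]
    for row in rows:
        r = parse_row(row)
        print(f"{r['metric']}: {r['kind']} change "
              f"reported {r['reported']:+.1f}, recomputed {r['computed']:+.2f}")
```

Small mismatches in the last digit are expected, since the report prints rounded values while the change was presumably computed from unrounded ones.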
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki