[linus:master] [mm/slub] 306c4ac989: stress-ng.seal.ops_per_sec 5.2% improvement

From: kernel test robot
Date: Thu Jul 25 2024 - 04:04:44 EST

Hello,

kernel test robot noticed a 5.2% improvement of stress-ng.seal.ops_per_sec on:


commit: 306c4ac9896b07b8872293eb224058ff83f81fac ("mm/slub: create kmalloc 96 and 192 caches regardless cache size order")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:

nr_threads: 100%
testtime: 60s
test: seal
cpufreq_governor: performance
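
As a rough illustration of what these parameters amount to, the sketch below builds a stress-ng command line for the seal stressor. It assumes that nr_threads: 100% means one worker per hardware thread (224 on this machine) and that testtime maps to --timeout. The exact invocation generated by the lkp job file may differ, and build_seal_cmd is a hypothetical helper, not part of lkp-tests.

```python
import shlex


def build_seal_cmd(hw_threads, nr_threads_pct, testtime):
    """Hypothetical helper: map the job parameters to a stress-ng argv list.

    Assumption: nr_threads is a percentage of the machine's hardware threads.
    """
    workers = max(1, hw_threads * nr_threads_pct // 100)
    return [
        "stress-ng",
        "--seal", str(workers),   # number of seal stressor workers
        "--timeout", testtime,    # run time, e.g. "60s"
        "--metrics-brief",        # print bogo ops and ops/sec at the end
    ]


if __name__ == "__main__":
    # Only prints the command; it does not run stress-ng.
    print(shlex.join(build_seal_cmd(224, 100, "60s")))
```

The cpufreq_governor: performance setting is a system-wide CPU frequency policy and is not covered by this sketch.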


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240725/202407251553.12f35198-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/seal/stress-ng/60s

commit:
844776cb65 ("mm/slub: mark racy access on slab->freelist")
306c4ac989 ("mm/slub: create kmalloc 96 and 192 caches regardless cache size order")

844776cb65a77ef2 (base)   306c4ac9896b07b8872293eb224 (tested)
----------------          ---------------------------
value ± %stddev  %change  value ± %stddev  metric
2.51 ± 27% +1.9 4.44 ± 35% mpstat.cpu.all.idle%
975100 ± 19% +29.5% 1262643 ± 16% numa-meminfo.node1.AnonPages.max
187.06 ± 4% -11.5% 165.63 ± 10% sched_debug.cfs_rq:/.runnable_avg.stddev
0.05 ± 18% -40.0% 0.03 ± 58% vmstat.procs.b
58973718 +5.2% 62024061 stress-ng.seal.ops
982893 +5.2% 1033732 stress-ng.seal.ops_per_sec
59045344 +5.2% 62095668 stress-ng.time.minor_page_faults
174957 +1.4% 177400 proc-vmstat.nr_slab_unreclaimable
63634761 +5.5% 67148443 proc-vmstat.numa_hit
63399995 +5.5% 66914221 proc-vmstat.numa_local
73601172 +6.1% 78073549 proc-vmstat.pgalloc_normal
59870250 +5.3% 63063514 proc-vmstat.pgfault
72718474 +6.0% 77106313 proc-vmstat.pgfree
1.983e+10 +1.3% 2.01e+10 perf-stat.i.branch-instructions
66023349 +5.6% 69728143 perf-stat.i.cache-misses
2.023e+08 +4.7% 2.117e+08 perf-stat.i.cache-references
7.22 -1.9% 7.08 perf-stat.i.cpi
9738 -5.6% 9196 perf-stat.i.cycles-between-cache-misses
8.799e+10 +1.6% 8.939e+10 perf-stat.i.instructions
0.14 +1.6% 0.14 perf-stat.i.ipc
8.71 +5.1% 9.16 perf-stat.i.metric.K/sec
983533 +4.7% 1029816 perf-stat.i.minor-faults
983533 +4.7% 1029816 perf-stat.i.page-faults
7.30 -18.4% 5.96 ± 44% perf-stat.overall.cpi
9735 -21.3% 7658 ± 44% perf-stat.overall.cycles-between-cache-misses
0.52 +0.1 0.62 ± 7% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.ftruncate64
0.56 +0.1 0.67 ± 7% perf-profile.calltrace.cycles-pp.ftruncate64
0.34 ± 70% +0.3 0.60 ± 7% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.ftruncate64
48.29 +0.6 48.86 perf-profile.calltrace.cycles-pp.__close
48.27 +0.6 48.84 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
48.27 +0.6 48.84 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__close
48.26 +0.6 48.83 perf-profile.calltrace.cycles-pp.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
0.00 +0.6 0.58 ± 7% perf-profile.calltrace.cycles-pp.__x64_sys_ftruncate.do_syscall_64.entry_SYSCALL_64_after_hwframe.ftruncate64
48.21 +0.6 48.80 perf-profile.calltrace.cycles-pp.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
48.03 +0.6 48.68 perf-profile.calltrace.cycles-pp.dput.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
48.02 +0.6 48.66 perf-profile.calltrace.cycles-pp.__dentry_kill.dput.__fput.__x64_sys_close.do_syscall_64
47.76 +0.7 48.47 perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dput.__fput.__x64_sys_close
47.19 +0.7 47.92 perf-profile.calltrace.cycles-pp._raw_spin_lock.evict.__dentry_kill.dput.__fput
47.11 +0.8 47.88 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.evict.__dentry_kill.dput
0.74 -0.3 0.48 ± 8% perf-profile.children.cycles-pp.__munmap
0.69 -0.2 0.44 ± 9% perf-profile.children.cycles-pp.__x64_sys_munmap
0.68 -0.2 0.44 ± 9% perf-profile.children.cycles-pp.__vm_munmap
0.68 -0.2 0.45 ± 9% perf-profile.children.cycles-pp.do_vmi_munmap
0.65 -0.2 0.42 ± 8% perf-profile.children.cycles-pp.do_vmi_align_munmap
0.44 -0.2 0.28 ± 7% perf-profile.children.cycles-pp.unmap_region
0.48 -0.1 0.36 ± 7% perf-profile.children.cycles-pp.asm_exc_page_fault
0.42 -0.1 0.32 ± 7% perf-profile.children.cycles-pp.do_user_addr_fault
0.42 ± 2% -0.1 0.32 ± 7% perf-profile.children.cycles-pp.exc_page_fault
0.38 ± 2% -0.1 0.29 ± 7% perf-profile.children.cycles-pp.handle_mm_fault
0.35 ± 2% -0.1 0.27 ± 7% perf-profile.children.cycles-pp.__handle_mm_fault
0.33 ± 2% -0.1 0.26 ± 6% perf-profile.children.cycles-pp.do_fault
0.21 ± 2% -0.1 0.14 ± 8% perf-profile.children.cycles-pp.lru_add_drain
0.22 -0.1 0.15 ± 11% perf-profile.children.cycles-pp.alloc_inode
0.21 ± 2% -0.1 0.15 ± 9% perf-profile.children.cycles-pp.lru_add_drain_cpu
0.18 ± 2% -0.1 0.12 ± 8% perf-profile.children.cycles-pp.unmap_vmas
0.21 ± 2% -0.1 0.14 ± 7% perf-profile.children.cycles-pp.folio_batch_move_lru
0.17 -0.1 0.11 ± 8% perf-profile.children.cycles-pp.unmap_page_range
0.16 ± 2% -0.1 0.10 ± 9% perf-profile.children.cycles-pp.zap_pte_range
0.16 ± 2% -0.1 0.10 ± 9% perf-profile.children.cycles-pp.zap_pmd_range
0.26 ± 2% -0.1 0.20 ± 7% perf-profile.children.cycles-pp.shmem_fault
0.50 -0.1 0.45 ± 8% perf-profile.children.cycles-pp.mmap_region
0.26 ± 2% -0.1 0.20 ± 7% perf-profile.children.cycles-pp.__do_fault
0.26 -0.1 0.21 ± 6% perf-profile.children.cycles-pp.shmem_get_folio_gfp
0.19 ± 2% -0.1 0.14 ± 14% perf-profile.children.cycles-pp.write
0.22 ± 3% -0.0 0.18 ± 5% perf-profile.children.cycles-pp.shmem_alloc_and_add_folio
0.11 ± 4% -0.0 0.07 ± 10% perf-profile.children.cycles-pp.mas_store_gfp
0.16 ± 2% -0.0 0.12 ± 11% perf-profile.children.cycles-pp.mas_wr_store_entry
0.14 -0.0 0.10 ± 10% perf-profile.children.cycles-pp.mas_wr_node_store
0.08 -0.0 0.04 ± 45% perf-profile.children.cycles-pp.msync
0.06 -0.0 0.02 ± 99% perf-profile.children.cycles-pp.mas_find
0.12 ± 4% -0.0 0.08 ± 11% perf-profile.children.cycles-pp.inode_init_always
0.10 ± 3% -0.0 0.07 ± 11% perf-profile.children.cycles-pp.shmem_alloc_inode
0.16 -0.0 0.13 ± 9% perf-profile.children.cycles-pp.__x64_sys_fcntl
0.11 ± 4% -0.0 0.08 ± 11% perf-profile.children.cycles-pp.shmem_file_write_iter
0.10 ± 4% -0.0 0.08 ± 8% perf-profile.children.cycles-pp.do_fcntl
0.15 -0.0 0.13 ± 8% perf-profile.children.cycles-pp.destroy_inode
0.16 ± 3% -0.0 0.14 ± 7% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
0.22 ± 3% -0.0 0.20 ± 5% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.08 -0.0 0.06 ± 11% perf-profile.children.cycles-pp.___slab_alloc
0.15 ± 3% -0.0 0.12 ± 8% perf-profile.children.cycles-pp.__destroy_inode
0.07 ± 7% -0.0 0.04 ± 45% perf-profile.children.cycles-pp.__call_rcu_common
0.13 ± 2% -0.0 0.11 ± 8% perf-profile.children.cycles-pp.perf_event_mmap
0.09 -0.0 0.07 ± 9% perf-profile.children.cycles-pp.memfd_fcntl
0.06 -0.0 0.04 ± 44% perf-profile.children.cycles-pp.native_irq_return_iret
0.08 ± 6% -0.0 0.06 ± 8% perf-profile.children.cycles-pp.shmem_add_to_page_cache
0.12 -0.0 0.10 ± 6% perf-profile.children.cycles-pp.perf_event_mmap_event
0.11 ± 3% -0.0 0.09 ± 7% perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
0.10 -0.0 0.08 ± 8% perf-profile.children.cycles-pp.uncharge_batch
0.12 ± 4% -0.0 0.10 ± 6% perf-profile.children.cycles-pp.entry_SYSCALL_64
0.05 +0.0 0.07 ± 5% perf-profile.children.cycles-pp.__d_alloc
0.05 +0.0 0.07 ± 10% perf-profile.children.cycles-pp.d_alloc_pseudo
0.07 +0.0 0.09 ± 7% perf-profile.children.cycles-pp.file_init_path
0.06 ± 6% +0.0 0.08 ± 8% perf-profile.children.cycles-pp.security_file_alloc
0.07 ± 7% +0.0 0.09 ± 7% perf-profile.children.cycles-pp.errseq_sample
0.04 ± 44% +0.0 0.07 ± 10% perf-profile.children.cycles-pp.apparmor_file_alloc_security
0.09 +0.0 0.12 ± 5% perf-profile.children.cycles-pp.init_file
0.15 +0.0 0.18 ± 7% perf-profile.children.cycles-pp.common_perm_cond
0.15 ± 3% +0.0 0.19 ± 8% perf-profile.children.cycles-pp.security_file_truncate
0.20 +0.0 0.24 ± 7% perf-profile.children.cycles-pp.notify_change
0.06 +0.0 0.10 ± 6% perf-profile.children.cycles-pp.inode_init_owner
0.13 +0.0 0.18 ± 5% perf-profile.children.cycles-pp.alloc_empty_file
0.10 +0.1 0.16 ± 7% perf-profile.children.cycles-pp.clear_nlink
0.47 +0.1 0.56 ± 7% perf-profile.children.cycles-pp.do_ftruncate
0.49 +0.1 0.59 ± 7% perf-profile.children.cycles-pp.__x64_sys_ftruncate
0.59 +0.1 0.70 ± 7% perf-profile.children.cycles-pp.ftruncate64
0.28 +0.1 0.40 ± 6% perf-profile.children.cycles-pp.alloc_file_pseudo
98.62 +0.2 98.77 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
98.58 +0.2 98.74 perf-profile.children.cycles-pp.do_syscall_64
48.30 +0.6 48.86 perf-profile.children.cycles-pp.__close
48.26 +0.6 48.83 perf-profile.children.cycles-pp.__x64_sys_close
48.21 +0.6 48.80 perf-profile.children.cycles-pp.__fput
48.04 +0.6 48.68 perf-profile.children.cycles-pp.dput
48.02 +0.6 48.67 perf-profile.children.cycles-pp.__dentry_kill
47.77 +0.7 48.47 perf-profile.children.cycles-pp.evict
0.30 -0.1 0.23 ± 7% perf-profile.self.cycles-pp._raw_spin_lock
0.10 ± 4% -0.0 0.06 ± 7% perf-profile.self.cycles-pp.__fput
0.08 ± 6% -0.0 0.05 ± 8% perf-profile.self.cycles-pp.inode_init_always
0.06 -0.0 0.04 ± 44% perf-profile.self.cycles-pp.native_irq_return_iret
0.08 -0.0 0.06 ± 7% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.09 -0.0 0.08 ± 4% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.07 +0.0 0.09 ± 7% perf-profile.self.cycles-pp.__shmem_get_inode
0.06 ± 7% +0.0 0.09 ± 9% perf-profile.self.cycles-pp.errseq_sample
0.15 ± 2% +0.0 0.18 ± 7% perf-profile.self.cycles-pp.common_perm_cond
0.03 ± 70% +0.0 0.06 ± 7% perf-profile.self.cycles-pp.apparmor_file_alloc_security
0.06 +0.0 0.10 ± 7% perf-profile.self.cycles-pp.inode_init_owner
0.10 +0.1 0.16 ± 6% perf-profile.self.cycles-pp.clear_nlink
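
The %change column can be re-derived from the two value columns. The sketch below does this for a few rows copied from the table above. It assumes that count-style metrics are reported as a relative change in percent, while metrics that are already percentages (perf-profile.*, mpstat.cpu.all.idle%) are reported as an absolute difference in percentage points, which is what the numbers in the table are consistent with. The function names are illustrative only.

```python
def pct_change(base, new):
    """Relative change in percent, as used for count-style metrics."""
    return (new - base) / base * 100.0


def point_delta(base, new):
    """Absolute difference, as used for metrics that are already percentages."""
    return new - base


# (base value, tested value) pairs copied from the table.
relative_rows = {
    "stress-ng.seal.ops_per_sec": (982893, 1033732),
    "proc-vmstat.pgalloc_normal": (73601172, 78073549),
    "perf-stat.i.cache-misses": (66023349, 69728143),
}
absolute_rows = {
    "mpstat.cpu.all.idle%": (2.51, 4.44),
    "perf-profile.children.cycles-pp.evict": (47.77, 48.47),
}

if __name__ == "__main__":
    for name, (base, new) in relative_rows.items():
        print(f"{pct_change(base, new):+.1f}%  {name}")   # +5.2%, +6.1%, +5.6%
    for name, (base, new) in absolute_rows.items():
        print(f"{point_delta(base, new):+.1f}   {name}")  # +1.9, +0.7
```

Note that most of the cycles in both kernels sit under evict -> _raw_spin_lock -> native_queued_spin_lock_slowpath, so the profile deltas here are small shifts in an already lock-dominated workload.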

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki