[linux-next:master] [mm] 7b6218ae12: stress-ng.forkheavy.ops_per_sec 5.0% improvement

From: kernel test robot
Date: Mon Mar 31 2025 - 09:25:10 EST

Hello,

kernel test robot noticed a 5.0% improvement in stress-ng.forkheavy.ops_per_sec on:


commit: 7b6218ae1253491d56f21f4b1f3609f3dd873600 ("mm: move per-vma lock into vm_area_struct")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master


testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 192 threads, 2 sockets, Intel(R) Xeon(R) Platinum 8468V CPU @ 2.4GHz (Sapphire Rapids) with 384G memory
parameters:

    nr_threads: 100%
    testtime: 60s
    test: forkheavy
    cpufreq_governor: performance

Details are below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250331/202503311656.e3596aaf-lkp@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/igk-spr-2sp1/forkheavy/stress-ng/60s
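
The two lines above pack the job parameters into a single '/'-separated record: the first line lists the field names and the second the matching values. The short Python sketch below pairs them up, which makes it easier to see, for example, that igk-spr-2sp1 is the tbox_group value and 60s the testtime. It assumes that no individual value contains a '/', which holds for this report but is an assumption rather than a documented lkp-tests guarantee; the function name parse_job_summary is hypothetical.

```python
# Sketch: pair the '/'-separated field names with the '/'-separated values
# from the job summary lines of this report.
# Assumption: no value contains '/' (true here, not guaranteed in general).
HEADER = ("compiler/cpufreq_governor/kconfig/nr_threads/rootfs/"
          "tbox_group/test/testcase/testtime:")
VALUES = ("gcc-12/performance/x86_64-rhel-9.4/100%/"
          "debian-12-x86_64-20240206.cgz/igk-spr-2sp1/forkheavy/stress-ng/60s")


def parse_job_summary(header: str, values: str) -> dict[str, str]:
    """Hypothetical helper: map each field name to its value."""
    names = header.rstrip(":").split("/")  # drop the trailing ':' first
    vals = values.split("/")
    if len(names) != len(vals):
        raise ValueError(f"{len(names)} names but {len(vals)} values")
    return dict(zip(names, vals))


job = parse_job_summary(HEADER, VALUES)
for name, value in job.items():
    print(f"{name:>16}: {value}")
```

The length check is there so that a value that does contain a '/' fails loudly instead of silently shifting every later field by one.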

commit:
  b2ae5fccb8 ("mm: introduce vma_start_read_locked{_nested} helpers")
  7b6218ae12 ("mm: move per-vma lock into vm_area_struct")

b2ae5fccb8c0ec21 7b6218ae1253491d56f21f4b1f3
---------------- ---------------------------
         %stddev    %change          %stddev
             \         |                 \
    382800 ±  4%     +10.2%     421797 ±  5%  numa-meminfo.node1.AnonHugePages
     32850            +5.0%      34492        stress-ng.forkheavy.ops
    493.66            +5.0%     518.50        stress-ng.forkheavy.ops_per_sec
     40.74 ± 30%     +68.2%      68.53 ± 23%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
     73.19 ± 42%     +52.2%     111.39 ± 16%  sched_debug.cfs_rq:/.util_est.avg
    222.12 ± 29%     +34.4%     298.62 ± 10%  sched_debug.cfs_rq:/.util_est.stddev
      4555 ± 10%     -45.3%       2491 ± 27%  perf-c2c.DRAM.local
     11750 ±  4%     -22.7%       9082 ± 22%  perf-c2c.HITM.local
      2592 ±  6%     -45.4%       1414 ± 23%  perf-c2c.HITM.remote
     14342 ±  4%     -26.8%      10497 ± 22%  perf-c2c.HITM.total
  41336771            -4.4%   39526485        proc-vmstat.numa_hit
  41134683            -4.4%   39326465        proc-vmstat.numa_local
  71479761            +1.8%   72742225        proc-vmstat.pgalloc_normal
   3480841            +2.4%    3564757        proc-vmstat.pgfault
  71044889            +1.7%   72274310        proc-vmstat.pgfree
      1.47 ± 86%     -73.5%       0.39 ±138%  perf-sched.sch_delay.avg.ms.__cond_resched.do_ftruncate.do_sys_ftruncate.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.33 ±108%    +205.7%       1.00 ± 83%  perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_mq_open.__x64_sys_mq_open.do_syscall_64
      0.77 ± 25%     +43.6%       1.10 ± 21%  perf-sched.sch_delay.avg.ms.__cond_resched.dput.vfs_tmpfile.path_openat.do_filp_open
      0.16 ± 17%     +44.7%       0.23 ± 26%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.58 ± 85%     -85.8%       0.08 ±130%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
      3.92 ± 72%     -80.6%       0.76 ±198%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
      6.96 ± 55%    +113.7%      14.88 ± 28%  perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     62.68 ± 72%    +129.9%     144.11 ±  9%  perf-sched.wait_and_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    334.97 ± 57%     -66.4%     112.42 ± 70%  perf-sched.wait_and_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
     82.80 ± 23%     +73.9%     143.96 ±  9%  perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      2.15 ± 43%     -72.2%       0.60 ± 94%  perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.unmap_region.__mmap_new_vma.__mmap_region
     68.44 ±135%    +288.8%     266.12 ±121%  perf-sched.wait_time.max.ms.__cond_resched.ww_mutex_lock.drm_gem_vunmap_unlocked.drm_gem_fb_vunmap.drm_atomic_helper_commit_planes
     15.31            +8.9%      16.67 ±  2%  perf-stat.i.MPKI
 1.684e+10            -3.9%  1.618e+10        perf-stat.i.branch-instructions
  75533943            -4.7%   72015903        perf-stat.i.branch-misses
      6.71            +5.6%       7.09        perf-stat.i.cpi
  8.19e+10            -5.7%  7.726e+10        perf-stat.i.instructions
      0.16            -4.9%       0.15        perf-stat.i.ipc
     16.72            +7.0%      17.90        perf-stat.overall.MPKI
      6.53            +6.2%       6.94        perf-stat.overall.cpi
      0.15            -5.9%       0.14        perf-stat.overall.ipc
  1.66e+10            -4.2%   1.59e+10        perf-stat.ps.branch-instructions
  73765712            -5.4%   69811938        perf-stat.ps.branch-misses
 8.092e+10            -5.9%  7.612e+10        perf-stat.ps.instructions
  5.53e+12            -5.5%  5.227e+12        perf-stat.total.instructions
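
To make the relationship between the columns concrete, the following Python sketch recomputes a few rows from the means printed above. It assumes that %change is (new - base) / base * 100 applied to the per-commit means, that the perf-stat ipc rows are the reciprocal of the cpi rows, and that perf-c2c.HITM.total is the sum of the local and remote HITM counts; these are readings of the table, not statements taken from the lkp-tests source. The helper name pct_change is hypothetical. Because the table prints rounded means, agreement is only expected to about one decimal place.

```python
# Sketch: cross-check a few rows of the comparison table above.
# Assumption: %change = (new - base) / base * 100 on the reported means.


def pct_change(base: float, new: float) -> float:
    """Percent change from the base commit to the patched commit."""
    return (new - base) / base * 100.0


# (metric, base mean, new mean, reported %change), copied from the table.
ROWS = [
    ("stress-ng.forkheavy.ops", 32850, 34492, 5.0),
    ("stress-ng.forkheavy.ops_per_sec", 493.66, 518.50, 5.0),
    ("perf-c2c.HITM.total", 14342, 10497, -26.8),
    ("perf-stat.ps.instructions", 8.092e10, 7.612e10, -5.9),
]

for name, base, new, reported in ROWS:
    computed = pct_change(base, new)
    print(f"{name}: computed {computed:+.1f}%, reported {reported:+.1f}%")

# Assumption: ipc = 1 / cpi, so perf-stat.overall.cpi 6.53 -> 6.94
# should round to perf-stat.overall.ipc 0.15 -> 0.14.
for cpi in (6.53, 6.94):
    print(f"cpi {cpi:.2f} -> ipc {1.0 / cpi:.2f}")

# Assumption: HITM.total = HITM.local + HITM.remote (base commit values).
print("HITM local + remote (base):", 11750 + 2592)
```

With these inputs each recomputed %change matches the reported figure to one decimal place, 1/6.53 and 1/6.94 round to the reported ipc values of 0.15 and 0.14, and 11750 + 2592 reproduces the base HITM.total of 14342. On the patched side 9082 + 1414 = 10496 against a reported 10497, which is consistent with rounding of the means.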

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki