[linus:master] [fs] e249056c91: stress-ng.mq.ops_per_sec 94.3% improvement
From: kernel test robot
Date: Wed Mar 26 2025 - 04:07:09 EST
Hello,
kernel test robot noticed a 94.3% improvement in stress-ng.mq.ops_per_sec on:
commit: e249056c91a2f14ee40de2bf24cf72d8e68101f5 ("fs: place f_ref to 3rd cache line in struct file to resolve false sharing")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
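
For readers unfamiliar with the term in the commit subject: false sharing occurs when a frequently written field sits in the same cache line as fields that other CPUs only read, so every write invalidates the readers' copy of that line. The sketch below is illustrative only and is NOT the layout of struct file. The struct names, the field names, and the 64-byte line size are assumptions made for the example. It shows the general technique of moving a hot counter onto its own cache line.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64 /* assumed line size; typical for x86_64 */

/* Hypothetical layout: the hot counter shares line 0 with read-mostly fields. */
struct obj_shared {
    const void *ops;      /* read-mostly */
    unsigned int mode;    /* read-mostly */
    long ref;             /* written on every get/put */
};

/* Same fields, but the counter is forced to start on its own cache line. */
struct obj_split {
    const void *ops;
    unsigned int mode;
    alignas(CACHE_LINE) long ref;
};

/* Index of the cache line that a given byte offset falls in. */
static inline size_t line_of(size_t offset)
{
    return offset / CACHE_LINE;
}

/* Compile-time check that the hot field really begins a cache line. */
static_assert(offsetof(struct obj_split, ref) % CACHE_LINE == 0,
              "ref must start a cache line");
```

In the first struct, line_of() returns the same index for ops and ref, so writers of ref and readers of ops contend for one line. In the second they land on different lines, at the cost of padding.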
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 192 threads 2 sockets Intel(R) Xeon(R) Platinum 8468V CPU @ 2.4GHz (Sapphire Rapids) with 384G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: mq
	cpufreq_governor: performance
Details are as follows:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250326/202503261501.2a99ac6e-lkp@xxxxxxxxx
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/igk-spr-2sp1/mq/stress-ng/60s
commit:
  d3a194d95f ("epoll: simplify ep_busy_loop by removing always 0 argument")
  e249056c91 ("fs: place f_ref to 3rd cache line in struct file to resolve false sharing")

d3a194d95fc8d535 e249056c91a2f14ee40de2bf24c
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
16952975 ±113% +479.0% 98151856 ± 20% cpuidle..usage
6298915 ± 11% +61.8% 10188752 ± 5% vmstat.system.cs
522158 ± 18% +59.9% 835109 ± 6% vmstat.system.in
0.43 ± 32% +0.2 0.67 ± 2% mpstat.cpu.all.irq%
0.06 ± 11% -0.0 0.06 ± 7% mpstat.cpu.all.soft%
6.40 ± 2% +1.3 7.74 ± 5% mpstat.cpu.all.usr%
143216 ± 23% -71.1% 41346 ± 87% numa-numastat.node0.other_node
1203882 ± 13% +73.8% 2092918 ± 29% numa-numastat.node1.numa_hit
55987 ± 58% +180.9% 157244 ± 23% numa-numastat.node1.other_node
1042 ± 35% -82.7% 180.83 ± 21% perf-c2c.DRAM.local
40886 ± 71% +138.2% 97387 ± 23% perf-c2c.HITM.local
46261 ± 60% +119.4% 101476 ± 23% perf-c2c.HITM.total
1835281 ± 25% +151.0% 4606463 ± 38% numa-meminfo.node1.Active
1835281 ± 25% +151.0% 4606463 ± 38% numa-meminfo.node1.Active(anon)
300616 ± 82% +63.6% 491945 ± 44% numa-meminfo.node1.AnonPages
1535692 ± 22% +168.0% 4115480 ± 41% numa-meminfo.node1.Shmem
2.507e+08 ± 9% +94.3% 4.871e+08 ± 5% stress-ng.mq.ops
4178927 ± 9% +94.3% 8118700 ± 5% stress-ng.mq.ops_per_sec
18053 ± 3% -7.4% 16709 stress-ng.time.percent_of_cpu_this_job_got
10197 ± 3% -9.4% 9242 stress-ng.time.system_time
688.89 ± 2% +19.7% 824.66 ± 5% stress-ng.time.user_time
2.076e+08 ± 8% +64.9% 3.423e+08 ± 5% stress-ng.time.voluntary_context_switches
2440860 ± 12% +105.3% 5012226 ± 35% meminfo.Active
2440860 ± 12% +105.3% 5012226 ± 35% meminfo.Active(anon)
5221055 ± 5% +48.7% 7762119 ± 22% meminfo.Cached
7184748 ± 3% +36.1% 9777020 ± 18% meminfo.Committed_AS
361568 ± 3% +47.5% 533427 ± 23% meminfo.Mapped
9552329 ± 3% +28.1% 12232469 ± 14% meminfo.Memused
1692979 ± 17% +150.1% 4234070 ± 41% meminfo.Shmem
9605594 ± 2% +28.3% 12319244 ± 14% meminfo.max_used_kB
4885 ± 48% +33.7% 6532 ± 38% numa-vmstat.node0.nr_page_table_pages
143216 ± 23% -71.1% 41345 ± 87% numa-vmstat.node0.numa_other
460013 ± 25% +149.0% 1145233 ± 38% numa-vmstat.node1.nr_active_anon
75283 ± 82% +63.0% 122733 ± 44% numa-vmstat.node1.nr_anon_pages
384991 ± 22% +165.7% 1022742 ± 41% numa-vmstat.node1.nr_shmem
460006 ± 25% +149.0% 1145231 ± 38% numa-vmstat.node1.nr_zone_active_anon
1204935 ± 13% +73.1% 2086163 ± 29% numa-vmstat.node1.numa_hit
55987 ± 58% +180.9% 157244 ± 23% numa-vmstat.node1.numa_other
0.05 ± 72% -79.9% 0.01 ± 67% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_lru_noprof.__d_alloc.d_alloc.d_alloc_parallel
0.06 ±129% -80.3% 0.01 ± 56% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_lru_noprof.mqueue_alloc_inode.alloc_inode.new_inode
530.28 ± 31% -54.3% 242.29 ± 61% perf-sched.sch_delay.avg.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
0.28 ± 16% -68.9% 0.09 ± 70% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.wq_sleep.do_mq_timedsend.__x64_sys_mq_timedsend
0.09 ± 88% -81.0% 0.02 ± 64% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_lru_noprof.__d_alloc.d_alloc.d_alloc_parallel
1868 ± 67% -73.6% 492.86 ±104% perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
1269 ± 25% -49.9% 635.66 ± 59% perf-sched.wait_and_delay.avg.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
0.77 ± 15% -60.9% 0.30 ± 75% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.wq_sleep.do_mq_timedsend.__x64_sys_mq_timedsend
3770 ± 66% -73.8% 989.52 ±103% perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.21 ±102% -88.8% 0.02 ± 93% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_lru_noprof.mqueue_alloc_inode.alloc_inode.new_inode
739.22 ± 22% -46.8% 393.37 ± 58% perf-sched.wait_time.avg.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
1919 ± 64% -73.7% 504.99 ±100% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
608977 ± 11% +105.5% 1251722 ± 35% proc-vmstat.nr_active_anon
188245 +4.0% 195771 ± 2% proc-vmstat.nr_anon_pages
1303984 ± 5% +48.7% 1939187 ± 22% proc-vmstat.nr_file_pages
90548 ± 4% +47.9% 133892 ± 23% proc-vmstat.nr_mapped
421965 ± 16% +150.5% 1057174 ± 41% proc-vmstat.nr_shmem
41883 +4.5% 43768 ± 2% proc-vmstat.nr_slab_reclaimable
122762 +1.7% 124807 proc-vmstat.nr_slab_unreclaimable
608977 ± 11% +105.5% 1251722 ± 35% proc-vmstat.nr_zone_active_anon
39944 ± 15% +190.1% 115861 ± 59% proc-vmstat.numa_hint_faults
27410 ± 19% +281.4% 104548 ± 69% proc-vmstat.numa_hint_faults_local
1684470 ± 8% +58.2% 2665570 ± 23% proc-vmstat.numa_hit
1485253 ± 9% +66.1% 2466944 ± 25% proc-vmstat.numa_local
102341 ± 28% +62.0% 165807 ± 36% proc-vmstat.numa_pte_updates
1751319 ± 7% +57.3% 2754572 ± 23% proc-vmstat.pgalloc_normal
609827 ± 2% +16.2% 708345 ± 11% proc-vmstat.pgfault
0.45 ± 7% +20.7% 0.55 sched_debug.cfs_rq:/.h_nr_queued.stddev
0.43 ± 6% +18.6% 0.51 ± 2% sched_debug.cfs_rq:/.h_nr_runnable.stddev
267.72 ± 12% +27.2% 340.40 ± 3% sched_debug.cfs_rq:/.util_est.stddev
586830 ± 4% -10.6% 524667 ± 3% sched_debug.cpu.avg_idle.avg
1735827 ± 29% -33.7% 1150356 ± 8% sched_debug.cpu.avg_idle.max
15839 ±143% -77.0% 3638 ± 10% sched_debug.cpu.avg_idle.min
139.91 ± 42% -86.4% 19.08 ± 21% sched_debug.cpu.clock.stddev
24838 ± 35% +87.7% 46614 ± 8% sched_debug.cpu.curr->pid.max
2342 ± 22% +63.1% 3820 ± 10% sched_debug.cpu.curr->pid.stddev
631455 ± 10% -19.1% 510552 sched_debug.cpu.max_idle_balance_cost.avg
1697254 ± 18% -53.8% 784378 ± 19% sched_debug.cpu.max_idle_balance_cost.max
175055 ± 25% -80.0% 35047 ± 47% sched_debug.cpu.max_idle_balance_cost.stddev
0.00 ± 48% -81.3% 0.00 ± 35% sched_debug.cpu.next_balance.stddev
0.44 ± 9% +23.6% 0.54 ± 4% sched_debug.cpu.nr_running.stddev
1043526 ± 11% +59.3% 1662017 ± 5% sched_debug.cpu.nr_switches.avg
1390822 ± 6% +62.8% 2263820 ± 13% sched_debug.cpu.nr_switches.max
5.88 ± 8% +31.3% 7.72 ± 10% sched_debug.cpu.nr_uninterruptible.stddev
1.336e+10 ± 5% +63.8% 2.188e+10 ± 4% perf-stat.i.branch-instructions
1.059e+08 ± 7% +66.9% 1.767e+08 ± 5% perf-stat.i.branch-misses
11257488 ± 7% +73.7% 19553240 ± 19% perf-stat.i.cache-misses
1.11e+08 ± 87% +281.9% 4.239e+08 ± 11% perf-stat.i.cache-references
6566144 ± 12% +62.1% 10640456 ± 5% perf-stat.i.context-switches
9.35 ± 4% -43.6% 5.28 ± 5% perf-stat.i.cpi
119675 ±145% +424.8% 628084 ± 8% perf-stat.i.cpu-migrations
55311 ± 7% -40.5% 32929 ± 13% perf-stat.i.cycles-between-cache-misses
6.609e+10 ± 5% +64.8% 1.089e+11 ± 4% perf-stat.i.instructions
0.13 ± 8% +56.9% 0.21 ± 4% perf-stat.i.ipc
34.67 ± 10% +69.3% 58.71 ± 5% perf-stat.i.metric.K/sec
0.10 ± 45% +102.6% 0.21 ± 4% perf-stat.overall.ipc
1.074e+10 ± 45% +99.5% 2.144e+10 ± 4% perf-stat.ps.branch-instructions
85818545 ± 45% +102.0% 1.733e+08 ± 5% perf-stat.ps.branch-misses
9043150 ± 45% +111.7% 19144861 ± 19% perf-stat.ps.cache-misses
1.008e+08 ±101% +313.5% 4.169e+08 ± 11% perf-stat.ps.cache-references
5233955 ± 46% +100.1% 10474383 ± 5% perf-stat.ps.context-switches
117697 ±146% +425.9% 618947 ± 8% perf-stat.ps.cpu-migrations
5.317e+10 ± 45% +100.8% 1.067e+11 ± 4% perf-stat.ps.instructions
5717 ± 44% +52.3% 8706 ± 16% perf-stat.ps.minor-faults
5717 ± 44% +52.3% 8707 ± 16% perf-stat.ps.page-faults
3.319e+12 ± 44% +98.7% 6.593e+12 ± 5% perf-stat.total.instructions
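
As a reading aid for the table above, the %change column can be reproduced from the two value columns. Judging from the rows themselves, ordinary metrics show a relative change in percent, while metrics that are already percentages (the mpstat.cpu.all.*% rows) appear to show a plain difference in percentage points. The helper names below are hypothetical and are not part of lkp-tests; this is a minimal sketch of the arithmetic.

```c
#include <assert.h>

/* Relative change in percent: (patched - base) / base * 100. */
static inline double pct_change(double base, double patched)
{
    assert(base != 0.0); /* relative change is undefined for a zero baseline */
    return (patched - base) / base * 100.0;
}

/* Percentage-point difference, for metrics that are already percentages. */
static inline double point_delta(double base_pct, double patched_pct)
{
    return patched_pct - base_pct;
}
```

For example, pct_change(4178927, 8118700) gives roughly 94.3, matching the stress-ng.mq.ops_per_sec row, and point_delta(6.40, 7.74) gives roughly 1.3, matching mpstat.cpu.all.usr%.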
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki