[linus:master] [shmem] a2e459555c: aim9.disk_src.ops_per_sec -19.0% regression

From: kernel test robot
Date: Fri Sep 08 2023 - 01:27:18 EST




Hello,

kernel test robot noticed a -19.0% regression of aim9.disk_src.ops_per_sec on:


commit: a2e459555c5f9da3e619b7e47a63f98574dc75f1 ("shmem: stable directory offsets")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: aim9
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
parameters:

testtime: 300s
test: disk_src
cpufreq_governor: performance


The commit also has a significant impact on the following test:

+------------------+-------------------------------------------------------------------------------------------------+
| testcase: change | aim9: aim9.disk_src.ops_per_sec -14.6% regression                                               |
| test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory |
| test parameters  | cpufreq_governor=performance                                                                    |
|                  | test=all                                                                                        |
|                  | testtime=5s                                                                                     |
+------------------+-------------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202309081306.3ecb3734-oliver.sang@xxxxxxxxx


Details are below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230908/202309081306.3ecb3734-oliver.sang@xxxxxxxxx
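For a quick local comparison of the two commits without the full LKP harness, the following is a minimal, hypothetical micro-benchmark. It is not the aim9 disk_src source. It only repeats the create/unlink pair that the perf-profile call graphs in this report attribute to shmem_mknod() and shmem_unlink(). Python issues openat(2) with creat()'s flags rather than creat(2) itself, but both enter the kernel through do_sys_openat2() and path_openat(). The /dev/shm location (usually a tmpfs mount) and the iteration count are assumptions. Interpreter overhead dominates the absolute rate, so only compare runs on the same machine across the two kernels.

```python
import os
import tempfile
import time


def create_unlink_loop(directory: str, iterations: int) -> float:
    """Hypothetical micro-benchmark: create then unlink one file per iteration.

    Returns create+unlink pairs per second. On a tmpfs directory each pair goes
    through the shmem create and unlink paths; it is NOT the aim9 disk_src code.
    """
    path = os.path.join(directory, "churn")
    start = time.perf_counter()
    for _ in range(iterations):
        # O_CREAT | O_WRONLY | O_TRUNC is the flag set creat(2) is defined to use.
        fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
        os.close(fd)
        os.unlink(path)
    elapsed = time.perf_counter() - start
    return iterations / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Assumption: /dev/shm is a writable tmpfs mount; otherwise use the default temp dir.
    base = "/dev/shm" if os.access("/dev/shm", os.W_OK) else None
    with tempfile.TemporaryDirectory(dir=base) as tmp:
        rate = create_unlink_loop(tmp, 10_000)
        print(f"{rate:.0f} create+unlink pairs/sec in {tmp}")
```

Boot each kernel in turn and run the script on the same tmpfs mount. A drop in the reported rate on a2e459555c would be consistent with, but not equivalent to, the aim9 result.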

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-ivb-2ep1/disk_src/aim9/300s

commit:
23a31d8764 ("shmem: Refactor shmem_symlink()")
a2e459555c ("shmem: stable directory offsets")

23a31d87645c6527 a2e459555c5f9da3e619b7e47a6
---------------- ---------------------------
         %stddev   %change           %stddev
         \            |              \
      0.26 ±  9%      +0.1        0.36 ±  2%  mpstat.cpu.all.soft%
      0.61            -0.1        0.52        mpstat.cpu.all.usr%
      0.16 ± 10%    -18.9%        0.13 ± 12%  perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      0.04 ±  7%  +1802.4%        0.78 ±115%  perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    202424          -19.0%      163868        aim9.disk_src.ops_per_sec
     94.83           -4.2%       90.83        aim9.time.percent_of_cpu_this_job_got
     73.62          -17.6%       60.69        aim9.time.user_time
     23541           +6.5%       25074        proc-vmstat.nr_slab_reclaimable
   1437319 ± 24%   +377.6%     6864201        proc-vmstat.numa_hit
   1387016 ± 25%   +391.4%     6815486        proc-vmstat.numa_local
   4864362 ± 34%   +453.6%    26931180        proc-vmstat.pgalloc_normal
   4835960 ± 34%   +455.4%    26856610        proc-vmstat.pgfree
    538959 ± 24%    -23.2%      414090        sched_debug.cfs_rq:/.load.max
    130191 ± 14%    -13.3%      112846 ±  6%  sched_debug.cfs_rq:/.load.stddev
    116849 ± 27%    -51.2%       56995 ± 20%  sched_debug.cfs_rq:/.min_vruntime.max
      1223 ±191%   -897.4%       -9754        sched_debug.cfs_rq:/.spread0.avg
    107969 ± 29%    -65.3%       37448 ± 39%  sched_debug.cfs_rq:/.spread0.max
     55209 ± 14%    -21.8%       43154 ± 14%  sched_debug.cpu.nr_switches.max
     11.21          +23.7%       13.87        perf-stat.i.MPKI
 7.223e+08           -4.4%   6.907e+08        perf-stat.i.branch-instructions
      2.67            +0.2        2.88        perf-stat.i.branch-miss-rate%
  19988363           +2.8%    20539702        perf-stat.i.branch-misses
     17.36            -2.8       14.59        perf-stat.i.cache-miss-rate%
  40733859          +19.5%    48659982        perf-stat.i.cache-references
      1.76           +3.5%        1.82        perf-stat.i.cpi
     55.21           +5.4%       58.21 ±  2%  perf-stat.i.cpu-migrations
  1.01e+09           -3.8%   9.719e+08        perf-stat.i.dTLB-loads
      0.26 ±  4%      -0.0        0.23 ±  3%  perf-stat.i.dTLB-store-miss-rate%
   2166022 ±  4%     -6.9%     2015917 ±  3%  perf-stat.i.dTLB-store-misses
 8.503e+08           +5.5%   8.968e+08        perf-stat.i.dTLB-stores
     69.22 ±  4%      +6.4       75.60        perf-stat.i.iTLB-load-miss-rate%
    316455 ± 12%    -31.6%      216531 ±  3%  perf-stat.i.iTLB-loads
 3.722e+09           -3.1%   3.608e+09        perf-stat.i.instructions
      0.57           -3.3%        0.55        perf-stat.i.ipc
    865.04          -10.4%      775.02 ±  3%  perf-stat.i.metric.K/sec
     47.51            -2.1       45.37        perf-stat.i.node-load-miss-rate%
    106705 ±  3%    +14.8%      122490 ±  5%  perf-stat.i.node-loads
    107169 ±  4%    +29.0%      138208 ±  7%  perf-stat.i.node-stores
     10.94          +23.3%       13.49        perf-stat.overall.MPKI
      2.77            +0.2        2.97        perf-stat.overall.branch-miss-rate%
     17.28            -2.7       14.56        perf-stat.overall.cache-miss-rate%
      1.73           +3.4%        1.79        perf-stat.overall.cpi
      0.25 ±  4%      -0.0        0.22 ±  3%  perf-stat.overall.dTLB-store-miss-rate%
     69.20 ±  4%      +6.4       75.60        perf-stat.overall.iTLB-load-miss-rate%
      0.58           -3.2%        0.56        perf-stat.overall.ipc
     45.25            -2.2       43.10        perf-stat.overall.node-load-miss-rate%
 7.199e+08           -4.4%   6.883e+08        perf-stat.ps.branch-instructions
  19919808           +2.8%    20469001        perf-stat.ps.branch-misses
  40597326          +19.5%    48497201        perf-stat.ps.cache-references
     55.06           +5.4%       58.03 ±  2%  perf-stat.ps.cpu-migrations
 1.007e+09           -3.8%   9.686e+08        perf-stat.ps.dTLB-loads
   2158768 ±  4%     -6.9%     2009174 ±  3%  perf-stat.ps.dTLB-store-misses
 8.475e+08           +5.5%   8.937e+08        perf-stat.ps.dTLB-stores
    315394 ± 12%    -31.6%      215816 ±  3%  perf-stat.ps.iTLB-loads
  3.71e+09           -3.1%   3.595e+09        perf-stat.ps.instructions
    106351 ±  3%    +14.8%      122083 ±  5%  perf-stat.ps.node-loads
    106728 ±  4%    +29.1%      137740 ±  7%  perf-stat.ps.node-stores
 1.117e+12           -3.0%   1.084e+12        perf-stat.total.instructions
      0.00            +0.8        0.75 ± 12%  perf-profile.calltrace.cycles-pp.__call_rcu_common.xas_store.__xa_erase.xa_erase.simple_offset_remove
      0.00            +0.8        0.78 ± 34%  perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_lru.xas_alloc.xas_create.xas_store
      0.00            +0.8        0.83 ± 29%  perf-profile.calltrace.cycles-pp.allocate_slab.___slab_alloc.kmem_cache_alloc_lru.xas_alloc.xas_expand
      0.00            +0.9        0.92 ± 26%  perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_lru.xas_alloc.xas_expand.xas_create
      0.00            +1.0        0.99 ± 27%  perf-profile.calltrace.cycles-pp.shuffle_freelist.allocate_slab.___slab_alloc.kmem_cache_alloc_lru.xas_alloc
      0.00            +1.0        1.04 ± 28%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_lru.xas_alloc.xas_create.xas_store.__xa_alloc
      0.00            +1.1        1.11 ± 26%  perf-profile.calltrace.cycles-pp.xas_alloc.xas_create.xas_store.__xa_alloc.__xa_alloc_cyclic
      1.51 ± 24%      +1.2        2.73 ± 10%  perf-profile.calltrace.cycles-pp.vfs_unlink.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +1.2        1.24 ± 20%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_lru.xas_alloc.xas_expand.xas_create.xas_store
      0.00            +1.3        1.27 ± 10%  perf-profile.calltrace.cycles-pp.xas_store.__xa_erase.xa_erase.simple_offset_remove.shmem_unlink
      0.00            +1.3        1.30 ± 10%  perf-profile.calltrace.cycles-pp.__xa_erase.xa_erase.simple_offset_remove.shmem_unlink.vfs_unlink
      0.00            +1.3        1.33 ± 19%  perf-profile.calltrace.cycles-pp.xas_alloc.xas_expand.xas_create.xas_store.__xa_alloc
      0.00            +1.4        1.36 ± 10%  perf-profile.calltrace.cycles-pp.xa_erase.simple_offset_remove.shmem_unlink.vfs_unlink.do_unlinkat
      0.00            +1.4        1.37 ± 10%  perf-profile.calltrace.cycles-pp.simple_offset_remove.shmem_unlink.vfs_unlink.do_unlinkat.__x64_sys_unlink
      0.00            +1.5        1.51 ± 17%  perf-profile.calltrace.cycles-pp.xas_expand.xas_create.xas_store.__xa_alloc.__xa_alloc_cyclic
      0.00            +1.6        1.62 ± 12%  perf-profile.calltrace.cycles-pp.shmem_unlink.vfs_unlink.do_unlinkat.__x64_sys_unlink.do_syscall_64
      0.00            +2.8        2.80 ± 13%  perf-profile.calltrace.cycles-pp.xas_create.xas_store.__xa_alloc.__xa_alloc_cyclic.simple_offset_add
      0.00            +2.9        2.94 ± 13%  perf-profile.calltrace.cycles-pp.xas_store.__xa_alloc.__xa_alloc_cyclic.simple_offset_add.shmem_mknod
      5.38 ± 24%      +3.1        8.51 ± 11%  perf-profile.calltrace.cycles-pp.lookup_open.open_last_lookups.path_openat.do_filp_open.do_sys_openat2
      6.08 ± 24%      +3.2        9.24 ± 12%  perf-profile.calltrace.cycles-pp.open_last_lookups.path_openat.do_filp_open.do_sys_openat2.__x64_sys_creat
      0.00            +3.2        3.20 ± 13%  perf-profile.calltrace.cycles-pp.__xa_alloc.__xa_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open
      0.00            +3.2        3.24 ± 13%  perf-profile.calltrace.cycles-pp.__xa_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups
      0.00            +3.4        3.36 ± 14%  perf-profile.calltrace.cycles-pp.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups.path_openat
      2.78 ± 25%      +3.4        6.17 ± 12%  perf-profile.calltrace.cycles-pp.shmem_mknod.lookup_open.open_last_lookups.path_openat.do_filp_open
      0.16 ± 30%      -0.1        0.08 ± 20%  perf-profile.children.cycles-pp.map_id_up
      0.02 ±146%      +0.1        0.08 ± 13%  perf-profile.children.cycles-pp.shmem_is_huge
      0.02 ±141%      +0.1        0.09 ± 16%  perf-profile.children.cycles-pp.__list_del_entry_valid
      0.00            +0.1        0.08 ± 11%  perf-profile.children.cycles-pp.free_unref_page
      0.00            +0.1        0.08 ± 13%  perf-profile.children.cycles-pp.shmem_destroy_inode
      0.04 ±101%      +0.1        0.14 ± 25%  perf-profile.children.cycles-pp.rcu_nocb_try_bypass
      0.00            +0.1        0.12 ± 27%  perf-profile.children.cycles-pp.xas_find_marked
      0.02 ±144%      +0.1        0.16 ± 14%  perf-profile.children.cycles-pp.__unfreeze_partials
      0.03 ±106%      +0.2        0.19 ± 26%  perf-profile.children.cycles-pp.xas_descend
      0.01 ±223%      +0.2        0.17 ± 15%  perf-profile.children.cycles-pp.get_page_from_freelist
      0.11 ± 22%      +0.2        0.29 ± 16%  perf-profile.children.cycles-pp.rcu_segcblist_enqueue
      0.02 ±146%      +0.2        0.24 ± 13%  perf-profile.children.cycles-pp.__alloc_pages
      0.36 ± 79%      +0.6        0.98 ± 15%  perf-profile.children.cycles-pp.__slab_free
      0.50 ± 26%      +0.7        1.23 ± 14%  perf-profile.children.cycles-pp.__call_rcu_common
      0.00            +0.8        0.82 ± 13%  perf-profile.children.cycles-pp.radix_tree_node_rcu_free
      0.00            +1.1        1.14 ± 17%  perf-profile.children.cycles-pp.radix_tree_node_ctor
      0.16 ± 86%      +1.2        1.38 ± 16%  perf-profile.children.cycles-pp.setup_object
      1.52 ± 25%      +1.2        2.75 ± 10%  perf-profile.children.cycles-pp.vfs_unlink
      0.36 ± 22%      +1.3        1.63 ± 12%  perf-profile.children.cycles-pp.shmem_unlink
      0.00            +1.3        1.30 ± 10%  perf-profile.children.cycles-pp.__xa_erase
      0.20 ± 79%      +1.3        1.53 ± 15%  perf-profile.children.cycles-pp.shuffle_freelist
      0.00            +1.4        1.36 ± 10%  perf-profile.children.cycles-pp.xa_erase
      0.00            +1.4        1.38 ± 10%  perf-profile.children.cycles-pp.simple_offset_remove
      0.00            +1.5        1.51 ± 17%  perf-profile.children.cycles-pp.xas_expand
      0.26 ± 78%      +1.6        1.87 ± 13%  perf-profile.children.cycles-pp.allocate_slab
      0.40 ± 49%      +1.7        2.10 ± 13%  perf-profile.children.cycles-pp.___slab_alloc
      1.30 ± 85%      +2.1        3.42 ± 12%  perf-profile.children.cycles-pp.rcu_do_batch
      1.56 ± 27%      +2.4        3.93 ± 11%  perf-profile.children.cycles-pp.kmem_cache_alloc_lru
      0.00            +2.4        2.44 ± 12%  perf-profile.children.cycles-pp.xas_alloc
      2.66 ± 13%      +2.5        5.14 ±  5%  perf-profile.children.cycles-pp.__irq_exit_rcu
     11.16 ± 10%      +2.7       13.88 ±  8%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
     11.77 ± 10%      +2.7       14.49 ±  8%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.00            +2.8        2.82 ± 13%  perf-profile.children.cycles-pp.xas_create
      5.40 ± 24%      +3.1        8.52 ± 11%  perf-profile.children.cycles-pp.lookup_open
      6.12 ± 24%      +3.1        9.27 ± 12%  perf-profile.children.cycles-pp.open_last_lookups
      0.00            +3.2        3.22 ± 13%  perf-profile.children.cycles-pp.__xa_alloc
      0.00            +3.2        3.24 ± 13%  perf-profile.children.cycles-pp.__xa_alloc_cyclic
      0.00            +3.4        3.36 ± 14%  perf-profile.children.cycles-pp.simple_offset_add
      2.78 ± 25%      +3.4        6.18 ± 12%  perf-profile.children.cycles-pp.shmem_mknod
      0.00            +4.2        4.24 ± 12%  perf-profile.children.cycles-pp.xas_store
      0.14 ± 27%      -0.1        0.08 ± 21%  perf-profile.self.cycles-pp.map_id_up
      0.00            +0.1        0.06 ± 24%  perf-profile.self.cycles-pp.shmem_destroy_inode
      0.00            +0.1        0.07 ±  8%  perf-profile.self.cycles-pp.__xa_alloc
      0.02 ±146%      +0.1        0.11 ± 28%  perf-profile.self.cycles-pp.rcu_nocb_try_bypass
      0.01 ±223%      +0.1        0.10 ± 28%  perf-profile.self.cycles-pp.shuffle_freelist
      0.00            +0.1        0.11 ± 40%  perf-profile.self.cycles-pp.xas_create
      0.00            +0.1        0.12 ± 27%  perf-profile.self.cycles-pp.xas_find_marked
      0.00            +0.1        0.14 ± 18%  perf-profile.self.cycles-pp.xas_alloc
      0.03 ±103%      +0.1        0.17 ± 29%  perf-profile.self.cycles-pp.xas_descend
      0.00            +0.2        0.16 ± 23%  perf-profile.self.cycles-pp.xas_expand
      0.10 ± 22%      +0.2        0.27 ± 16%  perf-profile.self.cycles-pp.rcu_segcblist_enqueue
      0.00            +0.4        0.36 ± 16%  perf-profile.self.cycles-pp.xas_store
      0.32 ± 30%      +0.4        0.71 ± 12%  perf-profile.self.cycles-pp.__call_rcu_common
      0.18 ± 27%      +0.5        0.65 ±  8%  perf-profile.self.cycles-pp.kmem_cache_alloc_lru
      0.36 ± 79%      +0.6        0.96 ± 15%  perf-profile.self.cycles-pp.__slab_free
      0.00            +0.8        0.80 ± 14%  perf-profile.self.cycles-pp.radix_tree_node_rcu_free
      0.00            +1.0        1.01 ± 16%  perf-profile.self.cycles-pp.radix_tree_node_ctor
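The %change column in the table above uses two conventions, which can be inferred from the rows themselves. Counts and rates (ops_per_sec, proc-vmstat, most perf-stat counters) show the relative change of the second commit against the first. Metrics that are already percentages (rows ending in %, and the perf-profile cycles-pp shares) show the absolute difference in percentage points. For example, mpstat.cpu.all.soft% going from 0.26 to 0.36 is reported as +0.1, not +38%. The helpers below are a sketch that reproduces both conventions. They are not lkp-tests code, and the function names are hypothetical. The displayed means are rounded, so a recomputed value can differ in the last digit from the one LKP printed (e.g. the sched_debug.cfs_rq:/.spread0.avg row).

```python
def percent_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent (count/rate rows)."""
    return (new - base) / base * 100.0


def point_change(base: float, new: float) -> float:
    """Absolute difference, for rows whose values are already percentages."""
    return new - base


if __name__ == "__main__":
    # Headline row: aim9.disk_src.ops_per_sec, 202424 -> 163868.
    print(f"{percent_change(202424, 163868):+.1f}%")  # prints -19.0%
    # Percentage-valued row: mpstat.cpu.all.soft%, 0.26 -> 0.36.
    print(f"{point_change(0.26, 0.36):+.1f}")  # prints +0.1
```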


***************************************************************************************************
lkp-ivb-2ep1: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-ivb-2ep1/all/aim9/5s

commit:
23a31d8764 ("shmem: Refactor shmem_symlink()")
a2e459555c ("shmem: stable directory offsets")

23a31d87645c6527 a2e459555c5f9da3e619b7e47a6
---------------- ---------------------------
         %stddev   %change           %stddev
         \            |              \
   9781285           +2.0%     9975309        proc-vmstat.pgalloc_normal
   4481052           -1.6%     4408359        proc-vmstat.pgfault
   9749965           +2.0%     9942285        proc-vmstat.pgfree
     14556           -1.6%       14324        perf-stat.i.minor-faults
     14556           -1.6%       14324        perf-stat.i.page-faults
     14505           -1.6%       14272        perf-stat.ps.minor-faults
     14505           -1.6%       14272        perf-stat.ps.page-faults
    849714           -3.6%      819341        aim9.brk_test.ops_per_sec
    478138           +3.1%      492806        aim9.dgram_pipe.ops_per_sec
    199087          -14.6%      170071        aim9.disk_src.ops_per_sec
    286595           -9.7%      258794        aim9.link_test.ops_per_sec
    303603           -2.8%      295009        aim9.page_test.ops_per_sec
   3692190           -1.7%     3629732        aim9.time.minor_page_faults
      0.00            +1.0        0.95 ± 25%  perf-profile.calltrace.cycles-pp.xas_create.xas_store.__xa_alloc.__xa_alloc_cyclic.simple_offset_add
      0.00            +1.0        1.01 ± 23%  perf-profile.calltrace.cycles-pp.xas_store.__xa_alloc.__xa_alloc_cyclic.simple_offset_add.shmem_mknod
      1.54 ± 22%      +1.1        2.61 ± 22%  perf-profile.calltrace.cycles-pp.shmem_mknod.lookup_open.open_last_lookups.path_openat.do_filp_open
      0.00            +1.2        1.15 ± 21%  perf-profile.calltrace.cycles-pp.__xa_alloc.__xa_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open
      0.00            +1.2        1.18 ± 21%  perf-profile.calltrace.cycles-pp.__xa_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups
      0.00            +1.2        1.22 ± 21%  perf-profile.calltrace.cycles-pp.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups.path_openat
      0.28 ± 21%      +0.2        0.45 ± 24%  perf-profile.children.cycles-pp.__call_rcu_common
      0.00            +0.3        0.26 ± 43%  perf-profile.children.cycles-pp.radix_tree_node_rcu_free
      0.14 ± 46%      +0.3        0.45 ± 20%  perf-profile.children.cycles-pp.setup_object
      0.00            +0.3        0.33 ± 24%  perf-profile.children.cycles-pp.radix_tree_node_ctor
      0.16 ± 49%      +0.4        0.52 ± 24%  perf-profile.children.cycles-pp.shuffle_freelist
      0.23 ± 43%      +0.4        0.63 ± 23%  perf-profile.children.cycles-pp.allocate_slab
      0.30 ± 35%      +0.4        0.74 ± 24%  perf-profile.children.cycles-pp.___slab_alloc
      0.17 ± 25%      +0.5        0.66 ± 23%  perf-profile.children.cycles-pp.shmem_unlink
      0.00            +0.5        0.49 ± 24%  perf-profile.children.cycles-pp.__xa_erase
      0.00            +0.5        0.52 ± 24%  perf-profile.children.cycles-pp.xa_erase
      0.00            +0.5        0.52 ± 64%  perf-profile.children.cycles-pp.xas_expand
      0.00            +0.5        0.53 ± 24%  perf-profile.children.cycles-pp.simple_offset_remove
      0.87 ± 26%      +0.7        1.56 ± 23%  perf-profile.children.cycles-pp.kmem_cache_alloc_lru
      2.44 ± 12%      +0.8        3.25 ± 13%  perf-profile.children.cycles-pp.__irq_exit_rcu
      0.00            +0.8        0.82 ± 24%  perf-profile.children.cycles-pp.xas_alloc
      0.01 ±230%      +1.0        0.99 ± 23%  perf-profile.children.cycles-pp.xas_create
      1.55 ± 22%      +1.1        2.63 ± 22%  perf-profile.children.cycles-pp.shmem_mknod
      0.00            +1.2        1.16 ± 21%  perf-profile.children.cycles-pp.__xa_alloc
      0.00            +1.2        1.18 ± 21%  perf-profile.children.cycles-pp.__xa_alloc_cyclic
      0.00            +1.2        1.22 ± 21%  perf-profile.children.cycles-pp.simple_offset_add
      0.18 ± 28%      +1.5        1.65 ± 21%  perf-profile.children.cycles-pp.xas_store
      0.11 ± 31%      +0.1        0.25 ± 27%  perf-profile.self.cycles-pp.xas_store
      0.11 ± 31%      +0.2        0.28 ± 24%  perf-profile.self.cycles-pp.kmem_cache_alloc_lru
      0.00            +0.3        0.26 ± 44%  perf-profile.self.cycles-pp.radix_tree_node_rcu_free
      0.00            +0.3        0.29 ± 23%  perf-profile.self.cycles-pp.radix_tree_node_ctor
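In both profiles, the new cost arrives through __xa_alloc_cyclic() on create and xa_erase() on unlink. The toy model below sketches why cyclic allocation keeps growing the index under create/unlink churn. Following the documented xa_alloc_cyclic() behaviour, each search starts after the previously handed-out offset instead of at the lowest free slot. The model makes three assumptions: a minimum offset of 2 (0 and 1 for "." and ".."), a 32-bit upper limit, and a radix tree with 64 slots per node. The class and function names are hypothetical, and this is not the kernel implementation. A deeper tree is consistent with, but does not by itself prove, the new xas_expand(), radix_tree_node_ctor() and radix_tree_node_rcu_free() entries.

```python
class CyclicOffsetMap:
    """Toy per-directory offset map with cyclic allocation (hypothetical).

    The search for a free offset starts at `next_offset` and wraps to
    MIN_OFFSET at the upper limit; `next_offset` then moves past the offset
    just handed out, so a freed offset is not reused right away.
    """

    MIN_OFFSET = 2          # assumption: 0 and 1 are reserved for "." and ".."
    MAX_OFFSET = 2**32 - 1  # assumption: 32-bit offset space

    def __init__(self) -> None:
        self.entries: dict[int, str] = {}
        self.next_offset = self.MIN_OFFSET

    def _advance(self, offset: int) -> int:
        return offset + 1 if offset < self.MAX_OFFSET else self.MIN_OFFSET

    def add(self, name: str) -> int:
        """Stand-in for simple_offset_add(): allocate and record an offset."""
        offset = self.next_offset
        for _ in range(self.MAX_OFFSET - self.MIN_OFFSET + 1):
            if offset not in self.entries:
                self.entries[offset] = name
                self.next_offset = self._advance(offset)
                return offset
            offset = self._advance(offset)
        raise OSError("offset space exhausted")

    def remove(self, offset: int) -> None:
        """Stand-in for simple_offset_remove(): forget the offset."""
        del self.entries[offset]


def tree_height(max_index: int, chunk_shift: int = 6) -> int:
    """Levels a radix tree with 2**chunk_shift slots per node needs for max_index."""
    height = 1
    while max_index >> (chunk_shift * height):
        height += 1
    return height


if __name__ == "__main__":
    offsets = CyclicOffsetMap()
    last = 0
    for i in range(5000):
        # Create-then-unlink churn: at most one live entry at any time.
        last = offsets.add(f"file{i}")
        offsets.remove(last)
    print(last, tree_height(last))  # prints: 5001 3
```

In this model, an allocator that reused the lowest free slot would keep handing out offset 2 (height 1), while the cyclic one has already reached a three-level tree after 5000 create/unlink pairs.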



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki