[peterz-queue:locking/futex] [futex] e1a4bd5d6d: will-it-scale.per_thread_ops -11.2% regression

From: kernel test robot
Date: Tue Dec 05 2023 - 09:57:55 EST




Hello,

kernel test robot noticed a -11.2% regression of will-it-scale.per_thread_ops on:


commit: e1a4bd5d6d978ba147f823c669373e3596e0bbcc ("futex: Implement FUTEX2_NUMA")
https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git locking/futex

testcase: will-it-scale
test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
parameters:

nr_task: 16
mode: thread
test: futex1
cpufreq_governor: performance




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202312052213.d20bec0a-oliver.sang@xxxxxxxxx


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231205/202312052213.d20bec0a-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/thread/16/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/futex1/will-it-scale

commit:
38d12f1c15 ("mm: Add vmalloc_huge_node()")
e1a4bd5d6d ("futex: Implement FUTEX2_NUMA")

38d12f1c15069458 e1a4bd5d6d978ba147f823c6693
---------------- ---------------------------
%stddev %change %stddev
\ | \
1.29 -0.1 1.16 mpstat.cpu.all.usr%
16082 ± 47% +268.8% 59317 ± 46% numa-meminfo.node3.AnonHugePages
443502 ± 10% -21.7% 347354 ± 16% numa-numastat.node3.numa_hit
443856 ± 10% -21.7% 347355 ± 16% numa-vmstat.node3.numa_hit
1821 ± 30% -45.9% 985.13 ± 52% sched_debug.cfs_rq:/.load_avg.stddev
9224874 ± 5% +54.4% 14242474 ± 5% meminfo.DirectMap2M
163286 ± 5% +30.9% 213804 ± 5% meminfo.DirectMap4k
0.55 ± 7% -14.3% 0.47 turbostat.IPC
72.33 +1.8% 73.67 turbostat.PkgTmp
1.155e+08 -11.2% 1.026e+08 will-it-scale.16.threads
7220531 -11.2% 6414312 will-it-scale.per_thread_ops
1.155e+08 -11.2% 1.026e+08 will-it-scale.workload
2.035e+10 -8.9% 1.853e+10 perf-stat.i.branch-instructions
0.31 -0.0 0.30 perf-stat.i.branch-miss-rate%
62615280 -12.4% 54851709 perf-stat.i.branch-misses
0.54 +9.3% 0.59 perf-stat.i.cpi
0.00 ± 5% +0.0 0.00 ± 2% perf-stat.i.dTLB-load-miss-rate%
139076 ± 5% +104.7% 284748 ± 2% perf-stat.i.dTLB-load-misses
2.634e+10 -8.2% 2.418e+10 perf-stat.i.dTLB-loads
1.927e+10 -8.8% 1.756e+10 perf-stat.i.dTLB-stores
55538465 -10.4% 49774500 ± 4% perf-stat.i.iTLB-load-misses
2514504 -10.7% 2245869 perf-stat.i.iTLB-loads
1.25e+11 -8.1% 1.149e+11 perf-stat.i.instructions
1.85 -8.5% 1.69 perf-stat.i.ipc
294.40 -8.6% 268.98 perf-stat.i.metric.M/sec
0.31 -0.0 0.30 perf-stat.overall.branch-miss-rate%
0.54 +9.3% 0.59 perf-stat.overall.cpi
0.00 ± 5% +0.0 0.00 ± 2% perf-stat.overall.dTLB-load-miss-rate%
0.00 ± 6% +0.0 0.00 ± 5% perf-stat.overall.dTLB-store-miss-rate%
1.85 -8.5% 1.69 perf-stat.overall.ipc
325727 +3.2% 336234 perf-stat.overall.path-length
2.028e+10 -8.9% 1.847e+10 perf-stat.ps.branch-instructions
62436489 -12.4% 54701854 perf-stat.ps.branch-misses
138701 ± 5% +104.7% 283927 ± 2% perf-stat.ps.dTLB-load-misses
2.625e+10 -8.2% 2.409e+10 perf-stat.ps.dTLB-loads
1.92e+10 -8.8% 1.75e+10 perf-stat.ps.dTLB-stores
55348676 -10.4% 49598644 ± 4% perf-stat.ps.iTLB-load-misses
2506036 -10.7% 2238080 perf-stat.ps.iTLB-loads
1.246e+11 -8.1% 1.145e+11 perf-stat.ps.instructions
3.763e+13 -8.3% 3.451e+13 perf-stat.total.instructions
14.56 ± 2% -1.5 13.06 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.syscall
27.62 ± 2% -1.4 26.24 perf-profile.calltrace.cycles-pp.get_user_pages_fast.get_futex_key.futex_wake.do_futex.__x64_sys_futex
25.52 ± 2% -1.0 24.52 perf-profile.calltrace.cycles-pp.internal_get_user_pages_fast.get_user_pages_fast.get_futex_key.futex_wake.do_futex
11.08 ± 2% -0.6 10.48 ± 2% perf-profile.calltrace.cycles-pp.gup_pte_range.gup_pgd_range.lockless_pages_from_mm.internal_get_user_pages_fast.get_user_pages_fast
3.74 ± 2% -0.5 3.26 ± 3% perf-profile.calltrace.cycles-pp.try_grab_folio.gup_pte_range.gup_pgd_range.lockless_pages_from_mm.internal_get_user_pages_fast
1.04 ± 3% -0.3 0.77 ± 3% perf-profile.calltrace.cycles-pp.is_valid_gup_args.get_user_pages_fast.get_futex_key.futex_wake.do_futex
2.05 ± 4% -0.2 1.90 ± 2% perf-profile.calltrace.cycles-pp.testcase
1.64 ± 3% -0.1 1.51 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
1.33 ± 3% -0.1 1.21 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.syscall
0.98 ± 3% -0.1 0.87 ± 2% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.syscall
1.02 ± 3% -0.1 0.91 ± 2% perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
0.69 ± 2% -0.1 0.63 ± 3% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
3.88 ± 5% +0.6 4.44 ± 5% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state
4.71 ± 6% +0.6 5.31 ± 5% perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
47.98 ± 2% +2.7 50.66 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
43.62 ± 2% +3.1 46.74 perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
2.64 ± 3% +3.2 5.87 perf-profile.calltrace.cycles-pp.futex_hash.futex_wake.do_futex.__x64_sys_futex.do_syscall_64
41.53 ± 2% +3.3 44.86 perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
40.19 ± 2% +3.5 43.64 perf-profile.calltrace.cycles-pp.futex_wake.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
27.78 ± 2% -1.4 26.37 perf-profile.children.cycles-pp.get_user_pages_fast
25.77 ± 2% -1.0 24.73 perf-profile.children.cycles-pp.internal_get_user_pages_fast
9.17 ± 2% -0.9 8.28 perf-profile.children.cycles-pp.entry_SYSCALL_64
11.42 ± 2% -0.7 10.77 ± 2% perf-profile.children.cycles-pp.gup_pte_range
5.61 ± 3% -0.6 5.06 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
4.30 ± 2% -0.5 3.85 perf-profile.children.cycles-pp.try_grab_folio
1.11 ± 3% -0.3 0.80 ± 3% perf-profile.children.cycles-pp.is_valid_gup_args
2.09 ± 4% -0.2 1.91 perf-profile.children.cycles-pp.testcase
2.05 ± 3% -0.2 1.88 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
1.42 ± 3% -0.1 1.29 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
1.12 ± 3% -0.1 1.00 perf-profile.children.cycles-pp.syscall_return_via_sysret
1.02 ± 3% -0.1 0.91 ± 2% perf-profile.children.cycles-pp.syscall_enter_from_user_mode
0.69 ± 2% -0.1 0.63 ± 3% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.18 ± 9% -0.0 0.13 ± 7% perf-profile.children.cycles-pp.syscall@plt
0.39 ± 5% -0.0 0.35 ± 3% perf-profile.children.cycles-pp.folio_fast_pin_allowed
0.08 ± 12% +0.0 0.12 ± 12% perf-profile.children.cycles-pp.rcu_sched_clock_irq
0.06 ± 17% +0.0 0.10 ± 14% perf-profile.children.cycles-pp.rcu_pending
0.00 +0.1 0.06 ± 9% perf-profile.children.cycles-pp.update_rq_clock
0.00 +0.1 0.06 ± 17% perf-profile.children.cycles-pp.check_cpu_stall
0.04 ± 45% +0.1 0.12 ± 6% perf-profile.children.cycles-pp.sched_clock_cpu
0.00 +0.1 0.08 ± 11% perf-profile.children.cycles-pp.hrtimer_forward
1.02 ± 7% +0.3 1.29 ± 6% perf-profile.children.cycles-pp.ktime_get
48.20 ± 2% +2.7 50.86 perf-profile.children.cycles-pp.do_syscall_64
43.65 ± 2% +3.1 46.74 perf-profile.children.cycles-pp.__x64_sys_futex
2.65 ± 3% +3.2 5.88 perf-profile.children.cycles-pp.futex_hash
41.68 ± 2% +3.3 44.99 perf-profile.children.cycles-pp.do_futex
40.38 ± 2% +3.4 43.81 perf-profile.children.cycles-pp.futex_wake
7.80 ± 3% -0.8 6.98 perf-profile.self.cycles-pp.syscall
5.48 ± 3% -0.5 4.94 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
4.24 ± 2% -0.5 3.71 perf-profile.self.cycles-pp.futex_wake
4.28 ± 2% -0.5 3.80 perf-profile.self.cycles-pp.try_grab_folio
2.60 ± 2% -0.3 2.28 ± 3% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
1.01 ± 2% -0.3 0.70 ± 2% perf-profile.self.cycles-pp.is_valid_gup_args
3.79 ± 2% -0.3 3.50 perf-profile.self.cycles-pp.entry_SYSCALL_64
1.96 ± 3% -0.2 1.74 ± 3% perf-profile.self.cycles-pp.internal_get_user_pages_fast
1.83 ± 4% -0.2 1.63 perf-profile.self.cycles-pp.__x64_sys_futex
1.79 ± 4% -0.2 1.64 perf-profile.self.cycles-pp.testcase
1.44 ± 2% -0.1 1.29 ± 2% perf-profile.self.cycles-pp.do_futex
1.40 ± 3% -0.1 1.25 perf-profile.self.cycles-pp.do_syscall_64
1.42 ± 3% -0.1 1.29 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
1.12 ± 3% -0.1 1.00 perf-profile.self.cycles-pp.syscall_return_via_sysret
0.96 ± 3% -0.1 0.86 ± 2% perf-profile.self.cycles-pp.syscall_enter_from_user_mode
0.96 ± 3% -0.1 0.87 ± 2% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.27 ± 6% -0.0 0.24 ± 3% perf-profile.self.cycles-pp.folio_fast_pin_allowed
0.00 +0.1 0.06 ± 17% perf-profile.self.cycles-pp.check_cpu_stall
0.00 +0.1 0.08 ± 12% perf-profile.self.cycles-pp.sched_clock_cpu
0.00 +0.1 0.08 ± 11% perf-profile.self.cycles-pp.hrtimer_forward
0.97 ± 8% +0.3 1.24 ± 6% perf-profile.self.cycles-pp.ktime_get
5.72 ± 3% +2.1 7.83 perf-profile.self.cycles-pp.get_futex_key
2.51 ± 3% +3.2 5.73 perf-profile.self.cycles-pp.futex_hash
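
As a reading aid, the %change column for counter rows can be re-derived from the two value columns. The short sketch below does this for a few rows copied from the table above. The helper name `pct_change` is illustrative and not part of the lkp tooling, and the path-length formula (total instructions divided by workload operations) is an assumption that only approximately matches the reported values, since the inputs are rounded.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


# (parent 38d12f1c15, e1a4bd5d6d) pairs copied from the table above.
rows = {
    "will-it-scale.per_thread_ops": (7220531, 6414312),
    "perf-stat.i.dTLB-load-misses": (139076, 284748),
    "perf-stat.i.branch-misses": (62615280, 54851709),
}

for name, (base, new) in rows.items():
    print(f"{name:32s} {pct_change(base, new):+.1f}%")

# Assumption: path-length ~= total instructions / workload operations.
# The inputs are rounded in the report, so this is only approximate.
total_instructions = (3.763e13, 3.451e13)
workload = (1.155e8, 1.026e8)
path_length = tuple(i / w for i, w in zip(total_instructions, workload))
print(f"approx. path-length: {path_length[0]:.0f} -> {path_length[1]:.0f} "
      f"({pct_change(*path_length):+.1f}%)")
```

Note that the perf-profile rows report an absolute difference in percentage points (for example +3.2 for futex_hash), not a relative change, so the helper does not apply to them.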




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki