Re: [lkp-robot] [x86/entry/64] 63e02a2a32: will-it-scale.per_process_ops -13.0% regression

From: Andy Lutomirski
Date: Sun Dec 03 2017 - 23:00:07 EST


Thomas, has my fix for this landed?

--Andy

> On Dec 3, 2017, at 7:02 PM, kernel test robot <xiaolong.ye@xxxxxxxxx> wrote:
>
>
> Greetings,
>
> FYI, we noticed a -13.0% regression of will-it-scale.per_process_ops due to commit:
>
>
> commit: 63e02a2a3292d8815eac7be438c8c73d72a7bb93 ("x86/entry/64: Create a per-CPU SYSCALL entry trampoline")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> in testcase: will-it-scale
> on test machine: 32 threads Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz with 64G memory
> with following parameters:
>
> test: poll1
> cpufreq_governor: performance
>
> test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based test in order to see any differences between the two.
> test-url: https://github.com/antonblanchard/will-it-scale
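
The test description above can be pictured with a small harness. This is only a rough sketch and not the will-it-scale code: the names `poll_loop` and `scale_run` are hypothetical, the zero-timeout poll() on an idle pipe is an assumption about what a poll-style testcase does, and Python threads stand in for the real process-based and thread-based builds, so the numbers it prints say nothing about kernel scalability.

```python
import os
import select
import time
from concurrent.futures import ThreadPoolExecutor


def poll_loop(duration_s):
    """Call poll() with a zero timeout in a tight loop and return the op count."""
    read_fd, write_fd = os.pipe()
    poller = select.poll()
    poller.register(read_fd, select.POLLIN)
    ops = 0
    deadline = time.monotonic() + duration_s
    try:
        while time.monotonic() < deadline:
            poller.poll(0)  # returns at once: nothing is ever written to the pipe
            ops += 1
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return ops


def scale_run(max_copies, duration_s=0.05):
    """Run 1..max_copies parallel copies and return mean ops per copy for each."""
    results = {}
    for copies in range(1, max_copies + 1):
        with ThreadPoolExecutor(max_workers=copies) as pool:
            counts = list(pool.map(poll_loop, [duration_s] * copies))
        results[copies] = sum(counts) / copies
    return results


if __name__ == "__main__":
    for copies, ops in scale_run(4).items():
        print(f"{copies} copies: {ops:.0f} ops per copy")
```

Each iteration of such a loop is dominated by syscall entry and exit, which is presumably why a change to the SYSCALL entry path shows up so clearly in this kind of test.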
>
> In addition to that, the commit also has significant impact on the following tests:
>
> +------------------+---------------------------------------------------------------------+
> | testcase: change | will-it-scale: will-it-scale.per_process_ops -7.0% regression       |
> | test machine     | 32 threads Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz with 64G memory |
> | test parameters  | cpufreq_governor=performance                                        |
> |                  | test=writeseek1                                                     |
> +------------------+---------------------------------------------------------------------+
> | testcase: change | aim9: aim9.brk_test.ops_per_sec -9.9% regression                    |
> | test machine     | 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G memory    |
> | test parameters  | cpufreq_governor=performance                                        |
> |                  | test=brk_test                                                       |
> |                  | testtime=300s                                                       |
> +------------------+---------------------------------------------------------------------+
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml # job file is attached in this email
> bin/lkp run job.yaml
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
> gcc-7/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-sb03/poll1/will-it-scale
>
> commit:
> 955cef1517 ("x86/entry/64: Return to userspace from the trampoline stack")
> 63e02a2a32 ("x86/entry/64: Create a per-CPU SYSCALL entry trampoline")
>
> 955cef1517a1be93 63e02a2a3292d8815eac7be438
> ---------------- --------------------------
> %stddev %change %stddev
> \ | \
> 7435674 -13.0% 6465918 will-it-scale.per_process_ops
> 5868564 -10.4% 5256868 will-it-scale.per_thread_ops
> 0.56 +8.0% 0.61 ± 2% will-it-scale.scalability
> 1947 -2.0% 1908 will-it-scale.time.system_time
> 562.79 +6.9% 601.69 will-it-scale.time.user_time
> 8.06 +0.8 8.86 ± 3% mpstat.cpu.usr%
> 4969 ± 83% -84.5% 769.00 ± 6% numa-meminfo.node1.Inactive(anon)
> 116.75 ± 63% +90.1% 222.00 ± 9% numa-vmstat.node0.nr_mlock
> 116.75 ± 63% +90.1% 222.00 ± 9% numa-vmstat.node0.nr_unevictable
> 116.75 ± 63% +90.1% 222.00 ± 9% numa-vmstat.node0.nr_zone_unevictable
> 1242 ± 83% -84.6% 191.25 ± 6% numa-vmstat.node1.nr_inactive_anon
> 1242 ± 83% -84.6% 191.25 ± 6% numa-vmstat.node1.nr_zone_inactive_anon
> 1414780 +7.7% 1524182 ± 3% sched_debug.cfs_rq:/.min_vruntime.max
> 144.71 ± 12% +17.8% 170.42 ± 2% sched_debug.cfs_rq:/.runnable_load_avg.max
> -568616 -29.5% -400842 sched_debug.cfs_rq:/.spread0.min
> 202980 ± 13% +56.8% 318219 ± 6% sched_debug.cpu.avg_idle.min
> 173545 ± 3% -13.9% 149414 ± 5% sched_debug.cpu.avg_idle.stddev
> 2.906e+12 -7.9% 2.676e+12 perf-stat.branch-instructions
> 0.01 ± 2% +2.0 2.00 perf-stat.branch-miss-rate%
> 2.405e+08 +22170.9% 5.356e+10 perf-stat.branch-misses
> 1.15 +11.6% 1.28 perf-stat.cpi
> 3.659e+12 -9.3% 3.318e+12 perf-stat.dTLB-loads
> 0.00 ± 6% +0.0 0.00 ± 3% perf-stat.dTLB-store-miss-rate%
> 2.869e+12 -8.8% 2.616e+12 perf-stat.dTLB-stores
> 1.406e+13 -9.7% 1.27e+13 perf-stat.instructions
> 0.87 -10.4% 0.78 perf-stat.ipc
> 13.72 ± 2% -13.7 0.00 perf-profile.calltrace.cycles.entry_SYSCALL_64
> 24.53 ± 2% -0.2 24.30 ± 3% perf-profile.calltrace.cycles.copy_user_generic_string._copy_from_user.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
> 12.15 ± 3% -0.2 11.98 ± 3% perf-profile.calltrace.cycles.__fget_light.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
> 9.57 ± 3% -0.1 9.48 ± 4% perf-profile.calltrace.cycles.__fget.__fget_light.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
> 5.79 ± 6% -0.0 5.75 ± 3% perf-profile.calltrace.cycles.fput.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
> 32.25 ± 2% +1.5 33.78 ± 3% perf-profile.calltrace.cycles._copy_from_user.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
> 3.99 ± 5% +1.6 5.56 ± 3% perf-profile.calltrace.cycles.__might_fault._copy_from_user.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
> 65.36 ± 2% +2.0 67.34 ± 2% perf-profile.calltrace.cycles.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
> 68.87 ± 2% +3.1 72.01 ± 2% perf-profile.calltrace.cycles.sys_poll.entry_SYSCALL_64_fastpath
> 7.33 ± 35% +3.7 11.05 ± 23% perf-profile.calltrace.cycles.poll_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
> 71.48 ± 2% +3.9 75.41 ± 2% perf-profile.calltrace.cycles.entry_SYSCALL_64_fastpath
> 9.50 ± 25% +4.0 13.49 ± 19% perf-profile.calltrace.cycles.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> 10.06 ± 23% +4.0 14.05 ± 18% perf-profile.calltrace.cycles.secondary_startup_64
> 9.66 ± 24% +4.0 13.66 ± 19% perf-profile.calltrace.cycles.cpu_startup_entry.start_secondary.secondary_startup_64
> 9.66 ± 24% +4.0 13.66 ± 19% perf-profile.calltrace.cycles.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> 9.66 ± 24% +4.0 13.66 ± 19% perf-profile.calltrace.cycles.start_secondary.secondary_startup_64
> 2.25 ± 3% +5.4 7.67 ± 3% perf-profile.calltrace.cycles.entry_SYSCALL_64_after_hwframe
> 13.72 ± 2% -13.7 0.00 perf-profile.children.cycles.entry_SYSCALL_64
> 24.53 ± 2% -0.2 24.31 ± 3% perf-profile.children.cycles.copy_user_generic_string
> 12.16 ± 3% -0.2 11.99 ± 3% perf-profile.children.cycles.__fget_light
> 9.57 ± 3% -0.1 9.48 ± 4% perf-profile.children.cycles.__fget
> 5.79 ± 6% -0.0 5.75 ± 3% perf-profile.children.cycles.fput
> 32.25 ± 2% +1.5 33.78 ± 3% perf-profile.children.cycles._copy_from_user
> 3.99 ± 5% +1.6 5.56 ± 3% perf-profile.children.cycles.__might_fault
> 65.36 ± 2% +2.0 67.34 ± 2% perf-profile.children.cycles.do_sys_poll
> 68.87 ± 2% +3.1 72.01 ± 2% perf-profile.children.cycles.sys_poll
> 7.42 ± 34% +3.7 11.14 ± 22% perf-profile.children.cycles.poll_idle
> 71.61 ± 2% +3.9 75.50 ± 2% perf-profile.children.cycles.entry_SYSCALL_64_fastpath
> 9.88 ± 23% +4.0 13.87 ± 19% perf-profile.children.cycles.cpuidle_enter_state
> 10.06 ± 23% +4.0 14.05 ± 18% perf-profile.children.cycles.secondary_startup_64
> 10.06 ± 23% +4.0 14.05 ± 18% perf-profile.children.cycles.cpu_startup_entry
> 9.66 ± 24% +4.0 13.66 ± 19% perf-profile.children.cycles.start_secondary
> 10.06 ± 23% +4.0 14.05 ± 18% perf-profile.children.cycles.do_idle
> 2.25 ± 3% +5.4 7.67 ± 3% perf-profile.children.cycles.entry_SYSCALL_64_after_hwframe
> 13.72 ± 2% -13.7 0.00 perf-profile.self.cycles.entry_SYSCALL_64
> 24.21 ± 2% -0.3 23.93 ± 2% perf-profile.self.cycles.copy_user_generic_string
> 9.47 ± 3% -0.1 9.41 ± 4% perf-profile.self.cycles.__fget
> 5.69 ± 5% +0.0 5.71 ± 3% perf-profile.self.cycles.fput
> 13.55 ± 4% +0.7 14.24 perf-profile.self.cycles.do_sys_poll
> 7.41 ± 34% +3.7 11.07 ± 22% perf-profile.self.cycles.poll_idle
> 2.25 ± 3% +5.4 7.67 ± 3% perf-profile.self.cycles.entry_SYSCALL_64_after_hwframe
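
The headline figures in the poll1 table can be cross-checked from the raw values it quotes. A minimal sketch with hypothetical helper names, assuming that %change is (new - base) / base, that branch-miss-rate% is branch-misses over branch-instructions, and that cpi is the reciprocal of ipc; the report itself does not spell out these definitions.

```python
def pct_change(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def miss_rate_pct(misses, branches):
    """Branch misses as a percentage of branch instructions."""
    return misses / branches * 100.0


# Values copied from the poll1 comparison (955cef1517 vs 63e02a2a32).
per_process = pct_change(7435674, 6465918)
per_thread = pct_change(5868564, 5256868)
instructions = pct_change(1.406e13, 1.27e13)
miss_before = miss_rate_pct(2.405e08, 2.906e12)
miss_after = miss_rate_pct(5.356e10, 2.676e12)
cpi_before = 1 / 0.87
cpi_after = 1 / 0.78

if __name__ == "__main__":
    print(f"per_process_ops change: {per_process:.1f}%")
    print(f"per_thread_ops change:  {per_thread:.1f}%")
    print(f"instructions change:    {instructions:.1f}%")
    print(f"branch-miss-rate:       {miss_before:.2f}% -> {miss_after:.2f}%")
    print(f"cpi from ipc:           {cpi_before:.2f} -> {cpi_after:.2f}")
```

Under those assumptions the computed values agree with the reported rows, so the jump in branch-miss-rate% follows directly from the branch-misses counter rather than from a change in the number of branches executed.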
>
>
>
> [ASCII plots of per-sample values: will-it-scale.per_process_ops,
> perf-stat.instructions, perf-stat.branch-instructions, perf-stat.branch-misses,
> perf-stat.dTLB-stores, perf-stat.branch-miss-rate%, perf-stat.ipc, perf-stat.cpi,
> will-it-scale.time.user_time]
>
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
> ***************************************************************************************************
> lkp-sb03: 32 threads Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz with 64G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
> gcc-7/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-sb03/writeseek1/will-it-scale
>
> commit:
> 955cef1517 ("x86/entry/64: Return to userspace from the trampoline stack")
> 63e02a2a32 ("x86/entry/64: Create a per-CPU SYSCALL entry trampoline")
>
> 955cef1517a1be93 63e02a2a3292d8815eac7be438
> ---------------- --------------------------
> %stddev %change %stddev
> \ | \
> 1902014 -7.0% 1768039 will-it-scale.per_process_ops
> 1557647 -6.3% 1459046 will-it-scale.per_thread_ops
> 0.52 +4.0% 0.54 will-it-scale.scalability
> 2293 -1.8% 2251 will-it-scale.time.system_time
> 216.11 +19.7% 258.70 will-it-scale.time.user_time
> 1.453e+08 ± 6% +21.7% 1.769e+08 ± 9% cpuidle.POLL.time
> 3.43 +0.8 4.26 mpstat.cpu.usr%
> 284863 ± 6% +12.9% 321561 ± 3% softirqs.RCU
> 7178 ± 6% -11.3% 6368 slabinfo.kmalloc-96.active_objs
> 7218 ± 5% -10.6% 6450 slabinfo.kmalloc-96.num_objs
> 72.27 ± 6% +19.5% 86.39 ± 7% sched_debug.cfs_rq:/.load_avg.avg
> 107.67 ± 3% +31.1% 141.11 ± 19% sched_debug.cfs_rq:/.load_avg.stddev
> 50035 ± 23% +17.3% 58672 ± 24% sched_debug.cpu.load.stddev
> 7.58 ± 21% +65.4% 12.54 ± 11% sched_debug.cpu.nr_uninterruptible.max
> 3.143e+12 -4.7% 2.995e+12 perf-stat.branch-instructions
> 0.01 ± 2% +1.0 0.97 perf-stat.branch-miss-rate%
> 3.791e+08 ± 3% +7525.5% 2.891e+10 perf-stat.branch-misses
> 2.54e+08 +1.0% 2.566e+08 perf-stat.cache-misses
> 1.03 +6.3% 1.10 perf-stat.cpi
> 6.671e+12 -4.7% 6.361e+12 perf-stat.dTLB-loads
> 4.722e+12 -5.0% 4.485e+12 perf-stat.dTLB-stores
> 35.63 ± 12% -29.7 5.89 ± 20% perf-stat.iTLB-load-miss-rate%
> 8.119e+08 ± 8% +829.8% 7.549e+09 ± 2% perf-stat.iTLB-loads
> 1.563e+13 -5.3% 1.48e+13 perf-stat.instructions
> 0.97 -5.9% 0.91 perf-stat.ipc
> 5.97 -6.0 0.00 perf-profile.calltrace.cycles.entry_SYSCALL_64
> 7.43 ± 2% -0.1 7.29 ± 3% perf-profile.calltrace.cycles.find_lock_entry.shmem_getpage_gfp.shmem_write_begin.generic_perform_write.__generic_file_write_iter
> 9.10 ± 2% -0.1 9.00 ± 3% perf-profile.calltrace.cycles.shmem_getpage_gfp.shmem_write_begin.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
> 9.43 ± 2% -0.1 9.33 ± 3% perf-profile.calltrace.cycles.shmem_write_begin.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.__vfs_write
> 19.45 -0.1 19.39 ± 2% perf-profile.calltrace.cycles.copyin.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
> 19.14 -0.0 19.10 perf-profile.calltrace.cycles.copy_user_generic_string.copyin.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter
> 21.14 +0.0 21.15 ± 2% perf-profile.calltrace.cycles.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.__vfs_write
> 9.16 ± 10% +0.0 9.20 ± 41% perf-profile.calltrace.cycles.poll_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
> 41.59 +0.1 41.71 ± 2% perf-profile.calltrace.cycles.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.__vfs_write.vfs_write
> 11.09 ± 8% +0.2 11.24 ± 31% perf-profile.calltrace.cycles.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> 11.21 ± 8% +0.2 11.37 ± 31% perf-profile.calltrace.cycles.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> 11.21 ± 8% +0.2 11.37 ± 31% perf-profile.calltrace.cycles.cpu_startup_entry.start_secondary.secondary_startup_64
> 11.21 ± 8% +0.2 11.37 ± 31% perf-profile.calltrace.cycles.start_secondary.secondary_startup_64
> 11.68 ± 7% +0.2 11.90 ± 27% perf-profile.calltrace.cycles.secondary_startup_64
> 45.10 +0.3 45.37 ± 2% perf-profile.calltrace.cycles.__generic_file_write_iter.generic_file_write_iter.__vfs_write.vfs_write.sys_write
> 51.69 +0.3 52.02 ± 2% perf-profile.calltrace.cycles.__vfs_write.vfs_write.sys_write.entry_SYSCALL_64_fastpath
> 50.28 +0.4 50.63 ± 2% perf-profile.calltrace.cycles.generic_file_write_iter.__vfs_write.vfs_write.sys_write.entry_SYSCALL_64_fastpath
> 61.80 +0.8 62.60 ± 3% perf-profile.calltrace.cycles.vfs_write.sys_write.entry_SYSCALL_64_fastpath
> 4.92 +0.9 5.80 ± 5% perf-profile.calltrace.cycles.__fdget_pos.sys_lseek.entry_SYSCALL_64_fastpath
> 4.96 +0.9 5.86 ± 3% perf-profile.calltrace.cycles.__fdget_pos.sys_write.entry_SYSCALL_64_fastpath
> 8.74 +1.0 9.75 ± 6% perf-profile.calltrace.cycles.sys_lseek.entry_SYSCALL_64_fastpath
> 69.88 +1.6 71.49 ± 3% perf-profile.calltrace.cycles.sys_write.entry_SYSCALL_64_fastpath
> 80.00 +2.9 82.90 ± 3% perf-profile.calltrace.cycles.entry_SYSCALL_64_fastpath
> 5.97 -6.0 0.00 perf-profile.children.cycles.entry_SYSCALL_64
> 7.43 ± 2% -0.1 7.29 ± 3% perf-profile.children.cycles.find_lock_entry
> 9.10 ± 2% -0.1 9.00 ± 3% perf-profile.children.cycles.shmem_getpage_gfp
> 9.43 ± 2% -0.1 9.33 ± 3% perf-profile.children.cycles.shmem_write_begin
> 19.45 -0.1 19.39 ± 2% perf-profile.children.cycles.copyin
> 19.14 -0.0 19.11 perf-profile.children.cycles.copy_user_generic_string
> 21.14 +0.0 21.15 ± 2% perf-profile.children.cycles.iov_iter_copy_from_user_atomic
> 9.46 ± 9% +0.1 9.56 ± 36% perf-profile.children.cycles.poll_idle
> 41.60 +0.1 41.72 ± 2% perf-profile.children.cycles.generic_perform_write
> 11.21 ± 8% +0.2 11.37 ± 31% perf-profile.children.cycles.start_secondary
> 11.56 ± 7% +0.2 11.76 ± 27% perf-profile.children.cycles.cpuidle_enter_state
> 11.69 ± 7% +0.2 11.90 ± 27% perf-profile.children.cycles.do_idle
> 11.68 ± 7% +0.2 11.90 ± 27% perf-profile.children.cycles.secondary_startup_64
> 11.68 ± 7% +0.2 11.90 ± 27% perf-profile.children.cycles.cpu_startup_entry
> 45.10 +0.3 45.37 ± 2% perf-profile.children.cycles.__generic_file_write_iter
> 51.72 +0.3 52.03 ± 2% perf-profile.children.cycles.__vfs_write
> 50.28 +0.4 50.63 ± 2% perf-profile.children.cycles.generic_file_write_iter
> 61.84 +0.8 62.62 ± 3% perf-profile.children.cycles.vfs_write
> 8.74 +1.0 9.75 ± 6% perf-profile.children.cycles.sys_lseek
> 3.81 +1.6 5.38 ± 5% perf-profile.children.cycles.__fget_light
> 69.93 +1.6 71.50 ± 3% perf-profile.children.cycles.sys_write
> 9.88 +1.8 11.67 ± 3% perf-profile.children.cycles.__fdget_pos
> 80.23 +2.7 82.94 ± 3% perf-profile.children.cycles.entry_SYSCALL_64_fastpath
> 5.97 -6.0 0.00 perf-profile.self.cycles.entry_SYSCALL_64
> 18.93 -0.1 18.84 ± 2% perf-profile.self.cycles.copy_user_generic_string
> 9.39 ± 8% +0.0 9.42 ± 35% perf-profile.self.cycles.poll_idle
>
>
>
> ***************************************************************************************************
> lkp-ivb-d03: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
> gcc-7/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-ivb-d03/brk_test/aim9/300s
>
> commit:
> 955cef1517 ("x86/entry/64: Return to userspace from the trampoline stack")
> 63e02a2a32 ("x86/entry/64: Create a per-CPU SYSCALL entry trampoline")
>
> 955cef1517a1be93 63e02a2a3292d8815eac7be438
> ---------------- --------------------------
> %stddev %change %stddev
> \ | \
> 4124214 -9.9% 3717599 aim9.brk_test.ops_per_sec
> 272.29 -4.9% 259.03 aim9.time.system_time
> 27.71 +47.2% 40.78 aim9.time.user_time
> 12605 ± 9% -27.0% 9203 ± 10% cpuidle.POLL.usage
> 3.24 ± 2% +1.4 4.62 mpstat.cpu.usr%
> 4007 ± 3% -9.2% 3639 ± 4% slabinfo.anon_vma_chain.num_objs
> 9.80 -1.9% 9.61 turbostat.CorWatt
> 30309 -1.3% 29929 vmstat.system.cs
> 18905 -1.1% 18689 vmstat.system.in
> 716.67 ± 11% -22.7% 554.33 ± 6% sched_debug.cfs_rq:/.load_avg.avg
> 1.00 ± 11% -79.2% 0.21 ±173% sched_debug.cfs_rq:/.nr_spread_over.min
> 0.45 ± 55% +70.3% 0.76 ± 19% sched_debug.cfs_rq:/.nr_spread_over.stddev
> 521.82 ± 3% -10.2% 468.57 ± 2% sched_debug.cfs_rq:/.util_avg.avg
> 1.96 ± 7% +34.0% 2.62 ± 9% sched_debug.cpu.nr_running.max
> 0.68 ± 15% +42.9% 0.98 ± 15% sched_debug.cpu.nr_running.stddev
> 0.06 ± 19% +0.9 0.92 perf-stat.branch-miss-rate%
> 3.583e+08 ± 5% +1125.0% 4.389e+09 ± 28% perf-stat.branch-misses
> 9163065 -1.8% 8997254 perf-stat.context-switches
> 0.56 ± 2% +12.8% 0.63 ± 4% perf-stat.cpi
> 0.06 ±132% +0.2 0.23 ± 6% perf-stat.dTLB-load-miss-rate%
> 4.062e+08 ±142% +234.1% 1.357e+09 ± 8% perf-stat.dTLB-load-misses
> 9061724 ± 12% +22.0% 11056158 ± 6% perf-stat.dTLB-store-misses
> 11.72 ± 24% -6.6 5.08 ± 33% perf-stat.iTLB-load-miss-rate%
> 4.4e+08 ± 29% +135.5% 1.036e+09 ± 23% perf-stat.iTLB-loads
> 1.80 ± 2% -11.2% 1.60 ± 3% perf-stat.ipc
> 14.11 ± 88% -2.6 11.50 ± 86% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> 14.22 ± 88% -2.6 11.63 ± 85% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> 14.22 ± 88% -2.6 11.63 ± 85% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
> 14.22 ± 88% -2.6 11.63 ± 85% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
> 12.86 ± 92% -2.4 10.45 ± 97% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
> 45.20 ± 3% -1.4 43.82 perf-profile.calltrace.cycles-pp.do_brk_flags.sys_brk.entry_SYSCALL_64_fastpath
> 16.60 ± 3% -0.9 15.74 ± 3% perf-profile.calltrace.cycles-pp.vma_merge.do_brk_flags.sys_brk.entry_SYSCALL_64_fastpath
> 56.05 ± 2% -0.8 55.25 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_fastpath
> 14.60 ± 3% -0.7 13.88 ± 2% perf-profile.calltrace.cycles-pp.__vma_adjust.vma_merge.do_brk_flags.sys_brk.entry_SYSCALL_64_fastpath
> 54.84 ± 3% -0.7 54.15 perf-profile.calltrace.cycles-pp.sys_brk.entry_SYSCALL_64_fastpath
> 11.52 ± 9% -0.1 11.46 perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.sys_brk.entry_SYSCALL_64_fastpath
> 6.30 ± 5% +0.2 6.48 ± 3% perf-profile.calltrace.cycles-pp.security_vm_enough_memory_mm.do_brk_flags.sys_brk.entry_SYSCALL_64_fastpath
> 27.40 ± 3% +0.8 28.18 ± 4% perf-profile.calltrace.cycles-pp.secondary_startup_64
> 12.40 ± 94% +3.3 15.73 ± 62% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_kernel
> 13.18 ± 88% +3.4 16.55 ± 57% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64
> 13.18 ± 88% +3.4 16.55 ± 57% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_kernel.secondary_startup_64
> 13.18 ± 88% +3.4 16.55 ± 57% perf-profile.calltrace.cycles-pp.start_kernel.secondary_startup_64
> 13.14 ± 88% +3.4 16.53 ± 57% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64
> 14.22 ± 88% -2.6 11.63 ± 85% perf-profile.children.cycles-pp.start_secondary
> 45.83 ± 3% -1.2 44.59 perf-profile.children.cycles-pp.do_brk_flags
> 56.30 ± 2% -0.9 55.36 perf-profile.children.cycles-pp.entry_SYSCALL_64_fastpath
> 17.05 ± 3% -0.8 16.24 ± 3% perf-profile.children.cycles-pp.vma_merge
> 15.45 ± 3% -0.7 14.79 ± 2% perf-profile.children.cycles-pp.__vma_adjust
> 55.47 ± 3% -0.6 54.88 perf-profile.children.cycles-pp.sys_brk
> 12.21 ± 8% -0.1 12.08 perf-profile.children.cycles-pp.perf_event_mmap
> 6.40 ± 5% +0.2 6.57 ± 3% perf-profile.children.cycles-pp.security_vm_enough_memory_mm
> 27.41 ± 3% +0.8 28.19 ± 4% perf-profile.children.cycles-pp.do_idle
> 27.30 ± 3% +0.8 28.07 ± 4% perf-profile.children.cycles-pp.cpuidle_enter_state
> 27.40 ± 3% +0.8 28.18 ± 4% perf-profile.children.cycles-pp.secondary_startup_64
> 27.40 ± 3% +0.8 28.18 ± 4% perf-profile.children.cycles-pp.cpu_startup_entry
> 25.27 +0.9 26.19 perf-profile.children.cycles-pp.intel_idle
> 13.18 ± 88% +3.4 16.55 ± 57% perf-profile.children.cycles-pp.start_kernel
> 4.82 ± 9% +0.0 4.83 ± 5% perf-profile.self.cycles-pp.__vma_adjust
> 5.25 ± 9% +0.0 5.29 ± 2% perf-profile.self.cycles-pp.perf_event_mmap
> 5.33 ± 3% +0.4 5.75 ± 3% perf-profile.self.cycles-pp.do_brk_flags
> 25.26 +0.9 26.19 perf-profile.self.cycles-pp.intel_idle
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Xiaolong
> <config-4.14.0-01234-g63e02a2>
> <job.yaml>
> <reproduce>