Re: [LKP] [PM] 8234f6734c: will-it-scale.per_process_ops -3.6% regression

From: Vincent Guittot
Date: Tue Jan 15 2019 - 08:14:03 EST


Hi Rong,

On Tue, 15 Jan 2019 at 04:24, kernel test robot <rong.a.chen@xxxxxxxxx> wrote:
>
> Greeting,
>
> FYI, we noticed a -3.6% regression of will-it-scale.per_process_ops due to commit:
>
>
> commit: 8234f6734c5d74ac794e5517437f51c57d65f865 ("PM-runtime: Switch autosuspend over to using hrtimers")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>

Could you rerun with the following patch applied?
https://lore.kernel.org/patchwork/patch/1030857/
It optimizes autosuspend by reducing the number of calls to ktime_get().
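
The patch itself is at the link above; as a rough, user-space-only analogy of the idea (clock_gettime() stands in for ktime_get(), and the struct and function names below are made up purely for illustration), the pattern is to read the clock once per decision and reuse that value instead of taking a fresh timestamp at every intermediate step:

#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical device state; not the kernel's struct dev_pm_info. */
struct fake_dev {
	uint64_t last_busy_ns;
	uint64_t autosuspend_delay_ns;
};

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Returns 0 if the delay has already elapsed, otherwise the expiry time.
 * The clock is read once and the value reused, rather than calling
 * now_ns() (ktime_get() in the kernel) for every intermediate check. */
static uint64_t autosuspend_expiration(const struct fake_dev *dev)
{
	uint64_t now = now_ns();
	uint64_t expires = dev->last_busy_ns + dev->autosuspend_delay_ns;

	return expires > now ? expires : 0;
}

int main(void)
{
	struct fake_dev dev = {
		.last_busy_ns = now_ns(),
		.autosuspend_delay_ns = 100000000ULL,	/* 100 ms, arbitrary */
	};

	printf("expires at %llu ns\n",
	       (unsigned long long)autosuspend_expiration(&dev));
	return 0;
}

This compiles standalone with gcc and is only meant to illustrate the pattern, not the kernel code itself.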

Regards,
Vincent

> in testcase: will-it-scale
> on test machine: 104 threads Skylake with 192G memory
> with following parameters:
>
> nr_task: 100%
> mode: process
> test: poll2
> cpufreq_governor: performance
>
> test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based test in order to see any differences between the two.
> test-url: https://github.com/antonblanchard/will-it-scale
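
For context on what the poll2 workload exercises: it is essentially a tight poll() loop over a batch of pipe file descriptors (the real source is in the repository above). A simplified standalone approximation, with the fd count and iteration count chosen arbitrarily here, would look like this:

#include <poll.h>
#include <stdio.h>
#include <unistd.h>

#define NFDS 128	/* arbitrary; see the actual testcase for the real value */

int main(void)
{
	struct pollfd pfds[NFDS];
	unsigned long long iterations = 0;

	/* Create NFDS pipes and poll their read ends, which never become ready. */
	for (int i = 0; i < NFDS; i++) {
		int fd[2];

		if (pipe(fd))
			return 1;
		pfds[i].fd = fd[0];
		pfds[i].events = POLLIN;
	}

	for (int n = 0; n < 1000000; n++) {
		poll(pfds, NFDS, 0);	/* zero timeout: returns immediately */
		iterations++;
	}

	printf("%llu iterations\n", iterations);
	return 0;
}

With 100% of the 104 threads running such a loop, almost all cycles end up in the poll() syscall path, which matches the perf-profile data further down.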
>
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml # job file is attached in this email
> bin/lkp run job.yaml
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
> gcc-7/performance/x86_64-rhel-7.2/process/100%/debian-x86_64-2018-04-03.cgz/lkp-skl-fpga01/poll2/will-it-scale
>
> commit:
> v4.20-rc7
> 8234f6734c ("PM-runtime: Switch autosuspend over to using hrtimers")
>
> v4.20-rc7 8234f6734c5d74ac794e551743
> ---------------- --------------------------
> fail:runs %reproduction fail:runs
> | | |
> :2 50% 1:4 dmesg.WARNING:at#for_ip_interrupt_entry/0x
> %stddev %change %stddev
> \ | \
> 240408 -3.6% 231711 will-it-scale.per_process_ops
> 25002520 -3.6% 24097991 will-it-scale.workload
> 351914 -1.7% 345882 interrupts.CAL:Function_call_interrupts
> 1.77 ± 45% -1.1 0.64 mpstat.cpu.idle%
> 106164 ± 24% -23.2% 81494 ± 28% numa-meminfo.node0.AnonHugePages
> 326430 ± 8% -11.3% 289513 softirqs.SCHED
> 1294 -2.0% 1268 vmstat.system.cs
> 3178 +48.4% 4716 ± 16% slabinfo.eventpoll_pwq.active_objs
> 3178 +48.4% 4716 ± 16% slabinfo.eventpoll_pwq.num_objs
> 336.32 -100.0% 0.00 uptime.boot
> 3192 -100.0% 0.00 uptime.idle
> 3.456e+08 ± 76% -89.9% 34913819 ± 62% cpuidle.C1E.time
> 747832 ± 72% -87.5% 93171 ± 45% cpuidle.C1E.usage
> 16209 ± 26% -38.2% 10021 ± 44% cpuidle.POLL.time
> 6352 ± 32% -39.5% 3843 ± 48% cpuidle.POLL.usage
> 885259 ± 2% -13.8% 763434 ± 7% numa-vmstat.node0.numa_hit
> 865117 ± 2% -13.9% 744992 ± 7% numa-vmstat.node0.numa_local
> 405085 ± 7% +38.0% 558905 ± 9% numa-vmstat.node1.numa_hit
> 254056 ± 11% +59.7% 405824 ± 13% numa-vmstat.node1.numa_local
> 738158 ± 73% -88.5% 85078 ± 47% turbostat.C1E
> 1.07 ± 76% -1.0 0.11 ± 62% turbostat.C1E%
> 1.58 ± 49% -65.4% 0.55 ± 6% turbostat.CPU%c1
> 0.15 ± 13% -35.0% 0.10 ± 38% turbostat.CPU%c6
> 153.97 ± 16% -54.7 99.31 turbostat.PKG_%
> 64141 +1.5% 65072 proc-vmstat.nr_anon_pages
> 19541 -7.0% 18178 ± 8% proc-vmstat.nr_shmem
> 18296 +1.1% 18506 proc-vmstat.nr_slab_reclaimable
> 713938 -2.3% 697489 proc-vmstat.numa_hit
> 693688 -2.4% 677228 proc-vmstat.numa_local
> 772220 -1.9% 757334 proc-vmstat.pgalloc_normal
> 798565 -1.8% 784042 proc-vmstat.pgfault
> 732336 -2.7% 712661 proc-vmstat.pgfree
> 20.33 ± 4% -7.0% 18.92 sched_debug.cfs_rq:/.runnable_load_avg.max
> 160603 -44.5% 89108 ± 38% sched_debug.cfs_rq:/.spread0.avg
> 250694 -29.3% 177358 ± 18% sched_debug.cfs_rq:/.spread0.max
> 1109 ± 4% -7.0% 1031 sched_debug.cfs_rq:/.util_avg.max
> 20.33 ± 4% -7.2% 18.88 sched_debug.cpu.cpu_load[0].max
> -10.00 +35.0% -13.50 sched_debug.cpu.nr_uninterruptible.min
> 3.56 ± 10% +44.2% 5.14 ± 18% sched_debug.cpu.nr_uninterruptible.stddev
> 87.10 ± 24% -34.0% 57.44 ± 37% sched_debug.cpu.sched_goidle.avg
> 239.48 -25.6% 178.07 ± 18% sched_debug.cpu.sched_goidle.stddev
> 332.67 ± 7% -25.5% 247.83 ± 13% sched_debug.cpu.ttwu_count.min
> 231.67 ± 8% -15.4% 195.96 ± 12% sched_debug.cpu.ttwu_local.min
> 95.47 -95.5 0.00 perf-profile.calltrace.cycles-pp.poll
> 90.26 -90.3 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.poll
> 90.08 -90.1 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.poll
> 89.84 -89.8 0.00 perf-profile.calltrace.cycles-pp.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.poll
> 88.04 -88.0 0.00 perf-profile.calltrace.cycles-pp.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.poll
> 2.66 -0.1 2.54 perf-profile.calltrace.cycles-pp._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 1.90 -0.1 1.81 perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64
> 2.56 +0.1 2.64 perf-profile.calltrace.cycles-pp.__fdget.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.00 +2.3 2.29 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
> 0.00 +2.3 2.34 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64
> 17.45 +3.8 21.24 perf-profile.calltrace.cycles-pp.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.00 +92.7 92.66 perf-profile.calltrace.cycles-pp.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.00 +94.5 94.51 perf-profile.calltrace.cycles-pp.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.00 +94.8 94.75 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.00 +94.9 94.92 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
> 96.03 -96.0 0.00 perf-profile.children.cycles-pp.poll
> 90.29 -90.3 0.00 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> 90.11 -90.1 0.00 perf-profile.children.cycles-pp.do_syscall_64
> 89.87 -89.9 0.00 perf-profile.children.cycles-pp.__x64_sys_poll
> 89.39 -89.4 0.00 perf-profile.children.cycles-pp.do_sys_poll
> 16.19 -16.2 0.00 perf-profile.children.cycles-pp.__fget_light
> 68.59 -68.6 0.00 perf-profile.self.cycles-pp.do_sys_poll
> 14.84 -14.8 0.00 perf-profile.self.cycles-pp.__fget_light
> 1.759e+13 -100.0% 0.00 perf-stat.branch-instructions
> 0.28 -0.3 0.00 perf-stat.branch-miss-rate%
> 4.904e+10 -100.0% 0.00 perf-stat.branch-misses
> 6.79 ± 3% -6.8 0.00 perf-stat.cache-miss-rate%
> 1.071e+08 ± 4% -100.0% 0.00 perf-stat.cache-misses
> 1.578e+09 -100.0% 0.00 perf-stat.cache-references
> 385311 ± 2% -100.0% 0.00 perf-stat.context-switches
> 1.04 -100.0% 0.00 perf-stat.cpi
> 8.643e+13 -100.0% 0.00 perf-stat.cpu-cycles
> 13787 -100.0% 0.00 perf-stat.cpu-migrations
> 0.00 ± 4% -0.0 0.00 perf-stat.dTLB-load-miss-rate%
> 23324811 ± 5% -100.0% 0.00 perf-stat.dTLB-load-misses
> 1.811e+13 -100.0% 0.00 perf-stat.dTLB-loads
> 0.00 -0.0 0.00 perf-stat.dTLB-store-miss-rate%
> 2478029 -100.0% 0.00 perf-stat.dTLB-store-misses
> 8.775e+12 -100.0% 0.00 perf-stat.dTLB-stores
> 99.66 -99.7 0.00 perf-stat.iTLB-load-miss-rate%
> 7.527e+09 -100.0% 0.00 perf-stat.iTLB-load-misses
> 25540468 ± 39% -100.0% 0.00 perf-stat.iTLB-loads
> 8.33e+13 -100.0% 0.00 perf-stat.instructions
> 11066 -100.0% 0.00 perf-stat.instructions-per-iTLB-miss
> 0.96 -100.0% 0.00 perf-stat.ipc
> 777357 -100.0% 0.00 perf-stat.minor-faults
> 81.69 -81.7 0.00 perf-stat.node-load-miss-rate%
> 20040093 -100.0% 0.00 perf-stat.node-load-misses
> 4491667 ± 7% -100.0% 0.00 perf-stat.node-loads
> 75.23 ± 10% -75.2 0.00 perf-stat.node-store-miss-rate%
> 3418662 ± 30% -100.0% 0.00 perf-stat.node-store-misses
> 1027183 ± 11% -100.0% 0.00 perf-stat.node-stores
> 777373 -100.0% 0.00 perf-stat.page-faults
> 3331644 -100.0% 0.00 perf-stat.path-length
>
>
>
> will-it-scale.per_process_ops
>
> 242000 +-+----------------------------------------------------------------+
> | +.+.. .+..+. .+.+..+.+.+. .+.+.. |
> 240000 +-+ + +.+ +.+..+ +..+ +.|
> 238000 +-+..+.+. .+. .+..+ |
> | +. +.+ |
> 236000 +-+ |
> | |
> 234000 +-+ |
> | O O O O |
> 232000 +-+ O O O O O O O O O O O O O |
> 230000 +-+ O O O O O O |
> | O |
> 228000 O-+ O O |
> | O O |
> 226000 +-+----------------------------------------------------------------+
>
>
> will-it-scale.workload
>
> 2.52e+07 +-+--------------------------------------------------------------+
> | +..+. .+..+. .+. .+.+..+. .+..+. |
> 2.5e+07 +-+ + +.+ +.+.+. + +.+ +.|
> 2.48e+07 +-+.+..+. .+. .+.+ |
> | + +..+ |
> 2.46e+07 +-+ |
> 2.44e+07 +-+ |
> | |
> 2.42e+07 +-+ O O O O O O O O |
> 2.4e+07 +-+ O O O O O O O O O O |
> | O O O O O O |
> 2.38e+07 O-+ O |
> 2.36e+07 +-O O O |
> | |
> 2.34e+07 +-+--------------------------------------------------------------+
>
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Rong Chen