Re: [LKP] [PM] 8234f6734c: will-it-scale.per_process_ops -3.6% regression

From: kernel test robot
Date: Wed Jan 16 2019 - 10:38:21 EST


On Tue, Jan 15, 2019 at 02:13:47PM +0100, Vincent Guittot wrote:
> Hi Rong,
>
> On Tue, 15 Jan 2019 at 04:24, kernel test robot <rong.a.chen@xxxxxxxxx> wrote:
> >
> > Greeting,
> >
> > FYI, we noticed a -3.6% regression of will-it-scale.per_process_ops due to commit:
> >
> >
> > commit: 8234f6734c5d74ac794e5517437f51c57d65f865 ("PM-runtime: Switch autosuspend over to using hrtimers")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
>
> Could you rerun with the patch:
> https://lore.kernel.org/patchwork/patch/1030857/ ?
> It optimizes autosuspend by reducing the number of calls to ktime_get.

Hi Vincent,

The regression of will-it-scale.per_process_ops still exists according to the result.
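For context, commit 8234f6734c switched autosuspend timing over to hrtimers, and your follow-up patch avoids needless ktime_get() calls on that path. Below is a minimal userspace sketch of that pattern as I understand it (illustrative names and types only, not the actual kernel code): the clock is read only after the cheap checks have passed, so the common early-return paths never touch the clock.

/* sketch.c: illustrative userspace model of the "do not needlessly
 * call ktime_get" idea -- NOT the actual kernel code; all names here
 * are made up for the example. */
#include <stdint.h>
#include <time.h>

#define NSEC_PER_MSEC 1000000ULL

struct fake_dev {
	int use_autosuspend;
	int autosuspend_delay_ms;	/* negative: autosuspend disabled */
	uint64_t last_busy_ns;
};

static uint64_t now_ns(void)		/* stand-in for ktime_get() */
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Return the autosuspend expiration in ns, or 0 if it has already
 * expired or autosuspend is off.  The clock read is deferred until
 * the cheap field checks have passed. */
uint64_t autosuspend_expiration(const struct fake_dev *dev)
{
	uint64_t expires;

	if (!dev->use_autosuspend || dev->autosuspend_delay_ms < 0)
		return 0;		/* no clock read on these paths */

	expires = dev->last_busy_ns +
		  (uint64_t)dev->autosuspend_delay_ms * NSEC_PER_MSEC;

	return expires > now_ns() ? expires : 0;
}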

commit:
v4.20-rc7
c534491102 ("PM/runtime: Do not needlessly call ktime_get")

v4.20-rc7 c534491102b35a2075c78b72bb
---------------- --------------------------
%stddev change %stddev
\ | \
25028944 -4% 23987264 will-it-scale.workload
240662 -4% 230646 will-it-scale.per_process_ops
80031 78804 proc-vmstat.nr_zone_active_anon
80031 78804 proc-vmstat.nr_active_anon
7649 ±173% -6e+03 1870 ±133% latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup_revalidate_dentry.nfs_do_lookup_revalidate.__nfs_lookup_revalidate.lookup_fast.walk_component.link_path_walk
7654 ±173% -6e+03 1834 ±133% latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_access.nfs_do_access.nfs_permission.inode_permission.link_path_walk.path_lookupat.filename_lookup
13537 ±173% -1e+04 0 latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_get_acl.get_acl.posix_acl_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open
66199 ±130% -7e+04 0 latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
389513 ±161% -4e+05 0 latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_do_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open.do_syscall_64
629 ± 65% 4e+03 4446 ±123% latency_stats.max.io_schedule.__lock_page.do_wp_page.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
7748 ±173% -6e+03 1899 ±133% latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup_revalidate_dentry.nfs_do_lookup_revalidate.__nfs_lookup_revalidate.lookup_fast.walk_component.link_path_walk
7750 ±173% -6e+03 1845 ±133% latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_access.nfs_do_access.nfs_permission.inode_permission.link_path_walk.path_lookupat.filename_lookup
13537 ±173% -1e+04 0 latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_get_acl.get_acl.posix_acl_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open
66199 ±130% -7e+04 0 latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
392739 ±159% -4e+05 0 latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_do_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open.do_syscall_64
15365 ± 41% 2e+05 194745 ±123% latency_stats.sum.io_schedule.__lock_page.do_wp_page.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
9214 ± 30% 6e+04 71022 ± 22% latency_stats.sum.pipe_wait.pipe_write.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
15299 ±173% -1e+04 3740 ±133% latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup_revalidate_dentry.nfs_do_lookup_revalidate.__nfs_lookup_revalidate.lookup_fast.walk_component.link_path_walk
13537 ±173% -1e+04 0 latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_get_acl.get_acl.posix_acl_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open
22963 ±173% -2e+04 3668 ±133% latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_access.nfs_do_access.nfs_permission.inode_permission.link_path_walk.path_lookupat.filename_lookup
66199 ±130% -7e+04 0 latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
408736 ±151% -4e+05 0 latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_do_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open.do_syscall_64
79146 ± 26% 626% 574528 ± 5% perf-stat.i.iTLB-loads
8211 ± 7% 56% 12770 ± 14% perf-stat.i.node-store-misses
0.28 15% 0.32 perf-stat.overall.branch-miss-rate%
1.61e+08 11% 1.791e+08 perf-stat.i.branch-misses
71.87 10% 79.18 ± 3% perf-stat.overall.node-store-miss-rate%
13107 ± 4% 7% 14023 perf-stat.i.node-loads
1.04 5% 1.09 perf-stat.overall.cpi
99.05 97.63 perf-stat.i.iTLB-load-miss-rate%
83.87 82.36 perf-stat.overall.node-load-miss-rate%
99.68 97.65 perf-stat.overall.iTLB-load-miss-rate%
24777147 -3% 23919344 perf-stat.i.iTLB-load-misses
2.743e+11 -4% 2.646e+11 perf-stat.i.instructions
5.791e+10 -4% 5.586e+10 perf-stat.i.branch-instructions
2.89e+10 -4% 2.787e+10 perf-stat.i.dTLB-stores
5.964e+10 -4% 5.752e+10 perf-stat.i.dTLB-loads
0.96 -4% 0.92 perf-stat.i.ipc
8.333e+13 -4% 7.976e+13 perf-stat.total.instructions
0.96 -4% 0.92 perf-stat.overall.ipc
355843 ± 4% -12% 313369 ± 4% perf-stat.i.cache-misses
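
For anyone eyeballing the profile data below: the workload is essentially a tight poll() loop. Here is a minimal sketch in the spirit of the poll2 testcase (the real source is in the will-it-scale repository linked below; the fd count and setup here are assumptions for illustration, not copied from the testcase):

/* poll2-like.c: minimal sketch of a poll()-bound scalability loop.
 * NR_FDS and the pipe setup are illustrative assumptions. */
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

#define NR_FDS 128			/* assumed, for illustration */

int main(void)
{
	struct pollfd pfd[NR_FDS];
	unsigned long iterations = 0;	/* the real testcase reports
					 * a counter like this as ops */
	int i;

	for (i = 0; i < NR_FDS; i++) {
		int fds[2];

		if (pipe(fds)) {
			perror("pipe");
			return 1;
		}
		pfd[i].fd = fds[0];	/* read end, never written to */
		pfd[i].events = POLLIN;
	}

	for (;;) {
		/* Zero timeout: every call goes through the poll
		 * syscall path (do_sys_poll/__fget_light in the
		 * profile below) and returns immediately. */
		if (poll(pfd, NR_FDS, 0) < 0) {
			perror("poll");
			return 1;
		}
		iterations++;
	}
}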

Best Regards,
Rong Chen

>
> Regards,
> Vincent
>
> > in testcase: will-it-scale
> > on test machine: 104 threads Skylake with 192G memory
> > with following parameters:
> >
> > nr_task: 100%
> > mode: process
> > test: poll2
> > cpufreq_governor: performance
> >
> > test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a threads-based test in order to see any differences between the two.
> > test-url: https://github.com/antonblanchard/will-it-scale
> >
> >
> >
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> >
> >
> > To reproduce:
> >
> > git clone https://github.com/intel/lkp-tests.git
> > cd lkp-tests
> > bin/lkp install job.yaml # job file is attached in this email
> > bin/lkp run job.yaml
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
> > gcc-7/performance/x86_64-rhel-7.2/process/100%/debian-x86_64-2018-04-03.cgz/lkp-skl-fpga01/poll2/will-it-scale
> >
> > commit:
> > v4.20-rc7
> > 8234f6734c ("PM-runtime: Switch autosuspend over to using hrtimers")
> >
> > v4.20-rc7 8234f6734c5d74ac794e551743
> > ---------------- --------------------------
> > fail:runs %reproduction fail:runs
> > | | |
> > :2 50% 1:4 dmesg.WARNING:at#for_ip_interrupt_entry/0x
> > %stddev %change %stddev
> > \ | \
> > 240408 -3.6% 231711 will-it-scale.per_process_ops
> > 25002520 -3.6% 24097991 will-it-scale.workload
> > 351914 -1.7% 345882 interrupts.CAL:Function_call_interrupts
> > 1.77 ± 45% -1.1 0.64 mpstat.cpu.idle%
> > 106164 ± 24% -23.2% 81494 ± 28% numa-meminfo.node0.AnonHugePages
> > 326430 ± 8% -11.3% 289513 softirqs.SCHED
> > 1294 -2.0% 1268 vmstat.system.cs
> > 3178 +48.4% 4716 ± 16% slabinfo.eventpoll_pwq.active_objs
> > 3178 +48.4% 4716 ± 16% slabinfo.eventpoll_pwq.num_objs
> > 336.32 -100.0% 0.00 uptime.boot
> > 3192 -100.0% 0.00 uptime.idle
> > 3.456e+08 ± 76% -89.9% 34913819 ± 62% cpuidle.C1E.time
> > 747832 ± 72% -87.5% 93171 ± 45% cpuidle.C1E.usage
> > 16209 ± 26% -38.2% 10021 ± 44% cpuidle.POLL.time
> > 6352 ± 32% -39.5% 3843 ± 48% cpuidle.POLL.usage
> > 885259 ± 2% -13.8% 763434 ± 7% numa-vmstat.node0.numa_hit
> > 865117 ± 2% -13.9% 744992 ± 7% numa-vmstat.node0.numa_local
> > 405085 ± 7% +38.0% 558905 ± 9% numa-vmstat.node1.numa_hit
> > 254056 ± 11% +59.7% 405824 ± 13% numa-vmstat.node1.numa_local
> > 738158 ± 73% -88.5% 85078 ± 47% turbostat.C1E
> > 1.07 ± 76% -1.0 0.11 ± 62% turbostat.C1E%
> > 1.58 ± 49% -65.4% 0.55 ± 6% turbostat.CPU%c1
> > 0.15 ± 13% -35.0% 0.10 ± 38% turbostat.CPU%c6
> > 153.97 ± 16% -54.7 99.31 turbostat.PKG_%
> > 64141 +1.5% 65072 proc-vmstat.nr_anon_pages
> > 19541 -7.0% 18178 ± 8% proc-vmstat.nr_shmem
> > 18296 +1.1% 18506 proc-vmstat.nr_slab_reclaimable
> > 713938 -2.3% 697489 proc-vmstat.numa_hit
> > 693688 -2.4% 677228 proc-vmstat.numa_local
> > 772220 -1.9% 757334 proc-vmstat.pgalloc_normal
> > 798565 -1.8% 784042 proc-vmstat.pgfault
> > 732336 -2.7% 712661 proc-vmstat.pgfree
> > 20.33 ± 4% -7.0% 18.92 sched_debug.cfs_rq:/.runnable_load_avg.max
> > 160603 -44.5% 89108 ± 38% sched_debug.cfs_rq:/.spread0.avg
> > 250694 -29.3% 177358 ± 18% sched_debug.cfs_rq:/.spread0.max
> > 1109 ± 4% -7.0% 1031 sched_debug.cfs_rq:/.util_avg.max
> > 20.33 ± 4% -7.2% 18.88 sched_debug.cpu.cpu_load[0].max
> > -10.00 +35.0% -13.50 sched_debug.cpu.nr_uninterruptible.min
> > 3.56 ± 10% +44.2% 5.14 ± 18% sched_debug.cpu.nr_uninterruptible.stddev
> > 87.10 ± 24% -34.0% 57.44 ± 37% sched_debug.cpu.sched_goidle.avg
> > 239.48 -25.6% 178.07 ± 18% sched_debug.cpu.sched_goidle.stddev
> > 332.67 ± 7% -25.5% 247.83 ± 13% sched_debug.cpu.ttwu_count.min
> > 231.67 ± 8% -15.4% 195.96 ± 12% sched_debug.cpu.ttwu_local.min
> > 95.47 -95.5 0.00 perf-profile.calltrace.cycles-pp.poll
> > 90.26 -90.3 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.poll
> > 90.08 -90.1 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.poll
> > 89.84 -89.8 0.00 perf-profile.calltrace.cycles-pp.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.poll
> > 88.04 -88.0 0.00 perf-profile.calltrace.cycles-pp.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.poll
> > 2.66 -0.1 2.54 perf-profile.calltrace.cycles-pp._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 1.90 -0.1 1.81 perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64
> > 2.56 +0.1 2.64 perf-profile.calltrace.cycles-pp.__fdget.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.00 +2.3 2.29 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
> > 0.00 +2.3 2.34 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64
> > 17.45 +3.8 21.24 perf-profile.calltrace.cycles-pp.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.00 +92.7 92.66 perf-profile.calltrace.cycles-pp.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.00 +94.5 94.51 perf-profile.calltrace.cycles-pp.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.00 +94.8 94.75 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.00 +94.9 94.92 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 96.03 -96.0 0.00 perf-profile.children.cycles-pp.poll
> > 90.29 -90.3 0.00 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 90.11 -90.1 0.00 perf-profile.children.cycles-pp.do_syscall_64
> > 89.87 -89.9 0.00 perf-profile.children.cycles-pp.__x64_sys_poll
> > 89.39 -89.4 0.00 perf-profile.children.cycles-pp.do_sys_poll
> > 16.19 -16.2 0.00 perf-profile.children.cycles-pp.__fget_light
> > 68.59 -68.6 0.00 perf-profile.self.cycles-pp.do_sys_poll
> > 14.84 -14.8 0.00 perf-profile.self.cycles-pp.__fget_light
> > 1.759e+13 -100.0% 0.00 perf-stat.branch-instructions
> > 0.28 -0.3 0.00 perf-stat.branch-miss-rate%
> > 4.904e+10 -100.0% 0.00 perf-stat.branch-misses
> > 6.79 ± 3% -6.8 0.00 perf-stat.cache-miss-rate%
> > 1.071e+08 ± 4% -100.0% 0.00 perf-stat.cache-misses
> > 1.578e+09 -100.0% 0.00 perf-stat.cache-references
> > 385311 ± 2% -100.0% 0.00 perf-stat.context-switches
> > 1.04 -100.0% 0.00 perf-stat.cpi
> > 8.643e+13 -100.0% 0.00 perf-stat.cpu-cycles
> > 13787 -100.0% 0.00 perf-stat.cpu-migrations
> > 0.00 ± 4% -0.0 0.00 perf-stat.dTLB-load-miss-rate%
> > 23324811 ± 5% -100.0% 0.00 perf-stat.dTLB-load-misses
> > 1.811e+13 -100.0% 0.00 perf-stat.dTLB-loads
> > 0.00 -0.0 0.00 perf-stat.dTLB-store-miss-rate%
> > 2478029 -100.0% 0.00 perf-stat.dTLB-store-misses
> > 8.775e+12 -100.0% 0.00 perf-stat.dTLB-stores
> > 99.66 -99.7 0.00 perf-stat.iTLB-load-miss-rate%
> > 7.527e+09 -100.0% 0.00 perf-stat.iTLB-load-misses
> > 25540468 ± 39% -100.0% 0.00 perf-stat.iTLB-loads
> > 8.33e+13 -100.0% 0.00 perf-stat.instructions
> > 11066 -100.0% 0.00 perf-stat.instructions-per-iTLB-miss
> > 0.96 -100.0% 0.00 perf-stat.ipc
> > 777357 -100.0% 0.00 perf-stat.minor-faults
> > 81.69 -81.7 0.00 perf-stat.node-load-miss-rate%
> > 20040093 -100.0% 0.00 perf-stat.node-load-misses
> > 4491667 ± 7% -100.0% 0.00 perf-stat.node-loads
> > 75.23 ± 10% -75.2 0.00 perf-stat.node-store-miss-rate%
> > 3418662 ± 30% -100.0% 0.00 perf-stat.node-store-misses
> > 1027183 ± 11% -100.0% 0.00 perf-stat.node-stores
> > 777373 -100.0% 0.00 perf-stat.page-faults
> > 3331644 -100.0% 0.00 perf-stat.path-length
> >
> >
> >
> > will-it-scale.per_process_ops
> >
> > 242000 +-+----------------------------------------------------------------+
> > | +.+.. .+..+. .+.+..+.+.+. .+.+.. |
> > 240000 +-+ + +.+ +.+..+ +..+ +.|
> > 238000 +-+..+.+. .+. .+..+ |
> > | +. +.+ |
> > 236000 +-+ |
> > | |
> > 234000 +-+ |
> > | O O O O |
> > 232000 +-+ O O O O O O O O O O O O O |
> > 230000 +-+ O O O O O O |
> > | O |
> > 228000 O-+ O O |
> > | O O |
> > 226000 +-+----------------------------------------------------------------+
> >
> >
> > will-it-scale.workload
> >
> > 2.52e+07 +-+--------------------------------------------------------------+
> > | +..+. .+..+. .+. .+.+..+. .+..+. |
> > 2.5e+07 +-+ + +.+ +.+.+. + +.+ +.|
> > 2.48e+07 +-+.+..+. .+. .+.+ |
> > | + +..+ |
> > 2.46e+07 +-+ |
> > 2.44e+07 +-+ |
> > | |
> > 2.42e+07 +-+ O O O O O O O O |
> > 2.4e+07 +-+ O O O O O O O O O O |
> > | O O O O O O |
> > 2.38e+07 O-+ O |
> > 2.36e+07 +-O O O |
> > | |
> > 2.34e+07 +-+--------------------------------------------------------------+
> >
> >
> > [*] bisect-good sample
> > [O] bisect-bad sample
> >
> >
> >
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> >
> >
> > Thanks,
> > Rong Chen
> _______________________________________________
> LKP mailing list
> LKP@xxxxxxxxxxxx
> https://lists.01.org/mailman/listinfo/lkp