Re: [lkp] [x86/build] b2c51106c75: -18.1% will-it-scale.per_process_ops

From: Huang Ying
Date: Tue Sep 01 2015 - 22:40:37 EST


On Wed, 2015-08-05 at 10:38 +0200, Ingo Molnar wrote:
> * kernel test robot <ying.huang@xxxxxxxxx> wrote:
>
> > FYI, we noticed the below changes on
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/asm
> > commit b2c51106c7581866c37ffc77c5d739f3d4b7cbc9 ("x86/build: Fix detection of GCC -mpreferred-stack-boundary support")
>
> Does the performance regression go away reproducibly if you do:
>
> git revert b2c51106c7581866c37ffc77c5d739f3d4b7cbc9
>
> ?

Sorry for replying so late!

Reverting the commit restores part of the performance, as shown in the comparison below.
parent commit: f2a50f8b7da45ff2de93a71393e715a2ab9f3b68
the commit: b2c51106c7581866c37ffc77c5d739f3d4b7cbc9
revert commit: 987d12601a4a82cc2f2151b1be704723eb84cb9d
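For reference, a minimal sketch of how the revert kernel was obtained (an assumed local workflow, not the exact LKP harness invocation; the resulting revert SHA will differ in another tree):

    # start from the offending commit on tip x86/asm
    git checkout b2c51106c7581866c37ffc77c5d739f3d4b7cbc9
    # revert it on top, producing the third kernel under test
    git revert --no-edit b2c51106c7581866c37ffc77c5d739f3d4b7cbc9

Each of the three kernels (parent, the commit, the revert) was then built and run through the will-it-scale readseek2 case shown in the table below.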

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/cpufreq_governor/test:
wsm/will-it-scale/debian-x86_64-2015-02-07.cgz/x86_64-rhel/gcc-4.9/performance/readseek2

commit:
f2a50f8b7da45ff2de93a71393e715a2ab9f3b68
b2c51106c7581866c37ffc77c5d739f3d4b7cbc9
987d12601a4a82cc2f2151b1be704723eb84cb9d

f2a50f8b7da45ff2 b2c51106c7581866c37ffc77c5 987d12601a4a82cc2f2151b1be
---------------- -------------------------- --------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
879002 ± 0% -18.1% 720270 ± 7% -3.6% 847011 ± 2% will-it-scale.per_process_ops
0.02 ± 0% +34.5% 0.02 ± 7% +5.6% 0.02 ± 2% will-it-scale.scalability
11144 ± 0% +0.1% 11156 ± 0% +10.6% 12320 ± 0% will-it-scale.time.minor_page_faults
769.30 ± 0% -0.9% 762.15 ± 0% +1.1% 777.42 ± 0% will-it-scale.time.system_time
26153173 ± 0% +7.0% 27977076 ± 0% +3.5% 27078124 ± 0% will-it-scale.time.voluntary_context_switches
2964 ± 2% +1.4% 3004 ± 1% -51.9% 1426 ± 2% proc-vmstat.pgactivate
0.06 ± 27% +154.5% 0.14 ± 44% +122.7% 0.12 ± 24% turbostat.CPU%c3
370683 ± 0% +6.2% 393491 ± 0% +2.4% 379575 ± 0% vmstat.system.cs
11144 ± 0% +0.1% 11156 ± 0% +10.6% 12320 ± 0% time.minor_page_faults
15.70 ± 2% +14.5% 17.98 ± 0% +1.5% 15.94 ± 1% time.user_time
830343 ± 56% -54.0% 382128 ± 39% -22.3% 645308 ± 65% cpuidle.C1E-NHM.time
788.25 ± 14% -21.7% 617.25 ± 16% -12.3% 691.00 ± 3% cpuidle.C1E-NHM.usage
2489132 ± 20% +79.3% 4464147 ± 33% +78.4% 4440574 ± 21% cpuidle.C3-NHM.time
1082762 ±162% -100.0% 0.00 ± -1% +189.3% 3132030 ±110% latency_stats.avg.nfs_wait_on_request.nfs_updatepage.nfs_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
102189 ± 2% -2.1% 100087 ± 5% -32.9% 68568 ± 2% latency_stats.hits.pipe_wait.pipe_read.__vfs_read.vfs_read.SyS_read.entry_SYSCALL_64_fastpath
1082762 ±162% -100.0% 0.00 ± -1% +289.6% 4217977 ±109% latency_stats.max.nfs_wait_on_request.nfs_updatepage.nfs_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
1082762 ±162% -100.0% 0.00 ± -1% +478.5% 6264061 ±110% latency_stats.sum.nfs_wait_on_request.nfs_updatepage.nfs_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
5.10 ± 2% -8.0% 4.69 ± 1% +13.0% 5.76 ± 1% perf-profile.cpu-cycles.__kernel_text_address.print_context_stack.dump_trace.save_stack_trace_tsk.__account_scheduler_latency
2.58 ± 8% +19.5% 3.09 ± 3% -1.8% 2.54 ± 11% perf-profile.cpu-cycles._raw_spin_lock_irqsave.finish_wait.__wait_on_bit_lock.__lock_page.find_lock_entry
7.02 ± 3% +9.2% 7.67 ± 2% +7.1% 7.52 ± 3% perf-profile.cpu-cycles._raw_spin_lock_irqsave.prepare_to_wait_exclusive.__wait_on_bit_lock.__lock_page.find_lock_entry
3.07 ± 2% +14.8% 3.53 ± 3% -1.4% 3.03 ± 5% perf-profile.cpu-cycles.finish_wait.__wait_on_bit_lock.__lock_page.find_lock_entry.shmem_getpage_gfp
3.05 ± 5% -8.4% 2.79 ± 4% -5.2% 2.90 ± 5% perf-profile.cpu-cycles.hrtimer_start_range_ns.tick_nohz_stop_sched_tick.__tick_nohz_idle_enter.tick_nohz_idle_enter.cpu_startup_entry
0.89 ± 5% -7.6% 0.82 ± 3% +16.3% 1.03 ± 5% perf-profile.cpu-cycles.is_ftrace_trampoline.__kernel_text_address.print_context_stack.dump_trace.save_stack_trace_tsk
0.98 ± 3% -25.1% 0.74 ± 7% -16.8% 0.82 ± 2% perf-profile.cpu-cycles.is_ftrace_trampoline.print_context_stack.dump_trace.save_stack_trace_tsk.__account_scheduler_latency
1.58 ± 3% -5.2% 1.50 ± 2% +44.2% 2.28 ± 1% perf-profile.cpu-cycles.is_module_text_address.__kernel_text_address.print_context_stack.dump_trace.save_stack_trace_tsk
1.82 ± 18% +46.6% 2.67 ± 3% -32.6% 1.23 ± 56% perf-profile.cpu-cycles.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.finish_wait.__wait_on_bit_lock.__lock_page
8.05 ± 3% +9.5% 8.82 ± 3% +5.4% 8.49 ± 2% perf-profile.cpu-cycles.prepare_to_wait_exclusive.__wait_on_bit_lock.__lock_page.find_lock_entry.shmem_getpage_gfp
1.16 ± 2% +6.9% 1.25 ± 5% +11.4% 1.30 ± 5% perf-profile.cpu-cycles.put_page.shmem_file_read_iter.__vfs_read.vfs_read.sys_read
11102 ± 1% +0.0% 11102 ± 1% -95.8% 468.00 ± 0% slabinfo.Acpi-ParseExt.active_objs
198.25 ± 1% +0.0% 198.25 ± 1% -93.9% 12.00 ± 0% slabinfo.Acpi-ParseExt.active_slabs
11102 ± 1% +0.0% 11102 ± 1% -95.8% 468.00 ± 0% slabinfo.Acpi-ParseExt.num_objs
198.25 ± 1% +0.0% 198.25 ± 1% -93.9% 12.00 ± 0% slabinfo.Acpi-ParseExt.num_slabs
341.25 ± 14% +2.9% 351.00 ± 11% -100.0% 0.00 ± -1% slabinfo.blkdev_ioc.active_objs
341.25 ± 14% +2.9% 351.00 ± 11% -100.0% 0.00 ± -1% slabinfo.blkdev_ioc.num_objs
438.00 ± 16% -8.3% 401.50 ± 20% -100.0% 0.00 ± -1% slabinfo.file_lock_ctx.active_objs
438.00 ± 16% -8.3% 401.50 ± 20% -100.0% 0.00 ± -1% slabinfo.file_lock_ctx.num_objs
4398 ± 1% +1.4% 4462 ± 0% -14.5% 3761 ± 2% slabinfo.ftrace_event_field.active_objs
4398 ± 1% +1.4% 4462 ± 0% -14.5% 3761 ± 2% slabinfo.ftrace_event_field.num_objs
3947 ± 2% +10.6% 4363 ± 3% +107.1% 8175 ± 2% slabinfo.kmalloc-192.active_objs
93.00 ± 2% +10.8% 103.00 ± 3% +120.2% 204.75 ± 2% slabinfo.kmalloc-192.active_slabs
3947 ± 2% +10.6% 4363 ± 3% +118.4% 8620 ± 2% slabinfo.kmalloc-192.num_objs
93.00 ± 2% +10.8% 103.00 ± 3% +120.2% 204.75 ± 2% slabinfo.kmalloc-192.num_slabs
1794 ± 0% +3.2% 1851 ± 2% +12.2% 2012 ± 3% slabinfo.trace_event_file.active_objs
1794 ± 0% +3.2% 1851 ± 2% +12.2% 2012 ± 3% slabinfo.trace_event_file.num_objs
7065 ± 7% -5.4% 6684 ± 8% -100.0% 0.00 ± -1% slabinfo.vm_area_struct.active_objs
160.50 ± 7% -5.5% 151.75 ± 8% -100.0% 0.00 ± -1% slabinfo.vm_area_struct.active_slabs
7091 ± 7% -5.6% 6694 ± 8% -100.0% 0.00 ± -1% slabinfo.vm_area_struct.num_objs
160.50 ± 7% -5.5% 151.75 ± 8% -100.0% 0.00 ± -1% slabinfo.vm_area_struct.num_slabs
857.50 ± 29% +75.7% 1506 ± 78% +157.6% 2209 ± 33% sched_debug.cfs_rq[11]:/.blocked_load_avg
52.75 ± 29% -29.4% 37.25 ± 60% +103.3% 107.25 ± 43% sched_debug.cfs_rq[11]:/.load
914.50 ± 29% +69.9% 1553 ± 77% +155.6% 2337 ± 32% sched_debug.cfs_rq[11]:/.tg_load_contrib
7.75 ± 34% -64.5% 2.75 ± 64% -12.9% 6.75 ±115% sched_debug.cfs_rq[2]:/.nr_spread_over
1135 ± 20% -43.6% 640.75 ± 49% -18.8% 922.50 ± 51% sched_debug.cfs_rq[3]:/.blocked_load_avg
1215 ± 21% -43.1% 691.50 ± 46% -21.3% 956.25 ± 50% sched_debug.cfs_rq[3]:/.tg_load_contrib
38.50 ± 21% +129.9% 88.50 ± 36% +96.1% 75.50 ± 56% sched_debug.cfs_rq[4]:/.load
26.00 ± 20% +98.1% 51.50 ± 46% +142.3% 63.00 ± 53% sched_debug.cfs_rq[4]:/.runnable_load_avg
128.25 ± 18% +227.5% 420.00 ± 43% +152.4% 323.75 ± 68% sched_debug.cfs_rq[4]:/.utilization_load_avg
28320 ± 12% -6.3% 26545 ± 11% -19.4% 22813 ± 13% sched_debug.cfs_rq[6]:/.avg->runnable_avg_sum
1015 ± 78% +101.1% 2042 ± 25% +64.4% 1669 ± 73% sched_debug.cfs_rq[6]:/.blocked_load_avg
1069 ± 72% +100.2% 2140 ± 23% +61.2% 1722 ± 70% sched_debug.cfs_rq[6]:/.tg_load_contrib
619.25 ± 12% -6.3% 580.25 ± 11% -19.2% 500.25 ± 13% sched_debug.cfs_rq[6]:/.tg_runnable_contrib
88.75 ± 14% -47.3% 46.75 ± 36% -24.5% 67.00 ± 11% sched_debug.cfs_rq[9]:/.load
59.25 ± 23% -41.4% 34.75 ± 34% -6.3% 55.50 ± 12% sched_debug.cfs_rq[9]:/.runnable_load_avg
315.50 ± 45% -64.6% 111.67 ± 1% -12.1% 277.25 ± 3% sched_debug.cfs_rq[9]:/.utilization_load_avg
2246758 ± 7% +87.6% 4213925 ± 65% -2.2% 2197475 ± 4% sched_debug.cpu#0.nr_switches
2249376 ± 7% +87.4% 4215969 ± 65% -2.2% 2199216 ± 4% sched_debug.cpu#0.sched_count
1121438 ± 7% +81.0% 2030313 ± 61% -2.2% 1096479 ± 4% sched_debug.cpu#0.sched_goidle
1151160 ± 7% +86.5% 2146608 ± 64% -1.9% 1129264 ± 3% sched_debug.cpu#0.ttwu_count
33.75 ± 15% -22.2% 26.25 ± 6% -8.9% 30.75 ± 10% sched_debug.cpu#1.cpu_load[3]
33.25 ± 10% -18.0% 27.25 ± 7% -3.8% 32.00 ± 11% sched_debug.cpu#1.cpu_load[4]
41.75 ± 29% +23.4% 51.50 ± 33% +53.9% 64.25 ± 16% sched_debug.cpu#10.cpu_load[1]
40.00 ± 18% +24.4% 49.75 ± 18% +49.4% 59.75 ± 8% sched_debug.cpu#10.cpu_load[2]
39.25 ± 14% +22.3% 48.00 ± 10% +38.9% 54.50 ± 7% sched_debug.cpu#10.cpu_load[3]
39.50 ± 15% +20.3% 47.50 ± 6% +30.4% 51.50 ± 7% sched_debug.cpu#10.cpu_load[4]
5269004 ± 1% +27.8% 6731790 ± 30% +1.4% 5342560 ± 2% sched_debug.cpu#10.nr_switches
5273193 ± 1% +27.8% 6736526 ± 30% +1.4% 5345791 ± 2% sched_debug.cpu#10.sched_count
2633974 ± 1% +27.8% 3365271 ± 30% +1.4% 2670901 ± 2% sched_debug.cpu#10.sched_goidle
2644149 ± 1% +26.9% 3356318 ± 30% +1.9% 2693295 ± 1% sched_debug.cpu#10.ttwu_count
26.50 ± 37% +116.0% 57.25 ± 48% +109.4% 55.50 ± 29% sched_debug.cpu#11.cpu_load[0]
30.75 ± 15% +66.7% 51.25 ± 31% +65.9% 51.00 ± 21% sched_debug.cpu#11.cpu_load[1]
33.50 ± 10% +37.3% 46.00 ± 22% +39.6% 46.75 ± 17% sched_debug.cpu#11.cpu_load[2]
37.00 ± 11% +15.5% 42.75 ± 19% +29.7% 48.00 ± 11% sched_debug.cpu#11.cpu_load[4]
508300 ± 11% -0.6% 505024 ± 1% +18.1% 600291 ± 7% sched_debug.cpu#4.avg_idle
454696 ± 9% -5.9% 427894 ± 25% +21.8% 553608 ± 4% sched_debug.cpu#5.avg_idle
66.00 ± 27% +11.0% 73.25 ± 37% -46.6% 35.25 ± 22% sched_debug.cpu#6.cpu_load[0]
62.00 ± 36% +12.5% 69.75 ± 45% -41.5% 36.25 ± 11% sched_debug.cpu#6.cpu_load[1]
247681 ± 19% +21.0% 299747 ± 10% +28.7% 318764 ± 17% sched_debug.cpu#8.avg_idle
5116609 ± 4% +34.5% 6884238 ± 33% +55.2% 7942254 ± 34% sched_debug.cpu#9.nr_switches
5120531 ± 4% +34.5% 6889156 ± 33% +55.2% 7945270 ± 34% sched_debug.cpu#9.sched_count
2557822 ± 4% +34.5% 3441428 ± 33% +55.2% 3970337 ± 34% sched_debug.cpu#9.sched_goidle
2565307 ± 4% +32.9% 3410042 ± 33% +54.0% 3949696 ± 34% sched_debug.cpu#9.ttwu_count
0.00 ±141% +4.2e+05% 4.76 ±173% +47.7% 0.00 ±-59671% sched_debug.rt_rq[10]:/.rt_time
155259 ± 0% +0.0% 155259 ± 0% -42.2% 89723 ± 0% sched_debug.sysctl_sched.sysctl_sched_features

Best Regards,
Huang, Ying

