Re: [LKP] [ACPICA] 4c1379d7bb: reaim.jobs_per_min -2.8% regression

From: kernel test robot
Date: Wed Feb 20 2019 - 22:14:35 EST


On Wed, Feb 20, 2019 at 08:28:49AM +0800, Schmauss, Erik wrote:
>
>
> > -----Original Message-----
> > From: Chen, Rong A
> > Sent: Sunday, February 17, 2019 5:36 PM
> > To: Moore, Robert <robert.moore@xxxxxxxxx>
> > Cc: Wysocki, Rafael J <rafael.j.wysocki@xxxxxxxxx>; Schmauss, Erik
> > <erik.schmauss@xxxxxxxxx>; LKML <linux-kernel@xxxxxxxxxxxxxxx>; Linus
> > Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>; lkp@xxxxxx
> > Subject: [LKP] [ACPICA] 4c1379d7bb: reaim.jobs_per_min -2.8% regression
> >
> > Greetings,
> >
> > FYI, we noticed a -2.8% regression of reaim.jobs_per_min due to commit:
> >
> >
> > commit: 4c1379d7bb42fa664e0784539208ed74108c53f1 ("ACPICA: Debug
> > output: Add option to display method/object evaluation")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> Hi,
>
> Does the performance get back to normal when you apply the following changes? I've added this as an attachment as well...

No, the regression still exists.

commit:
73a049a90f ("ACPICA: disassembler: disassemble OEMx tables as AML")
40a76ed786 ("es-dbg-experiement-01")

testcase/path_params/tbox_group/run: reaim/performance-1500-100%-300s-all_utime-ucode=0x3d/lkp-hsw-ep2

73a049a90fb241af 40a76ed786180150065d02ae1f
---------------- --------------------------
         %stddev      change          %stddev
             \           |                \
      0.09 ±  8%        79%       0.17 ±  4%  reaim.std_dev_time
      0.45 ±  8%        79%       0.80 ±  4%  reaim.std_dev_percent
    127078              28%     163173 ±  3%  reaim.time.involuntary_context_switches
      0.07 ±  5%        13%       0.08         reaim.child_systime
     20.93                       21.42         reaim.parent_time
    314488                      307615         reaim.max_jobs_per_min
      6445                        6303         reaim.time.percent_of_cpu_this_job_got
    308795                      301705         reaim.jobs_per_min
      4288                        4190         reaim.jobs_per_min_child
       321              -5%        304         reaim.time.elapsed_time
       321              -5%        304         reaim.time.elapsed_time.max
      4928              -7%       4599         reaim.time.voluntary_context_switches
   1512000              -7%    1404000         reaim.workload
     92671              -7%      86028         reaim.time.minor_page_faults
     20720              -7%      19216         reaim.time.user_time

Best Regards,
Rong Chen

>
> diff --git a/drivers/acpi/acpica/dsmethod.c b/drivers/acpi/acpica/dsmethod.c
> index f59b4d944f7f..b524b4273d9f 100644
> --- a/drivers/acpi/acpica/dsmethod.c
> +++ b/drivers/acpi/acpica/dsmethod.c
> @@ -552,9 +552,11 @@ acpi_ds_call_control_method(struct acpi_thread_state *thread,
> "**** Begin nested execution of [%4.4s] **** WalkState=%p\n",
> method_node->name.ascii, next_walk_state));
>
> +#if 0
> this_walk_state->method_pathname =
> acpi_ns_get_normalized_pathname(method_node, TRUE);
> this_walk_state->method_is_nested = TRUE;
> +#endif
>
> /* Optional object evaluation log */
>
> diff --git a/drivers/acpi/acpica/psparse.c b/drivers/acpi/acpica/psparse.c
> index 9b386530ffbe..c39db1a83222 100644
> --- a/drivers/acpi/acpica/psparse.c
> +++ b/drivers/acpi/acpica/psparse.c
> @@ -483,6 +483,7 @@ acpi_status acpi_ps_parse_aml(struct acpi_walk_state *walk_state)
>
> /* Optional object evaluation log */
>
> +#if 0
> ACPI_DEBUG_PRINT_RAW((ACPI_DB_EVALUATION,
> "%-26s: %*s%s\n",
> " Exit nested method",
> @@ -492,6 +493,7 @@ acpi_status acpi_ps_parse_aml(struct acpi_walk_state *walk_state)
> &walk_state->method_pathname[1]));
>
> ACPI_FREE(walk_state->method_pathname);
> +#endif
> walk_state->method_is_nested = FALSE;
> }
> if (status == AE_CTRL_TRANSFER) {
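
For reference, the hunks above disable the per-call pathname bookkeeping that the commit under test adds: on every (nested) control method call, acpi_ns_get_normalized_pathname() allocates a pathname so the evaluation log can print it, and the exit path frees it again, even when the log is disabled. A minimal sketch of an alternative, assuming the acpi_dbg_level global and the ACPI_LV_EVALUATION level bit from the same ACPICA debug machinery (an illustration only, not the committed fix), would be to capture the pathname only when evaluation tracing is actually enabled:

	/*
	 * Sketch only: skip the pathname allocation on the method-call hot
	 * path unless the evaluation log is enabled.  The exit path in
	 * acpi_ps_parse_aml() would then need a matching NULL check before
	 * printing and calling ACPI_FREE().
	 */
	this_walk_state->method_pathname = NULL;
	this_walk_state->method_is_nested = TRUE;

	if (acpi_dbg_level & ACPI_LV_EVALUATION) {
		this_walk_state->method_pathname =
		    acpi_ns_get_normalized_pathname(method_node, TRUE);
	}
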
> >
> > in testcase: ftq
> > on test machine: 256 threads Phi with 96G memory with the following parameters:
> >
> > nr_task: 100%
> > samples: 6000ss
> > test: cache
> > freq: 20
> > cpufreq_governor: performance
> >
> > test-description: The FTQ benchmarks measure hardware and software
> > interference or 'noise' on a node from the application's perspective.
> > test-url: https://github.com/rminnich/ftq
> >
> > In addition to that, the commit also has significant impact on the following
> > tests:
> >
> > +------------------+----------------------------------------------------------------------------+
> > | testcase: change | perf_test: perf_test.msr_test.round1.cpu_util_zero.pass -100.0% undefined |
> > | test machine     | 72 threads Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz with 128G memory     |
> > | test parameters  | type=lkp                                                                   |
> > |                  | ucode=0x3d                                                                 |
> > +------------------+----------------------------------------------------------------------------+
> >
> >
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> >
> >
> > To reproduce:
> >
> > git clone https://github.com/intel/lkp-tests.git
> > cd lkp-tests
> > bin/lkp install job.yaml # job file is attached in this email
> > bin/lkp run job.yaml
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_job/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
> >   gcc-7/performance/x86_64-rhel-7.2/1500/100%/debian-x86_64-2018-04-03.cgz/300s/lkp-hsw-ep2/all_utime/reaim/0x3d
> >
> > commit:
> > 73a049a90f ("ACPICA: disassembler: disassemble OEMx tables as AML")
> > 4c1379d7bb ("ACPICA: Debug output: Add option to display method/object
> > evaluation")
> >
> > 73a049a90fb241af 4c1379d7bb42fa664e07845392
> > ---------------- --------------------------
> >          %stddev     %change          %stddev
> >              \          |                 \
> >       0.07 ±  3%     +20.2%       0.09 ± 12%  reaim.child_systime
> >     308866            -2.8%     300200         reaim.jobs_per_min
> >       4289            -2.8%       4169         reaim.jobs_per_min_child
> >     314258            -2.7%     305679         reaim.max_jobs_per_min
> >      20.93            +2.9%      21.53         reaim.parent_time
> >       0.44 ±  9%     +84.5%       0.82 ±  6%  reaim.std_dev_percent
> >       0.09 ±  9%     +83.7%       0.17 ±  6%  reaim.std_dev_time
> >     321.38            -4.7%     306.29         reaim.time.elapsed_time
> >     321.38            -4.7%     306.29         reaim.time.elapsed_time.max
> >     127264 ±  2%     +40.6%     178887         reaim.time.involuntary_context_switches
> >      93024 ±  2%      -8.4%      85173         reaim.time.minor_page_faults
> >       6448            -2.6%       6282         reaim.time.percent_of_cpu_this_job_got
> >      20724            -7.2%      19241         reaim.time.user_time
> >       4929            -6.9%       4591         reaim.time.voluntary_context_switches
> >    1512000            -7.1%    1404000         reaim.workload
> >     442.00 ±158%    -100.0%       0.00         meminfo.Mlocked
> >       0.03 ±  2%       +0.1       0.11         mpstat.cpu.sys%
> >       1231           +56.9%       1932 ±  3%  vmstat.system.cs
> >      63.75 ±158%    -100.0%       0.00         numa-vmstat.node0.nr_mlock
> >      46.25 ±159%    -100.0%       0.00         numa-vmstat.node1.nr_mlock
> >       1.50 ± 15%       -0.3       1.24 ± 15%  perf-stat.overall.cache-miss-rate%
> >      50.13 ±  3%       +6.4      56.58 ±  5%  perf-stat.overall.iTLB-load-miss-rate%
> >     566199 ±  6%     -23.7%     432237 ±  7%  perf-stat.overall.instructions-per-iTLB-miss
> >      59.86 ±  3%      -23.1      36.79 ± 10%  perf-stat.overall.node-store-miss-rate%
> >    6653969 ±  8%     +25.4%    8340919 ± 14%  perf-stat.ps.cache-references
> >       1202           +56.7%       1884 ±  4%  perf-stat.ps.context-switches
> >     185718 ±  5%     +28.9%     239386 ±  7%  perf-stat.ps.iTLB-load-misses
> >       2755            -1.5%       2713         perf-stat.ps.minor-faults
> >      22055 ± 22%     -41.6%      12879 ± 11%  perf-stat.ps.node-store-misses
> >      14552 ± 12%     +53.3%      22311 ± 14%  perf-stat.ps.node-stores
> >       2755            -1.5%       2713         perf-stat.ps.page-faults
> >  3.414e+13            -7.1%  3.172e+13         perf-stat.total.instructions
> >      69181            -1.7%      67973         proc-vmstat.nr_active_anon
> >       6177            +4.6%       6461         proc-vmstat.nr_mapped
> >     109.50 ±159%    -100.0%       0.00         proc-vmstat.nr_mlock
> >      69181            -1.7%      67973         proc-vmstat.nr_zone_active_anon
> >      17125 ± 10%     -55.2%       7668 ± 20%  proc-vmstat.numa_hint_faults
> >       4847 ± 80%     -81.0%     920.75 ± 85%  proc-vmstat.numa_hint_faults_local
> >     799546            -5.8%     753399         proc-vmstat.numa_hit
> >     785482            -5.9%     739309         proc-vmstat.numa_local
> >      63641 ± 34%     -57.8%      26863 ± 93%  proc-vmstat.numa_pte_updates
> >     861155            -5.6%     812594         proc-vmstat.pgalloc_normal
> >     922986            -6.5%     863086         proc-vmstat.pgfault
> >     833697            -5.8%     785608         proc-vmstat.pgfree
> >      17.88 ±  5%       -4.2      13.72 ± 21%  perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues
> >       8.65 ± 10%       -2.8       5.83 ± 26%  perf-profile.calltrace.cycles-pp.clockevents_program_event.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
> >       3.81 ± 39%       -1.9       1.89 ± 30%  perf-profile.calltrace.cycles-pp.ktime_get.clockevents_program_event.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
> >       4.05 ±  6%       -1.3       2.72 ± 25%  perf-profile.calltrace.cycles-pp.ktime_get_update_offsets_now.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
> >       3.42 ±100%       -1.3       2.11 ±173%  perf-profile.calltrace.cycles-pp.irq_work_interrupt
> >       3.42 ±100%       -1.3       2.11 ±173%  perf-profile.calltrace.cycles-pp.smp_irq_work_interrupt.irq_work_interrupt
> >       3.40 ±100%       -1.3       2.11 ±173%  perf-profile.calltrace.cycles-pp.irq_work_run_list.irq_work_run.smp_irq_work_interrupt.irq_work_interrupt
> >       3.40 ±100%       -1.3       2.11 ±173%  perf-profile.calltrace.cycles-pp.irq_work_run.smp_irq_work_interrupt.irq_work_interrupt
> >       3.40 ±100%       -1.1       2.27 ±173%  perf-profile.calltrace.cycles-pp.printk.irq_work_run_list.irq_work_run.smp_irq_work_interrupt.irq_work_interrupt
> >       3.40 ±100%       -1.1       2.27 ±173%  perf-profile.calltrace.cycles-pp.vprintk_emit.printk.irq_work_run_list.irq_work_run.smp_irq_work_interrupt
> >       3.40 ±100%       -1.1       2.27 ±173%  perf-profile.calltrace.cycles-pp.console_unlock.vprintk_emit.printk.irq_work_run_list.irq_work_run
> >       3.38 ±100%       -1.1       2.30 ±173%  perf-profile.calltrace.cycles-pp.secondary_startup_64
> >       3.38 ±100%       -1.1       2.30 ±173%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
> >       3.38 ±100%       -1.1       2.30 ±173%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
> >       3.37 ±100%       -1.1       2.30 ±173%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> >       3.33 ±100%       -1.0       2.29 ±173%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> >       3.29 ±101%       -1.0       2.29 ±173%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
> >       2.59 ± 13%       -0.8       1.78 ± 20%  perf-profile.calltrace.cycles-pp.__update_load_avg_se.task_tick_fair.scheduler_tick.update_process_times.tick_sched_handle
> >       1.37 ± 11%       -0.7       0.65 ± 64%  perf-profile.calltrace.cycles-pp.timerqueue_del.__remove_hrtimer.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt
> >       2.93 ±101%       -0.7       2.21 ±173%  perf-profile.calltrace.cycles-pp.serial8250_console_write.console_unlock.vprintk_emit.printk.irq_work_run_list
> >       2.04 ± 13%       -0.7       1.35 ± 30%  perf-profile.calltrace.cycles-pp.update_curr.task_tick_fair.scheduler_tick.update_process_times.tick_sched_handle
> >       0.80 ± 16%       -0.5       0.30 ±100%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
> >       0.00        +5.5e+11%       5530 ±110%  sched_debug.cfs_rq:/.MIN_vruntime.avg
> >      12338           +45.3%      17924 ±  7%  sched_debug.cfs_rq:/.load.avg
> >      39464 ± 43%    +709.9%     319609 ± 46%  sched_debug.cfs_rq:/.load.max
> >       5132 ± 59%    +704.6%      41295 ± 41%  sched_debug.cfs_rq:/.load.stddev
> >     299.73 ± 28%     +60.1%     479.92 ± 19%  sched_debug.cfs_rq:/.load_avg.max
> >      52.90 ± 37%     +48.5%      78.55 ± 11%  sched_debug.cfs_rq:/.load_avg.stddev
> >       0.00        +5.5e+11%       5530 ±110%  sched_debug.cfs_rq:/.max_vruntime.avg
> >      56400 ± 32%    +258.5%     202169 ±  5%  sched_debug.cfs_rq:/.min_vruntime.stddev
> >       0.08 ± 13%     +26.3%       0.10 ± 11%  sched_debug.cfs_rq:/.nr_running.stddev
> >      11.16 ±  6%     +26.1%      14.08 ±  4%  sched_debug.cfs_rq:/.runnable_load_avg.avg
> >      34.70 ± 30%    +486.6%     203.54 ± 30%  sched_debug.cfs_rq:/.runnable_load_avg.max
> >       3.74 ± 31%    +599.1%      26.17 ± 30%  sched_debug.cfs_rq:/.runnable_load_avg.stddev
> >      12213           +46.1%      17837 ±  7%  sched_debug.cfs_rq:/.runnable_weight.avg
> >      35964 ± 43%    +788.0%     319358 ± 46%  sched_debug.cfs_rq:/.runnable_weight.max
> >       4732 ± 62%    +770.0%      41175 ± 42%  sched_debug.cfs_rq:/.runnable_weight.stddev
> >      40678 ±144%   +3754.8%    1568088 ±  2%  sched_debug.cfs_rq:/.spread0.avg
> >     127330 ± 55%   +1238.0%    1703696 ±  3%  sched_debug.cfs_rq:/.spread0.max
> >     -87205           -98.1%      -1677         sched_debug.cfs_rq:/.spread0.min
> >      56217 ± 32%    +259.5%     202080 ±  5%  sched_debug.cfs_rq:/.spread0.stddev
> >     698851 ±  8%     -16.4%     584422 ±  9%  sched_debug.cpu.avg_idle.min
> >     165933 ±  7%     +13.4%     188123         sched_debug.cpu.clock.avg
> >     165941 ±  7%     +13.4%     188129         sched_debug.cpu.clock.max
> >     165919 ±  7%     +13.4%     188115         sched_debug.cpu.clock.min
> >       5.35 ± 23%     -32.9%       3.59 ±  7%  sched_debug.cpu.clock.stddev
> >     165933 ±  7%     +13.4%     188123         sched_debug.cpu.clock_task.avg
> >     165941 ±  7%     +13.4%     188129         sched_debug.cpu.clock_task.max
> >     165919 ±  7%     +13.4%     188115         sched_debug.cpu.clock_task.min
> >       5.35 ± 23%     -32.9%       3.59 ±  7%  sched_debug.cpu.clock_task.stddev
> >      11.33 ±  4%     +19.3%      13.51 ±  9%  sched_debug.cpu.cpu_load[0].avg
> >      38.97 ± 28%    +418.2%     201.92 ± 30%  sched_debug.cpu.cpu_load[0].max
> >       4.46 ± 41%    +443.0%      24.23 ± 33%  sched_debug.cpu.cpu_load[0].stddev
> >      11.43 ±  4%     +17.6%      13.44 ±  8%  sched_debug.cpu.cpu_load[1].avg
> >      37.25 ± 26%    +418.9%     193.29 ± 30%  sched_debug.cpu.cpu_load[1].max
> >       4.08 ± 36%    +463.5%      23.00 ± 35%  sched_debug.cpu.cpu_load[1].stddev
> >      11.49 ±  4%     +16.2%      13.34 ±  7%  sched_debug.cpu.cpu_load[2].avg
> >      35.41 ± 21%    +398.6%     176.54 ± 30%  sched_debug.cpu.cpu_load[2].max
> >       3.74 ± 31%    +460.0%      20.96 ± 34%  sched_debug.cpu.cpu_load[2].stddev
> >      34.54 ± 14%    +336.3%     150.71 ± 21%  sched_debug.cpu.cpu_load[3].max
> >       3.51 ± 22%    +407.6%      17.84 ± 27%  sched_debug.cpu.cpu_load[3].stddev
> >      55.55          +164.1%     146.71 ± 15%  sched_debug.cpu.cpu_load[4].max
> >       5.57 ±  6%    +202.7%      16.85 ± 18%  sched_debug.cpu.cpu_load[4].stddev
> >       4396 ±  6%     +13.1%       4974 ±  7%  sched_debug.cpu.curr->pid.avg
> >       3517 ± 17%     -47.9%       1832 ± 34%  sched_debug.cpu.curr->pid.min
> >     305.14 ± 37%     +98.3%     604.98 ± 22%  sched_debug.cpu.curr->pid.stddev
> >      12364           +39.5%      17254 ± 14%  sched_debug.cpu.load.avg
> >      39007 ± 44%    +721.4%     320411 ± 46%  sched_debug.cpu.load.max
> >       5044 ± 63%    +681.3%      39414 ± 48%  sched_debug.cpu.load.stddev
> >     133158 ±  9%     +16.9%     155714         sched_debug.cpu.nr_load_updates.avg
> >     140928 ±  9%     +15.6%     162887         sched_debug.cpu.nr_load_updates.max
> >       1369 ± 37%     +66.8%       2283 ±  4%  sched_debug.cpu.nr_load_updates.stddev
> >       0.19 ±  3%     +22.7%       0.23 ± 15%  sched_debug.cpu.nr_running.stddev
> >       3143 ±  6%     +52.3%       4788 ±  3%  sched_debug.cpu.nr_switches.avg
> >      11472 ± 19%     +67.3%      19197 ±  3%  sched_debug.cpu.nr_switches.max
> >       2464 ± 19%     +73.5%       4275 ±  6%  sched_debug.cpu.nr_switches.stddev
> >     165920 ±  7%     +13.4%     188116         sched_debug.cpu_clk
> >     162590 ±  7%     +13.6%     184783         sched_debug.ktime
> >     167206 ±  6%     +13.6%     189868         sched_debug.sched_clk
> >
> >
> >
> > ftq.time.involuntary_context_switches
> >
> > 260000 +-+----------------------------------------------------------------+
> > |O |
> > 240000 +-+ O |
> > 220000 O-+ O O |
> > | O O O O O |
> > 200000 +-+ O |
> > 180000 +-+ O OO OO OO O OOO O OO OO OO OO
> > | O O O |
> > 160000 +-+ O OOO O O O O O O |
> > 140000 +-+ O O O |
> > | |
> > 120000 +-+ + |
> > 100000 +-+ +. +. .+ :+ |
> > |+.+ ++.++.++.++ ++.++ +.+ ++.+++.++.++.++.++.+++.++ |
> > 80000 +-+----------------------------------------------------------------+
> >
> >
> > [*] bisect-good sample
> > [O] bisect-bad sample
> >
> > ***************************************************************************************************
> > lkp-hsw-ep2: 72 threads Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz with
> > 128G memory
> > =========================================================================================
> > compiler/kconfig/rootfs/tbox_group/testcase/type/ucode:
> >   gcc-7/x86_64-rhel-7.2/debian-x86_64-2018-04-03.cgz/lkp-hsw-ep2/perf_test/lkp/0x3d
> >
> > commit:
> > 73a049a90f ("ACPICA: disassembler: disassemble OEMx tables as AML")
> > 4c1379d7bb ("ACPICA: Debug output: Add option to display method/object
> > evaluation")
> >
> > 73a049a90fb241af 4c1379d7bb42fa664e07845392
> > ---------------- --------------------------
> >        fail:runs    %reproduction    fail:runs
> >            |              |              |
> >           :8            100%            8:8     perf_test.msr_test.round1.cpu_util_zero.fail
> >          %stddev     %change          %stddev
> >              \          |                 \
> >       1.00          -100.0%       0.00         perf_test.msr_test.round1.cpu_util_zero.pass
> >       1432           +13.1%       1620 ±  3%  perf_test.time.elapsed_time
> >       1432           +13.1%       1620 ±  3%  perf_test.time.elapsed_time.max
> >      43385            -6.9%      40403 ±  2%  perf_test.time.involuntary_context_switches
> >      76.62 ±  2%     +16.3%      89.12 ±  2%  perf_test.time.percent_of_cpu_this_job_got
> >     295.24           +17.2%     346.06 ± 13%  perf_test.time.system_time
> >     810.50           +36.5%       1106 ±  2%  perf_test.time.user_time
> >     360596           +43.1%     515953 ± 17%  perf_test.time.voluntary_context_switches
> >      16568           +46.1%      24206         meminfo.Inactive(anon)
> >      30398           +16.8%      35515         meminfo.Mapped
> >       1843 ± 10%      +9.1%       2010 ±  7%  meminfo.Mlocked
> >      19473           +39.3%      27124         meminfo.Shmem
> >       2897           -11.6%       2561 ±  3%  meminfo.max_used_kB
> >
> >
> >
> >
> >
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> >
> >
> > Thanks,
> > Rong Chen