RE: [LKP] [ACPICA] 4c1379d7bb: reaim.jobs_per_min -2.8% regression

From: Schmauss, Erik
Date: Tue Feb 19 2019 - 19:34:56 EST




> -----Original Message-----
> From: Chen, Rong A
> Sent: Sunday, February 17, 2019 5:36 PM
> To: Moore, Robert <robert.moore@xxxxxxxxx>
> Cc: Wysocki, Rafael J <rafael.j.wysocki@xxxxxxxxx>; Schmauss, Erik
> <erik.schmauss@xxxxxxxxx>; LKML <linux-kernel@xxxxxxxxxxxxxxx>; Linus
> Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>; lkp@xxxxxx
> Subject: [LKP] [ACPICA] 4c1379d7bb: reaim.jobs_per_min -2.8% regression
>
> Greetings,
>
> FYI, we noticed a -2.8% regression of reaim.jobs_per_min due to commit:
>
>
> commit: 4c1379d7bb42fa664e0784539208ed74108c53f1 ("ACPICA: Debug output: Add option to display method/object evaluation")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

Hi,

Does performance return to normal when you apply the following change? I've attached it as a patch as well...

diff --git a/drivers/acpi/acpica/dsmethod.c b/drivers/acpi/acpica/dsmethod.c
index f59b4d944f7f..b524b4273d9f 100644
--- a/drivers/acpi/acpica/dsmethod.c
+++ b/drivers/acpi/acpica/dsmethod.c
@@ -552,9 +552,11 @@ acpi_ds_call_control_method(struct acpi_thread_state *thread,
 			  "**** Begin nested execution of [%4.4s] **** WalkState=%p\n",
 			  method_node->name.ascii, next_walk_state));
 
+#if 0
 	this_walk_state->method_pathname =
 	    acpi_ns_get_normalized_pathname(method_node, TRUE);
 	this_walk_state->method_is_nested = TRUE;
+#endif
 
 	/* Optional object evaluation log */
 
diff --git a/drivers/acpi/acpica/psparse.c b/drivers/acpi/acpica/psparse.c
index 9b386530ffbe..c39db1a83222 100644
--- a/drivers/acpi/acpica/psparse.c
+++ b/drivers/acpi/acpica/psparse.c
@@ -483,6 +483,7 @@ acpi_status acpi_ps_parse_aml(struct acpi_walk_state *walk_state)
 
 			/* Optional object evaluation log */
 
+#if 0
 			ACPI_DEBUG_PRINT_RAW((ACPI_DB_EVALUATION,
 					      "%-26s: %*s%s\n",
 					      " Exit nested method",
@@ -492,6 +493,7 @@ acpi_status acpi_ps_parse_aml(struct acpi_walk_state *walk_state)
 					      &walk_state->method_pathname[1]));
 
 			ACPI_FREE(walk_state->method_pathname);
+#endif
 			walk_state->method_is_nested = FALSE;
 		}
 		if (status == AE_CTRL_TRANSFER) {
>
> in testcase: ftq
> on test machine: 256 threads Phi with 96G memory with following
> parameters:
>
> nr_task: 100%
> samples: 6000ss
> test: cache
> freq: 20
> cpufreq_governor: performance
>
> test-description: The FTQ benchmarks measure hardware and software
> interference or 'noise' on a node from the application's perspective.
> test-url: https://github.com/rminnich/ftq
>
> In addition to that, the commit also has significant impact on the following
> tests:
>
> +------------------+----------------------------------------------------------------------------+
> | testcase: change | perf_test: perf_test.msr_test.round1.cpu_util_zero.pass -100.0% undefined |
> | test machine     | 72 threads Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz with 128G memory     |
> | test parameters  | type=lkp                                                                   |
> |                  | ucode=0x3d                                                                 |
> +------------------+----------------------------------------------------------------------------+
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml # job file is attached in this email
> bin/lkp run job.yaml
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_job/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
> gcc-7/performance/x86_64-rhel-7.2/1500/100%/debian-x86_64-2018-04-03.cgz/300s/lkp-hsw-ep2/all_utime/reaim/0x3d
>
> commit:
> 73a049a90f ("ACPICA: disassembler: disassemble OEMx tables as AML")
> 4c1379d7bb ("ACPICA: Debug output: Add option to display method/object evaluation")
>
> 73a049a90fb241af 4c1379d7bb42fa664e07845392
> ---------------- --------------------------
> %stddev %change %stddev
> \ | \
> 0.07 ± 3% +20.2% 0.09 ± 12% reaim.child_systime
> 308866 -2.8% 300200 reaim.jobs_per_min
> 4289 -2.8% 4169 reaim.jobs_per_min_child
> 314258 -2.7% 305679 reaim.max_jobs_per_min
> 20.93 +2.9% 21.53 reaim.parent_time
> 0.44 ± 9% +84.5% 0.82 ± 6% reaim.std_dev_percent
> 0.09 ± 9% +83.7% 0.17 ± 6% reaim.std_dev_time
> 321.38 -4.7% 306.29 reaim.time.elapsed_time
> 321.38 -4.7% 306.29 reaim.time.elapsed_time.max
> 127264 ± 2% +40.6% 178887 reaim.time.involuntary_context_switches
> 93024 ± 2% -8.4% 85173 reaim.time.minor_page_faults
> 6448 -2.6% 6282 reaim.time.percent_of_cpu_this_job_got
> 20724 -7.2% 19241 reaim.time.user_time
> 4929 -6.9% 4591 reaim.time.voluntary_context_switches
> 1512000 -7.1% 1404000 reaim.workload
> 442.00 ±158% -100.0% 0.00 meminfo.Mlocked
> 0.03 ± 2% +0.1 0.11 mpstat.cpu.sys%
> 1231 +56.9% 1932 ± 3% vmstat.system.cs
> 63.75 ±158% -100.0% 0.00 numa-vmstat.node0.nr_mlock
> 46.25 ±159% -100.0% 0.00 numa-vmstat.node1.nr_mlock
> 1.50 ± 15% -0.3 1.24 ± 15% perf-stat.overall.cache-miss-rate%
> 50.13 ± 3% +6.4 56.58 ± 5% perf-stat.overall.iTLB-load-miss-rate%
> 566199 ± 6% -23.7% 432237 ± 7% perf-stat.overall.instructions-per-iTLB-miss
> 59.86 ± 3% -23.1 36.79 ± 10% perf-stat.overall.node-store-miss-rate%
> 6653969 ± 8% +25.4% 8340919 ± 14% perf-stat.ps.cache-references
> 1202 +56.7% 1884 ± 4% perf-stat.ps.context-switches
> 185718 ± 5% +28.9% 239386 ± 7% perf-stat.ps.iTLB-load-misses
> 2755 -1.5% 2713 perf-stat.ps.minor-faults
> 22055 ± 22% -41.6% 12879 ± 11% perf-stat.ps.node-store-misses
> 14552 ± 12% +53.3% 22311 ± 14% perf-stat.ps.node-stores
> 2755 -1.5% 2713 perf-stat.ps.page-faults
> 3.414e+13 -7.1% 3.172e+13 perf-stat.total.instructions
> 69181 -1.7% 67973 proc-vmstat.nr_active_anon
> 6177 +4.6% 6461 proc-vmstat.nr_mapped
> 109.50 ±159% -100.0% 0.00 proc-vmstat.nr_mlock
> 69181 -1.7% 67973 proc-vmstat.nr_zone_active_anon
> 17125 ± 10% -55.2% 7668 ± 20% proc-vmstat.numa_hint_faults
> 4847 ± 80% -81.0% 920.75 ± 85% proc-vmstat.numa_hint_faults_local
> 799546 -5.8% 753399 proc-vmstat.numa_hit
> 785482 -5.9% 739309 proc-vmstat.numa_local
> 63641 ± 34% -57.8% 26863 ± 93% proc-vmstat.numa_pte_updates
> 861155 -5.6% 812594 proc-vmstat.pgalloc_normal
> 922986 -6.5% 863086 proc-vmstat.pgfault
> 833697 -5.8% 785608 proc-vmstat.pgfree
> 17.88 ± 5% -4.2 13.72 ± 21% perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues
> 8.65 ± 10% -2.8 5.83 ± 26% perf-profile.calltrace.cycles-pp.clockevents_program_event.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
> 3.81 ± 39% -1.9 1.89 ± 30% perf-profile.calltrace.cycles-pp.ktime_get.clockevents_program_event.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
> 4.05 ± 6% -1.3 2.72 ± 25% perf-profile.calltrace.cycles-pp.ktime_get_update_offsets_now.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
> 3.42 ±100% -1.3 2.11 ±173% perf-profile.calltrace.cycles-pp.irq_work_interrupt
> 3.42 ±100% -1.3 2.11 ±173% perf-profile.calltrace.cycles-pp.smp_irq_work_interrupt.irq_work_interrupt
> 3.40 ±100% -1.3 2.11 ±173% perf-profile.calltrace.cycles-pp.irq_work_run_list.irq_work_run.smp_irq_work_interrupt.irq_work_interrupt
> 3.40 ±100% -1.3 2.11 ±173% perf-profile.calltrace.cycles-pp.irq_work_run.smp_irq_work_interrupt.irq_work_interrupt
> 3.40 ±100% -1.1 2.27 ±173% perf-profile.calltrace.cycles-pp.printk.irq_work_run_list.irq_work_run.smp_irq_work_interrupt.irq_work_interrupt
> 3.40 ±100% -1.1 2.27 ±173% perf-profile.calltrace.cycles-pp.vprintk_emit.printk.irq_work_run_list.irq_work_run.smp_irq_work_interrupt
> 3.40 ±100% -1.1 2.27 ±173% perf-profile.calltrace.cycles-pp.console_unlock.vprintk_emit.printk.irq_work_run_list.irq_work_run
> 3.38 ±100% -1.1 2.30 ±173% perf-profile.calltrace.cycles-pp.secondary_startup_64
> 3.38 ±100% -1.1 2.30 ±173% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
> 3.38 ±100% -1.1 2.30 ±173% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
> 3.37 ±100% -1.1 2.30 ±173% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> 3.33 ±100% -1.0 2.29 ±173% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> 3.29 ±101% -1.0 2.29 ±173% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
> 2.59 ± 13% -0.8 1.78 ± 20% perf-profile.calltrace.cycles-pp.__update_load_avg_se.task_tick_fair.scheduler_tick.update_process_times.tick_sched_handle
> 1.37 ± 11% -0.7 0.65 ± 64% perf-profile.calltrace.cycles-pp.timerqueue_del.__remove_hrtimer.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt
> 2.93 ±101% -0.7 2.21 ±173% perf-profile.calltrace.cycles-pp.serial8250_console_write.console_unlock.vprintk_emit.printk.irq_work_run_list
> 2.04 ± 13% -0.7 1.35 ± 30% perf-profile.calltrace.cycles-pp.update_curr.task_tick_fair.scheduler_tick.update_process_times.tick_sched_handle
> 0.80 ± 16% -0.5 0.30 ±100% perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
> 0.00 +5.5e+11% 5530 ±110% sched_debug.cfs_rq:/.MIN_vruntime.avg
> 12338 +45.3% 17924 ± 7% sched_debug.cfs_rq:/.load.avg
> 39464 ± 43% +709.9% 319609 ± 46% sched_debug.cfs_rq:/.load.max
> 5132 ± 59% +704.6% 41295 ± 41% sched_debug.cfs_rq:/.load.stddev
> 299.73 ± 28% +60.1% 479.92 ± 19% sched_debug.cfs_rq:/.load_avg.max
> 52.90 ± 37% +48.5% 78.55 ± 11% sched_debug.cfs_rq:/.load_avg.stddev
> 0.00 +5.5e+11% 5530 ±110% sched_debug.cfs_rq:/.max_vruntime.avg
> 56400 ± 32% +258.5% 202169 ± 5% sched_debug.cfs_rq:/.min_vruntime.stddev
> 0.08 ± 13% +26.3% 0.10 ± 11% sched_debug.cfs_rq:/.nr_running.stddev
> 11.16 ± 6% +26.1% 14.08 ± 4% sched_debug.cfs_rq:/.runnable_load_avg.avg
> 34.70 ± 30% +486.6% 203.54 ± 30% sched_debug.cfs_rq:/.runnable_load_avg.max
> 3.74 ± 31% +599.1% 26.17 ± 30% sched_debug.cfs_rq:/.runnable_load_avg.stddev
> 12213 +46.1% 17837 ± 7% sched_debug.cfs_rq:/.runnable_weight.avg
> 35964 ± 43% +788.0% 319358 ± 46% sched_debug.cfs_rq:/.runnable_weight.max
> 4732 ± 62% +770.0% 41175 ± 42% sched_debug.cfs_rq:/.runnable_weight.stddev
> 40678 ±144% +3754.8% 1568088 ± 2% sched_debug.cfs_rq:/.spread0.avg
> 127330 ± 55% +1238.0% 1703696 ± 3% sched_debug.cfs_rq:/.spread0.max
> -87205 -98.1% -1677 sched_debug.cfs_rq:/.spread0.min
> 56217 ± 32% +259.5% 202080 ± 5% sched_debug.cfs_rq:/.spread0.stddev
> 698851 ± 8% -16.4% 584422 ± 9% sched_debug.cpu.avg_idle.min
> 165933 ± 7% +13.4% 188123 sched_debug.cpu.clock.avg
> 165941 ± 7% +13.4% 188129 sched_debug.cpu.clock.max
> 165919 ± 7% +13.4% 188115 sched_debug.cpu.clock.min
> 5.35 ± 23% -32.9% 3.59 ± 7% sched_debug.cpu.clock.stddev
> 165933 ± 7% +13.4% 188123 sched_debug.cpu.clock_task.avg
> 165941 ± 7% +13.4% 188129 sched_debug.cpu.clock_task.max
> 165919 ± 7% +13.4% 188115 sched_debug.cpu.clock_task.min
> 5.35 ± 23% -32.9% 3.59 ± 7% sched_debug.cpu.clock_task.stddev
> 11.33 ± 4% +19.3% 13.51 ± 9% sched_debug.cpu.cpu_load[0].avg
> 38.97 ± 28% +418.2% 201.92 ± 30% sched_debug.cpu.cpu_load[0].max
> 4.46 ± 41% +443.0% 24.23 ± 33% sched_debug.cpu.cpu_load[0].stddev
> 11.43 ± 4% +17.6% 13.44 ± 8% sched_debug.cpu.cpu_load[1].avg
> 37.25 ± 26% +418.9% 193.29 ± 30% sched_debug.cpu.cpu_load[1].max
> 4.08 ± 36% +463.5% 23.00 ± 35% sched_debug.cpu.cpu_load[1].stddev
> 11.49 ± 4% +16.2% 13.34 ± 7% sched_debug.cpu.cpu_load[2].avg
> 35.41 ± 21% +398.6% 176.54 ± 30% sched_debug.cpu.cpu_load[2].max
> 3.74 ± 31% +460.0% 20.96 ± 34% sched_debug.cpu.cpu_load[2].stddev
> 34.54 ± 14% +336.3% 150.71 ± 21% sched_debug.cpu.cpu_load[3].max
> 3.51 ± 22% +407.6% 17.84 ± 27% sched_debug.cpu.cpu_load[3].stddev
> 55.55 +164.1% 146.71 ± 15% sched_debug.cpu.cpu_load[4].max
> 5.57 ± 6% +202.7% 16.85 ± 18% sched_debug.cpu.cpu_load[4].stddev
> 4396 ± 6% +13.1% 4974 ± 7% sched_debug.cpu.curr->pid.avg
> 3517 ± 17% -47.9% 1832 ± 34% sched_debug.cpu.curr->pid.min
> 305.14 ± 37% +98.3% 604.98 ± 22% sched_debug.cpu.curr->pid.stddev
> 12364 +39.5% 17254 ± 14% sched_debug.cpu.load.avg
> 39007 ± 44% +721.4% 320411 ± 46% sched_debug.cpu.load.max
> 5044 ± 63% +681.3% 39414 ± 48% sched_debug.cpu.load.stddev
> 133158 ± 9% +16.9% 155714 sched_debug.cpu.nr_load_updates.avg
> 140928 ± 9% +15.6% 162887 sched_debug.cpu.nr_load_updates.max
> 1369 ± 37% +66.8% 2283 ± 4% sched_debug.cpu.nr_load_updates.stddev
> 0.19 ± 3% +22.7% 0.23 ± 15% sched_debug.cpu.nr_running.stddev
> 3143 ± 6% +52.3% 4788 ± 3% sched_debug.cpu.nr_switches.avg
> 11472 ± 19% +67.3% 19197 ± 3% sched_debug.cpu.nr_switches.max
> 2464 ± 19% +73.5% 4275 ± 6% sched_debug.cpu.nr_switches.stddev
> 165920 ± 7% +13.4% 188116 sched_debug.cpu_clk
> 162590 ± 7% +13.6% 184783 sched_debug.ktime
> 167206 ± 6% +13.6% 189868 sched_debug.sched_clk
>
>
>
> ftq.time.involuntary_context_switches
>
> 260000 +-+----------------------------------------------------------------+
> |O |
> 240000 +-+ O |
> 220000 O-+ O O |
> | O O O O O |
> 200000 +-+ O |
> 180000 +-+ O OO OO OO O OOO O OO OO OO OO
> | O O O |
> 160000 +-+ O OOO O O O O O O |
> 140000 +-+ O O O |
> | |
> 120000 +-+ + |
> 100000 +-+ +. +. .+ :+ |
> |+.+ ++.++.++.++ ++.++ +.+ ++.+++.++.++.++.++.+++.++ |
> 80000 +-+----------------------------------------------------------------+
>
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
> ***************************************************************************************************
> lkp-hsw-ep2: 72 threads Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz with 128G memory
> =========================================================================================
> compiler/kconfig/rootfs/tbox_group/testcase/type/ucode:
> gcc-7/x86_64-rhel-7.2/debian-x86_64-2018-04-03.cgz/lkp-hsw-ep2/perf_test/lkp/0x3d
>
> commit:
> 73a049a90f ("ACPICA: disassembler: disassemble OEMx tables as AML")
> 4c1379d7bb ("ACPICA: Debug output: Add option to display method/object evaluation")
>
> 73a049a90fb241af 4c1379d7bb42fa664e07845392
> ---------------- --------------------------
> fail:runs %reproduction fail:runs
> | | |
> :8 100% 8:8 perf_test.msr_test.round1.cpu_util_zero.fail
> %stddev %change %stddev
> \ | \
> 1.00 -100.0% 0.00 perf_test.msr_test.round1.cpu_util_zero.pass
> 1432 +13.1% 1620 ± 3% perf_test.time.elapsed_time
> 1432 +13.1% 1620 ± 3% perf_test.time.elapsed_time.max
> 43385 -6.9% 40403 ± 2% perf_test.time.involuntary_context_switches
> 76.62 ± 2% +16.3% 89.12 ± 2% perf_test.time.percent_of_cpu_this_job_got
> 295.24 +17.2% 346.06 ± 13% perf_test.time.system_time
> 810.50 +36.5% 1106 ± 2% perf_test.time.user_time
> 360596 +43.1% 515953 ± 17% perf_test.time.voluntary_context_switches
> 16568 +46.1% 24206 meminfo.Inactive(anon)
> 30398 +16.8% 35515 meminfo.Mapped
> 1843 ± 10% +9.1% 2010 ± 7% meminfo.Mlocked
> 19473 +39.3% 27124 meminfo.Shmem
> 2897 -11.6% 2561 ± 3% meminfo.max_used_kB
>
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Rong Chen

Attachment: es-dbg-experiement-01.patch
Description: es-dbg-experiement-01.patch