Re: [PATCH 2/3] perf/x86/pebs: add workaround for broken OVFL status on HSW

From: Stephane Eranian
Date: Thu Dec 15 2016 - 03:12:31 EST


On Wed, Dec 14, 2016 at 11:52 PM, Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
> On Wed, Dec 14, 2016 at 11:26:49PM -0800, Stephane Eranian wrote:
>> On Wed, Dec 14, 2016 at 9:55 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> >
>> > Just spotted this again, ping?
>> >
>> Ok, on what processor running what command, so I can try and reproduce?
>
> for me it's snb_x (model 45) and peter's ivb-ep model 62
>
> after several hours of fuzzer test, log below.. I'll try again with the change
>
Ok, but the problem with the fuzzer is that you have no idea whether
you were using PEBS or no PEBS, one event or multiple events, so it
becomes hard to reproduce.

> jirka
>
>
> ---
> [14404.947844] perfevents: irq loop stuck!
> [14404.952560] ------------[ cut here ]------------
> [14404.957720] WARNING: CPU: 0 PID: 0 at arch/x86/events/intel/core.c:2093 intel_pmu_handle_irq+0x2f8/0x4c0
> [14404.968305] Modules linked in: intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp
> ipmi_devintf crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel pcspkr
> ipmi_ssif tpm_tis i2c_i801 tpm_tis_core ipmi_si tpm i2c_smbus ipmi_msghandler cdc_ether usbnet
> mii shpchp ioatdma wmi lpc_ich xfs libcrc32c mgag200 drm_kms_helper ttm drm igb
> ptp crc32c_intel pps_core dca i2c_algo_bit megaraid_sas fjes
> [14405.019901] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc8+ #51
> [14405.026985] Hardware name: IBM System x3650 M4 : -[7915E2G]-/00Y7683, BIOS -[VVE124AUS-1.30]- 11/21/2012
> [14405.037568] ffff880277a05b08 ffffffff81463243 ffff880277a05b58 0000000000000000
> [14405.046601] ffff880277a05b48 ffffffff810b698b 0000082d81133a1d 0000000000000064
> [14405.055634] ffff880277a0a380 ffff880276208800 0000000000000040 ffff880277a0a580
> [14405.064665] Call Trace:
> [14405.067394] <NMI> [<ffffffff81463243>] dump_stack+0x86/0xc3
> [14405.073807] [<ffffffff810b698b>] __warn+0xcb/0xf0
> [14405.079156] [<ffffffff810b6a0f>] warn_slowpath_fmt+0x5f/0x80
> [14405.085569] [<ffffffff810b69b5>] ? warn_slowpath_fmt+0x5/0x80
> [14405.092081] [<ffffffff8100d448>] intel_pmu_handle_irq+0x2f8/0x4c0
> [14405.098971] [<ffffffff810060dc>] ? perf_event_nmi_handler+0x2c/0x50
> [14405.106065] [<ffffffff8100d150>] ? intel_pmu_save_and_restart+0x50/0x50
> [14405.113547] [<ffffffff810609f0>] ? nmi_raise_cpu_backtrace+0x20/0x20
> [14405.120737] [<ffffffff811a2555>] ? ftrace_ops_test.isra.23+0x65/0xa0
> [14405.127917] [<ffffffff8147a2ee>] ? bsearch+0x5e/0x90
> [14405.133556] [<ffffffff811a0f50>] ? __add_hash_entry+0x50/0x50
> [14405.140066] [<ffffffff8147a2ee>] ? bsearch+0x5e/0x90
> [14405.145704] [<ffffffff811a0f50>] ? __add_hash_entry+0x50/0x50
> [14405.152214] [<ffffffff810609f0>] ? nmi_raise_cpu_backtrace+0x20/0x20
> [14405.159403] [<ffffffff810609f0>] ? nmi_raise_cpu_backtrace+0x20/0x20
> [14405.166594] [<ffffffff81133a1d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
> [14405.173979] [<ffffffff811a31be>] ? ftrace_ops_list_func+0xce/0x1d0
> [14405.180974] [<ffffffff818e2957>] ? ftrace_call+0x5/0x34
> [14405.186904] [<ffffffff818e2957>] ? ftrace_call+0x5/0x34
> [14405.192824] [<ffffffff81129f00>] ? printk_nmi_enter+0x20/0x20
> [14405.199337] [<ffffffff8100d155>] ? intel_pmu_handle_irq+0x5/0x4c0
> [14405.206235] [<ffffffff810060b5>] ? perf_event_nmi_handler+0x5/0x50
> [14405.213231] [<ffffffff810060dc>] perf_event_nmi_handler+0x2c/0x50
> [14405.220121] [<ffffffff8103983d>] nmi_handle+0xbd/0x2e0
> [14405.225954] [<ffffffff81039785>] ? nmi_handle+0x5/0x2e0
> [14405.231875] [<ffffffff81039785>] ? nmi_handle+0x5/0x2e0
> [14405.237804] [<ffffffff81039f13>] default_do_nmi+0x53/0x100
> [14405.244025] [<ffffffff8103a0df>] do_nmi+0x11f/0x170
> [14405.249557] [<ffffffff818e26f1>] end_repeat_nmi+0x1a/0x1e
> [14405.255680] [<ffffffff8106fd16>] ? native_write_msr+0x6/0x30
> [14405.262093] [<ffffffff8106fd16>] ? native_write_msr+0x6/0x30
> [14405.268507] [<ffffffff8106fd16>] ? native_write_msr+0x6/0x30
> [14405.274914] <EOE> <IRQ> [<ffffffff810117e4>] ? intel_pmu_pebs_enable_all+0x34/0x40
> [14405.283656] [<ffffffff8100cc43>] __intel_pmu_enable_all.constprop.17+0x23/0xa0
> [14405.291815] [<ffffffff8100ccd0>] intel_pmu_enable_all+0x10/0x20
> [14405.298520] [<ffffffff81007836>] x86_pmu_enable+0x256/0x2e0
> [14405.304836] [<ffffffff811e32e7>] perf_pmu_enable.part.86+0x7/0x10
> [14405.311736] [<ffffffff811e73be>] perf_mux_hrtimer_handler+0x22e/0x2c0
> [14405.319014] [<ffffffff811418bb>] __hrtimer_run_queues+0xfb/0x510
> [14405.325808] [<ffffffff811e7190>] ? ctx_resched+0x90/0x90
> [14405.331834] [<ffffffff811424bd>] hrtimer_interrupt+0x9d/0x1a0
> [14405.338343] [<ffffffff8105d958>] local_apic_timer_interrupt+0x38/0x60
> [14405.345629] [<ffffffff818e3b3b>] smp_trace_apic_timer_interrupt+0x5b/0x25f
> [14405.353402] [<ffffffff818e2ce6>] trace_apic_timer_interrupt+0x96/0xa0
> [14405.360689] <EOI> [<ffffffff8172f414>] ? cpuidle_enter_state+0x124/0x380
> [14405.368354] [<ffffffff8172f410>] ? cpuidle_enter_state+0x120/0x380
> [14405.375349] [<ffffffff8172f6a7>] cpuidle_enter+0x17/0x20
> [14405.381375] [<ffffffff81109013>] call_cpuidle+0x23/0x40
> [14405.387303] [<ffffffff811092a0>] cpu_startup_entry+0x160/0x250
> [14405.393910] [<ffffffff818d0725>] rest_init+0x135/0x140
> [14405.399743] [<ffffffff81facff9>] start_kernel+0x45e/0x47f
> [14405.405866] [<ffffffff81fac120>] ? early_idt_handler_array+0x120/0x120
> [14405.413250] [<ffffffff81fac2d6>] x86_64_start_reservations+0x2a/0x2c
> [14405.420432] [<ffffffff81fac424>] x86_64_start_kernel+0x14c/0x16f
> [14405.427224] ---[ end trace 62b08c15aaa2825d ]---
> [14405.432378]
> [14405.434043] CPU#0: ctrl: 0000000000000000
> [14405.439099] CPU#0: status: 0000000000000008
> [14405.444157] CPU#0: overflow: 0000000000000000
> [14405.449214] CPU#0: fixed: 00000000000000b0
> [14405.454271] CPU#0: pebs: 0000000000000000
> [14405.459326] CPU#0: debugctl: 0000000000000000
> [14405.464383] CPU#0: active: 000000020000000f
> [14405.469431] CPU#0: gen-PMC0 ctrl: 0000000001d301b1
> [14405.475069] CPU#0: gen-PMC0 count: 0000800090b1c37e
> [14405.480706] CPU#0: gen-PMC0 left: 00007fff6fb96d3a
> [14405.486344] CPU#0: gen-PMC1 ctrl: 00000000baf733b1
> [14405.491981] CPU#0: gen-PMC1 count: 0000800000000009
> [14405.497618] CPU#0: gen-PMC1 left: 00007ffffffffff7
> [14405.503256] CPU#0: gen-PMC2 ctrl: 0000000000530020
> [14405.508894] CPU#0: gen-PMC2 count: 00008000000000e8
> [14405.514534] CPU#0: gen-PMC2 left: 00007fffffffff18
> [14405.520172] CPU#0: gen-PMC3 ctrl: 00000000004200c0
> [14405.525809] CPU#0: gen-PMC3 count: 0000fffffffffffe
> [14405.531446] CPU#0: gen-PMC3 left: 0000000000000002
> [14405.537085] CPU#0: fixed-PMC0 count: 000080000010c91d
> [14405.542722] CPU#0: fixed-PMC1 count: 0000fffc1b31bacf
> [14405.548360] CPU#0: fixed-PMC2 count: 000080000318bf99
> [14405.554000] core: clearing PMU state on CPU#0
> [14405.559598] core: clearing PMU state on CPU#0
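
To make the "status" value in a dump like the one above easier to read, here
is a minimal sketch that maps the set bits to counter names. It assumes the
value is the global overflow status laid out as in Intel's architectural
perfmon (version 2 and later): low bits for the general-purpose counters,
bits 32 and up for the fixed counters, bit 62 for the DS/PEBS buffer. The
function name and the counter counts (4 general, 3 fixed, matching the dump)
are illustrative assumptions, not kernel code.

```python
# Assumed layout: Intel architectural perfmon v2+ global status register.
NUM_GP = 4              # assumption: matches gen-PMC0..3 in the dump
NUM_FIXED = 3           # assumption: matches fixed-PMC0..2 in the dump
FIXED_SHIFT = 32        # fixed counter overflow bits start at bit 32
OVF_DS_BUFFER_BIT = 62  # DS (PEBS) buffer overflow


def decode_global_status(status, num_gp=NUM_GP, num_fixed=NUM_FIXED):
    """Return the names of the overflow sources flagged in a status value."""
    sources = []
    for i in range(num_gp):
        if status & (1 << i):
            sources.append(f"gen-PMC{i}")
    for i in range(num_fixed):
        if status & (1 << (FIXED_SHIFT + i)):
            sources.append(f"fixed-PMC{i}")
    if status & (1 << OVF_DS_BUFFER_BIT):
        sources.append("pebs-buffer")
    return sources


# "status: 0000000000000008" from the dump
print(decode_global_status(0x8))  # prints ['gen-PMC3']
```

Under those assumptions, 0x8 maps to gen-PMC3, which is the counter the dump
shows with "left: 0000000000000002".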