Re: [RFC][PATCH 2/3] perf: Add a bit of paranoia

From: Vince Weaver
Date: Mon Feb 02 2015 - 01:32:22 EST


On Thu, 29 Jan 2015, Peter Zijlstra wrote:

> That said, it does need to do that sibling first leaders later install
> order too. So I've put the below on top.

so I've lost track of exactly which patches I should be running (do I need
to apply both of the additional patches?)

Meanwhile my haswell was still fuzzing away (without those two
updated patches) until it triggered the below and crashed.

[407484.309136] ------------[ cut here ]------------
[407484.314590] WARNING: CPU: 3 PID: 27209 at kernel/watchdog.c:290 watchdog_overflow_callback+0x92/0xc0()
[407484.325090] Watchdog detected hard LOCKUP on cpu 3
[407484.330093] Modules linked in: btrfs xor raid6_pq ntfs vfat msdos fat dm_mod fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm crct10dif_pclmul snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic crc32_pclmul snd_hda_intel ghash_clmulni_intel snd_hda_controller aesni_intel snd_hda_codec aes_x86_64 snd_hwdep lrw gf128mul snd_pcm ppdev glue_helper xhci_pci mei_me iTCO_wdt iTCO_vendor_support i915 snd_timer drm_kms_helper snd drm ablk_helper lpc_ich mfd_core evdev pcspkr parport_pc psmouse cryptd soundcore i2c_i801 serio_raw parport xhci_hcd mei wmi tpm_tis tpm video battery button processor i2c_algo_bit sg sr_mod sd_mod cdrom ahci libahci e1000e ehci_pci libata ptp ehci_hcd crc32c_intel usbcore scsi_mod usb_common pps_core thermal fan thermal_sys
[407484.408496] CPU: 3 PID: 27209 Comm: perf_fuzzer Tainted: G W 3.19.0-rc6+ #126
[407484.417914] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[407484.426497] ffffffff81a3d3da ffff88011eac5aa0 ffffffff816b6761 0000000000000000
[407484.435161] ffff88011eac5af0 ffff88011eac5ae0 ffffffff8106dcda ffff88011eac5b00
[407484.443900] ffff8801195f9800 0000000000000001 ffff88011eac5c40 ffff88011eac5ef8
[407484.452588] Call Trace:
[407484.455862] <NMI> [<ffffffff816b6761>] dump_stack+0x45/0x57
[407484.462741] [<ffffffff8106dcda>] warn_slowpath_common+0x8a/0xc0
[407484.469851] [<ffffffff8106dd56>] warn_slowpath_fmt+0x46/0x50
[407484.476743] [<ffffffff8101d90a>] ? native_sched_clock+0x2a/0x90
[407484.483888] [<ffffffff8101d90a>] ? native_sched_clock+0x2a/0x90
[407484.490999] [<ffffffff8111b922>] watchdog_overflow_callback+0x92/0xc0
[407484.498672] [<ffffffff8115c3f1>] __perf_event_overflow+0x91/0x270
[407484.505984] [<ffffffff8102b15a>] ? x86_perf_event_set_period+0xca/0x170
[407484.513834] [<ffffffff8115ced9>] perf_event_overflow+0x19/0x20
[407484.520812] [<ffffffff8103266a>] intel_pmu_handle_irq+0x1ba/0x3a0
[407484.528119] [<ffffffff8102a04b>] perf_event_nmi_handler+0x2b/0x50
[407484.535402] [<ffffffff81018610>] nmi_handle+0xa0/0x150
[407484.541701] [<ffffffff81018575>] ? nmi_handle+0x5/0x150
[407484.548069] [<ffffffff810188ba>] default_do_nmi+0x4a/0x140
[407484.554692] [<ffffffff81018a48>] do_nmi+0x98/0xe0
[407484.560517] [<ffffffff816c0bf1>] end_repeat_nmi+0x1e/0x2e
[407484.567054] [<ffffffff81058bff>] ? perf_get_regs_user+0xbf/0x190
[407484.574256] [<ffffffff81058bff>] ? perf_get_regs_user+0xbf/0x190
[407484.581431] [<ffffffff81058bff>] ? perf_get_regs_user+0xbf/0x190
[407484.588602] <<EOE>> <IRQ> [<ffffffff8115c28c>] perf_prepare_sample+0x2ec/0x3c0
[407484.597358] [<ffffffff8115c46e>] __perf_event_overflow+0x10e/0x270
[407484.604708] [<ffffffff8115c439>] ? __perf_event_overflow+0xd9/0x270
[407484.612215] [<ffffffff8115c924>] ? perf_tp_event+0xc4/0x210
[407484.619000] [<ffffffff8115cfe2>] ? __perf_sw_event+0x72/0x1f0
[407484.625937] [<ffffffff8115c799>] ? perf_swevent_overflow+0xa9/0xe0
[407484.633287] [<ffffffff8115c799>] perf_swevent_overflow+0xa9/0xe0
[407484.640467] [<ffffffff8115c837>] perf_swevent_event+0x67/0x90
[407484.647343] [<ffffffff8115c924>] perf_tp_event+0xc4/0x210
[407484.653923] [<ffffffff810b6fa9>] ? lock_acquire+0x119/0x130
[407484.660606] [<ffffffff810b3cf6>] ? perf_trace_lock_acquire+0x146/0x180
[407484.668332] [<ffffffff810b594f>] ? __lock_acquire.isra.31+0x3af/0xfe0
[407484.675962] [<ffffffff810b3cf6>] perf_trace_lock_acquire+0x146/0x180
[407484.683490] [<ffffffff810b6fa9>] ? lock_acquire+0x119/0x130
[407484.690211] [<ffffffff810b6fa9>] lock_acquire+0x119/0x130
[407484.696750] [<ffffffff8115b7f5>] ? perf_event_wakeup+0x5/0xf0
[407484.703640] [<ffffffff811f50ef>] ? kill_fasync+0xf/0xf0
[407484.710008] [<ffffffff8115b828>] perf_event_wakeup+0x38/0xf0
[407484.716798] [<ffffffff8115b7f5>] ? perf_event_wakeup+0x5/0xf0
[407484.723696] [<ffffffff8115b913>] perf_pending_event+0x33/0x60
[407484.730570] [<ffffffff8114cc7c>] irq_work_run_list+0x4c/0x80
[407484.737392] [<ffffffff8114ccc8>] irq_work_run+0x18/0x40
[407484.743765] [<ffffffff8101955f>] smp_trace_irq_work_interrupt+0x3f/0xc0
[407484.751579] [<ffffffff816c01fd>] trace_irq_work_interrupt+0x6d/0x80
[407484.759046] <EOI> [<ffffffff810b6f4d>] ? lock_acquire+0xbd/0x130
[407484.766380] [<ffffffff811f4f42>] ? SyS_fcntl+0x5b2/0x650
[407484.772786] [<ffffffff816bdb31>] _raw_spin_lock+0x31/0x40
[407484.779321] [<ffffffff811f4f42>] ? SyS_fcntl+0x5b2/0x650
[407484.785813] [<ffffffff811f4f42>] SyS_fcntl+0x5b2/0x650
[407484.792109] [<ffffffff816be6ed>] system_call_fastpath+0x16/0x1b
[407484.799195] ---[ end trace 55752a03ec8ab979 ]---
[407490.307881] INFO: rcu_sched detected stalls on CPUs/tasks: { 3} (detected by 4, t=5252 jiffies, g=36308737, c=36308736, q=207)
[407490.320908] Task dump for CPU 3:
[407490.325104] perf_fuzzer R running task 0 27209 2304 0x00000008
[407490.333439] ffffffff8115c837 ffff8800c4293cc8 ffffffff8115c924 ffff880000000024
[407490.342163] 0000000000000001 ffff8800c4293b80 ffff8800c4293d90 ffff8800ceb81000
[407490.350899] ffffe8ffffcca880 ffff8800c4293b10 ffffffff8115c799 ffff8800ceb81000
[407490.359619] Call Trace:
[407490.362932] [<ffffffff8115c837>] ? perf_swevent_event+0x67/0x90
[407490.370033] [<ffffffff8115c924>] ? perf_tp_event+0xc4/0x210
[407490.376770] [<ffffffff8115c799>] ? perf_swevent_overflow+0xa9/0xe0
[407490.384068] [<ffffffff8115c837>] ? perf_swevent_event+0x67/0x90
[407490.391073] [<ffffffff8115c924>] ? perf_tp_event+0xc4/0x210
[407490.397696] [<ffffffff810b6fa9>] ? lock_acquire+0x119/0x130
[407490.404328] [<ffffffff810b594f>] ? __lock_acquire.isra.31+0x3af/0xfe0
[407490.411921] [<ffffffff810b3cf6>] ? perf_trace_lock_acquire+0x146/0x180
[407490.419567] [<ffffffff810b594f>] ? __lock_acquire.isra.31+0x3af/0xfe0
[407490.427133] [<ffffffff810b6f4d>] ? lock_acquire+0xbd/0x130
[407490.433668] [<ffffffff811f4f42>] ? SyS_fcntl+0x5b2/0x650
[407490.440062] [<ffffffff816bdb31>] ? _raw_spin_lock+0x31/0x40
[407490.446600] [<ffffffff811f4f42>] ? SyS_fcntl+0x5b2/0x650
[407490.452913] [<ffffffff811f4f42>] ? SyS_fcntl+0x5b2/0x650
[407490.459253] [<ffffffff816be6ed>] ? system_call_fastpath+0x16/0x1b
(repeat forever filling up the disk with these messages)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/