Re: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()

From: Vince Weaver
Date: Fri Jul 03 2015 - 14:57:07 EST


On Fri, 3 Jul 2015, Peter Zijlstra wrote:

> On Thu, Jul 02, 2015 at 11:18:10AM -0400, Vince Weaver wrote:
> >
> > So sad to say the lack of fuzzer reports was because I was out of town for
> > a bit, not due to the kernel suddenly getting amazingly better.
> >
> > In any case I am running against current git and getting a lot of
> > warnings, but most of them seem to be old ones. This following one looks
> > new though.
> >
> > This is current linus-git on a Haswell machine with peterz's patch to fix
> > the aux buffer spinlock recursion (I can still crash the kernel if that
> > patch is not applied).
> >
> > It corresponds to:
> >
> > WARN_ON_ONCE(!event->attr.precise_ip);
> >
> > [ 584.352324] WARNING: CPU: 2 PID: 18924 at arch/x86/kernel/cpu/perf_event_intel_ds.c:1198 intel_pmu_drain_pebs_nhm+0x283/0x2e0()
>
> I've not yet tried to reproduce, but the below could explain things.
>
> On disabling an event we first clear our cpuc->pebs_enabled bits, only
> to then check them to see if there are any set, and if so, drain the
> buffer.
>
> If we just cleared the last bit, we'll fail to drain the buffer.
>
> If we then program another event on that counter and another PEBS event,
> we can hit the above WARN with the 'stale' entries left over from the
> previous event.
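
The ordering problem described in the quoted explanation can be illustrated with a small user-space model. This is a hypothetical sketch, not kernel code. The names `struct model_cpu`, `drain()`, `disable_buggy()`, `disable_fixed()` and `run_scenario()` are invented here. They only mimic the shape of the `cpuc->pebs_enabled` bookkeeping and the `WARN_ON_ONCE(!event->attr.precise_ip)` check. The sketch also assumes that "drain before clearing the bit" is enough to show the fix, which may not match what the actual patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model; none of these are kernel types. */
#define NUM_COUNTERS 4
#define BUF_SIZE     16

struct model_event {
	bool precise_ip;            /* stands in for event->attr.precise_ip */
};

struct model_cpu {
	uint64_t pebs_enabled;      /* one bit per counter, like cpuc->pebs_enabled */
	struct model_event *events[NUM_COUNTERS];
	int buf_idx[BUF_SIZE];      /* counter index of each pending PEBS record */
	int buf_len;
	int warnings;               /* how many times the WARN condition fired */
};

/* Attribute every pending record to whatever event now owns its counter. */
void drain(struct model_cpu *c)
{
	for (int i = 0; i < c->buf_len; i++) {
		struct model_event *e = c->events[c->buf_idx[i]];
		if (e && !e->precise_ip)
			c->warnings++;  /* models WARN_ON_ONCE(!event->attr.precise_ip) */
	}
	c->buf_len = 0;
}

/* Buggy order: clear our bit first, then test whether any bit is still set. */
void disable_buggy(struct model_cpu *c, int idx)
{
	assert(idx >= 0 && idx < NUM_COUNTERS);
	c->pebs_enabled &= ~(1ULL << idx);
	if (c->pebs_enabled)        /* false if idx was the last PEBS counter */
		drain(c);
	c->events[idx] = 0;
}

/* Fixed order: drain while our bit is still set, then clear it. */
void disable_fixed(struct model_cpu *c, int idx)
{
	assert(idx >= 0 && idx < NUM_COUNTERS);
	if (c->pebs_enabled)
		drain(c);
	c->pebs_enabled &= ~(1ULL << idx);
	c->events[idx] = 0;
}

/*
 * Scenario: a PEBS event on counter 0 leaves a record in the buffer and is
 * disabled; counter 0 is then reused by a non-PEBS event while another PEBS
 * event runs on counter 1, and the buffer is drained later (e.g. on a PMI).
 * Returns the number of stale records blamed on a non-PEBS event.
 */
int run_scenario(void (*disable)(struct model_cpu *, int))
{
	struct model_cpu c = {0};
	struct model_event pebs = { .precise_ip = true };
	struct model_event plain = { .precise_ip = false };

	c.events[0] = &pebs;
	c.pebs_enabled = 1ULL << 0;
	c.buf_idx[c.buf_len++] = 0;     /* record written by the PEBS event */

	disable(&c, 0);                 /* PEBS event removed from counter 0 */

	c.events[0] = &plain;           /* counter 0 reused by a non-PEBS event */
	c.pebs_enabled |= 1ULL << 1;    /* another PEBS event on counter 1 */
	drain(&c);                      /* later drain sees any stale record */
	return c.warnings;
}
```

In this model `run_scenario(disable_buggy)` returns 1, because the record left behind by the old PEBS event is attributed to the non-PEBS event that took over counter 0. `run_scenario(disable_fixed)` returns 0, because the buffer is drained while the old event still owns the counter.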

With that patch applied, I still managed to hit this:

WARN_ON_ONCE(!event->attr.precise_ip);

I'll let it run some more and see if the watchdog still gets triggered.

[ 2217.544901] ------------[ cut here ]------------
[ 2217.550351] WARNING: CPU: 2 PID: 9136 at arch/x86/kernel/cpu/perf_event_intel_ds.c:1198 intel_pmu_drain_pebs_nhm+0x283/0x2e0()
[ 2217.563534] Modules linked in: fuse snd_hda_codec_hdmi i915 x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel psmouse hmac drbg evdev serio_raw ansi_cprng snd_hda_codec_realtek drm_kms_helper snd_hda_codec_generic ppdev iTCO_wdt iTCO_vendor_support pcspkr drm i2c_algo_bit aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper snd_hda_intel cryptd mei_me mei snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer tpm_tis tpm wmi button processor video battery i2c_i801 parport_pc parport snd lpc_ich mfd_core soundcore sg sr_mod sd_mod cdrom ehci_pci ehci_hcd ahci libahci xhci_pci xhci_hcd e1000e libata ptp crc32c_intel scsi_mod pps_core usbcore usb_common fan thermal thermal_sys
[ 2217.640998] CPU: 2 PID: 9136 Comm: perf_fuzzer Tainted: G W 4.1.0+ #163
[ 2217.649810] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[ 2217.658281] ffffffff81a105a0 ffff88011ea85b10 ffffffff8169f823 0000000000000000
[ 2217.666818] 0000000000000000 ffff88011ea85b50 ffffffff8106ec8a ffff88011ea85ba0
[ 2217.675329] 0000000000000002 0000000000000001 ffff88011ea8bd80 ffff8801190400c0
[ 2217.683821] Call Trace:
[ 2217.686960] <NMI> [<ffffffff8169f823>] dump_stack+0x45/0x57
[ 2217.693638] [<ffffffff8106ec8a>] warn_slowpath_common+0x8a/0xc0
[ 2217.700549] [<ffffffff8106ed7a>] warn_slowpath_null+0x1a/0x20
[ 2217.707296] [<ffffffff8102f783>] intel_pmu_drain_pebs_nhm+0x283/0x2e0
[ 2217.714775] [<ffffffff81031834>] ? intel_pmu_disable_event+0xa4/0x130
[ 2217.722216] [<ffffffff81032235>] intel_pmu_handle_irq+0x255/0x440
[ 2217.729339] [<ffffffff8115413e>] ? perf_event_ctx_lock_nested+0x5e/0xf0
[ 2217.737026] [<ffffffff81028e76>] perf_event_nmi_handler+0x26/0x40
[ 2217.744070] [<ffffffff810181ad>] nmi_handle+0x9d/0x140
[ 2217.750160] [<ffffffff81018115>] ? nmi_handle+0x5/0x140
[ 2217.756290] [<ffffffff8101843a>] default_do_nmi+0x4a/0x120
[ 2217.762688] [<ffffffff8101859d>] do_nmi+0x8d/0xc0
[ 2217.768280] [<ffffffff816a979f>] end_repeat_nmi+0x1e/0x2e
[ 2217.774627] [<ffffffff810309ba>] ? __intel_pmu_enable_all+0x5a/0xc0
[ 2217.781894] [<ffffffff810309ba>] ? __intel_pmu_enable_all+0x5a/0xc0
[ 2217.789153] [<ffffffff810309ba>] ? __intel_pmu_enable_all+0x5a/0xc0
[ 2217.796415] <<EOE>> <IRQ> [<ffffffff81030a30>] intel_pmu_enable_all+0x10/0x20
[ 2217.804847] [<ffffffff8102a95c>] x86_pmu_enable+0x25c/0x2e0
[ 2217.811383] [<ffffffff811560e2>] perf_pmu_enable+0x22/0x30
[ 2217.817837] [<ffffffff81157a80>] perf_mux_hrtimer_handler+0x120/0x1f0
[ 2217.825316] [<ffffffff81157960>] ? perf_event_context_sched_in+0x150/0x150
[ 2217.833239] [<ffffffff810dcf43>] __hrtimer_run_queues+0xd3/0x260
[ 2217.840239] [<ffffffff810dd4bb>] hrtimer_interrupt+0xab/0x1b0
[ 2217.846930] [<ffffffff8104b32c>] local_apic_timer_interrupt+0x3c/0x70
[ 2217.854367] [<ffffffff816aa1a1>] smp_apic_timer_interrupt+0x41/0x60
[ 2217.861630] [<ffffffff816a83eb>] apic_timer_interrupt+0x6b/0x70
[ 2217.868540] <EOI>
[ 2217.870633] ---[ end trace 3a31b4d07b4f3450 ]---
[ 2353.824071] Uhhuh. NMI received for unknown reason 31 on CPU 1.
[ 2353.831238] Do you have a strange power saving mode enabled?
[ 2353.838120] Dazed and confused, but trying to continue
