Re: perf: another perf_fuzzer generated lockup
From: Stephane Eranian
Date: Fri May 08 2015 - 20:04:41 EST
Vince,
On Thu, May 7, 2015 at 9:40 PM, Vince Weaver <vincent.weaver@xxxxxxxxx> wrote:
>
>
> This is a new one I think, I hit it on the haswell machine running
> 4.1-rc2.
>
> The backtrace is complex enough I'm not really sure what's going on here.
>
> The fuzzer has been having weird issues where it's been getting
> overflow signals from invalid fds. This seems to happen
> when an overflow signal interrupts the fuzzer mid-fork?
> And the fuzzer code doesn't handle this well and attempts to call exit()
> and/or kill the child from the signal handler that interrupted the
> fork() and that doesn't always go well. I'm not sure if this is related,
> just that some of those actions seem to appear in the backtrace.
>
>
Is there a way to figure out how the fuzzer had programmed the PMU
to get there? (besides adding PMU state dump in the kernel crashdump)?
>
> [33864.529861] ------------[ cut here ]------------
> [33864.534824] WARNING: CPU: 1 PID: 9852 at kernel/watchdog.c:302 watchdog_overflow_callback+0x92/0xc0()
> [33864.544682] Watchdog detected hard LOCKUP on cpu 1
> [33864.549635] Modules linked in:
> [33864.552943] fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic kvm snd_hda_intel snd_hda_controller snd_hda_codec snd_hda_core crct10dif_pclmul snd_hwdep crc32_pclmul ghash_clmulni_intel snd_pcm aesni_intel aes_x86_64 lrw gf128mul evdev i915 iTCO_wdt iTCO_vendor_support glue_helper snd_timer ppdev ablk_helper psmouse drm_kms_helper cryptd snd drm pcspkr serio_raw lpc_ich soundcore parport_pc xhci_pci battery video processor i2c_i801 mei_me mei wmi i2c_algo_bit tpm_tis mfd_core xhci_hcd tpm parport button sg sr_mod sd_mod cdrom ehci_pci ahci ehci_hcd libahci libata e1000e ptp usbcore crc32c_intel scsi_mod fan usb_common pps_core thermal thermal_sys
> [33864.622413] CPU: 1 PID: 9852 Comm: perf_fuzzer Not tainted 4.1.0-rc2+ #142
> [33864.629776] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> [33864.637685] ffffffff81a209b5 ffff88011ea45aa0 ffffffff816d51d3 0000000000000000
> [33864.645709] ffff88011ea45af0 ffff88011ea45ae0 ffffffff81072dfa ffff88011ea45ac0
> [33864.653731] ffff880119b8f800 0000000000000000 ffff88011ea45c40 ffff88011ea45ef8
> [33864.661783] Call Trace:
> [33864.664409] <NMI> [<ffffffff816d51d3>] dump_stack+0x45/0x57
> [33864.670618] [<ffffffff81072dfa>] warn_slowpath_common+0x8a/0xc0
> [33864.677071] [<ffffffff81072e76>] warn_slowpath_fmt+0x46/0x50
> [33864.683202] [<ffffffff8104db64>] ? native_apic_wait_icr_idle+0x24/0x30
> [33864.690280] [<ffffffff81124f22>] watchdog_overflow_callback+0x92/0xc0
> [33864.697294] [<ffffffff811673b1>] __perf_event_overflow+0x91/0x270
> [33864.703916] [<ffffffff811673f9>] ? __perf_event_overflow+0xd9/0x270
> [33864.710696] [<ffffffff8102b60a>] ? x86_perf_event_set_period+0xda/0x180
> [33864.717842] [<ffffffff81167f89>] perf_event_overflow+0x19/0x20
> [33864.724195] [<ffffffff810338f2>] intel_pmu_handle_irq+0x1e2/0x450
> [33864.730840] [<ffffffff8102a41b>] perf_event_nmi_handler+0x2b/0x50
> [33864.737436] [<ffffffff810185d0>] nmi_handle+0xa0/0x150
> [33864.743025] [<ffffffff81018535>] ? nmi_handle+0x5/0x150
> [33864.748733] [<ffffffff8101887a>] default_do_nmi+0x4a/0x140
> [33864.754705] [<ffffffff81018a08>] do_nmi+0x98/0xe0
> [33864.759858] [<ffffffff816df42f>] end_repeat_nmi+0x1e/0x2e
> [33864.765746] [<ffffffff810bc1bb>] ? check_chain_key+0xdb/0x1e0
> [33864.772004] [<ffffffff810bc1bb>] ? check_chain_key+0xdb/0x1e0
> [33864.778253] [<ffffffff810bc1bb>] ? check_chain_key+0xdb/0x1e0
> [33864.784498] <<EOE>> <IRQ> [<ffffffff810bdd09>] __lock_acquire.isra.31+0x3b9/0x1000
> [33864.792950] [<ffffffff810bdd09>] ? __lock_acquire.isra.31+0x3b9/0x1000
> [33864.800045] [<ffffffff810bf2e5>] lock_acquire+0xa5/0x130
> [33864.805817] [<ffffffff81080afe>] ? __lock_task_sighand+0x6e/0x110
> [33864.812468] [<ffffffff81080aaa>] ? __lock_task_sighand+0x1a/0x110
> [33864.819084] [<ffffffff816dc721>] _raw_spin_lock+0x31/0x40
> [33864.824979] [<ffffffff81080afe>] ? __lock_task_sighand+0x6e/0x110
> [33864.831623] [<ffffffff81080afe>] __lock_task_sighand+0x6e/0x110
> [33864.838096] [<ffffffff81080aaa>] ? __lock_task_sighand+0x1a/0x110
> [33864.845314] [<ffffffff81080bcc>] do_send_sig_info+0x2c/0x80
> [33864.851949] [<ffffffff81167807>] ? perf_swevent_event+0x67/0x90
> [33864.858980] [<ffffffff8120092f>] send_sigio_to_task+0x12f/0x1a0
> [33864.866005] [<ffffffff81200805>] ? send_sigio_to_task+0x5/0x1a0
> [33864.873047] [<ffffffff812010c6>] ? send_sigio+0x56/0x100
> [33864.879411] [<ffffffff8120111e>] send_sigio+0xae/0x100
> [33864.885564] [<ffffffff81201207>] kill_fasync+0x97/0xf0
> [33864.891713] [<ffffffff8120117f>] ? kill_fasync+0xf/0xf0
> [33864.897983] [<ffffffff811667d4>] perf_event_wakeup+0xd4/0xf0
> [33864.904662] [<ffffffff81166705>] ? perf_event_wakeup+0x5/0xf0
> [33864.911490] [<ffffffff811668d0>] ? perf_pending_event+0xe0/0x110
> [33864.918580] [<ffffffff811668d0>] perf_pending_event+0xe0/0x110
> [33864.925494] [<ffffffff81156eac>] irq_work_run_list+0x4c/0x80
> [33864.932197] [<ffffffff81156ef8>] irq_work_run+0x18/0x40
> [33864.938469] [<ffffffff8101953f>] smp_trace_irq_work_interrupt+0x3f/0xc0
> [33864.946263] [<ffffffff816dea3e>] trace_irq_work_interrupt+0x6e/0x80
> [33864.953646] <EOI> [<ffffffff811a1b47>] ? copy_page_range+0x527/0x9a0
> [33864.961287] [<ffffffff811a1b22>] ? copy_page_range+0x502/0x9a0
> [33864.968265] [<ffffffff810710d2>] copy_process.part.23+0xc92/0x1b80
> [33864.975589] [<ffffffff810813be>] ? SYSC_kill+0x8e/0x230
> [33864.981879] [<ffffffff81072178>] do_fork+0xd8/0x420
> [33864.987807] [<ffffffff812007e3>] ? f_setown+0x83/0xa0
> [33864.993953] [<ffffffff81200d30>] ? SyS_fcntl+0x310/0x650
> [33865.000348] [<ffffffff81417a23>] ? lockdep_sys_exit_thunk+0x12/0x14
> [33865.007781] [<ffffffff81072546>] SyS_clone+0x16/0x20
> [33865.013830] [<ffffffff816dd1b2>] system_call_fastpath+0x16/0x7a
> [33865.020843] ---[ end trace d3bd7d73656f3cba ]---
> [33865.026418] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 496.487 msecs
> [33865.035874] perf interrupt took too long (3879951 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/