Re: perfevents: irq loop stuck!

From: Vince Weaver
Date: Mon May 19 2014 - 09:07:27 EST


On Fri, 16 May 2014, Peter Zijlstra wrote:

> On Fri, May 16, 2014 at 12:25:28AM -0400, Vince Weaver wrote:
> > anyway I'm not sure if it's worth tracking this more if it's possible to
> > mostly fix the case by fixing the sample_period bounds.
>
> Right, so lets start with that, if it triggers again, we'll have another
> look.

I applied the patch and can verify it avoids the too-big-period-wrapping
problem.

I left things fuzzing over the weekend, and eventually the bug triggered
again. The problem still seems to be caused by
"sample_period=2,fixed counter 0"
so maybe there's an erratum out there I should be looking up.
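
For anyone following along, here is a rough sketch of why a period of 2
shows up the way it does in the register dump at the end of this message.
As far as I understand it, the counter is armed with the negated period
masked to the counter width, so that it overflows after "period" events.
The 48-bit width and the helper names below are assumptions made for
illustration only, not kernel code.

```python
# Sketch only: the 48-bit width is an assumption, chosen because every
# counter value in the dump fits in 48 bits.
COUNTER_WIDTH = 48
COUNTER_MASK = (1 << COUNTER_WIDTH) - 1


def initial_count(period):
    """Value to program so the counter overflows after `period` events."""
    return (-period) & COUNTER_MASK


def events_until_overflow(count):
    """Number of increments left before a counter holding `count` wraps."""
    return (COUNTER_MASK + 1) - count


if __name__ == "__main__":
    armed = initial_count(2)
    print(f"{armed:016x}")               # same 16-digit format as the dump
    print(events_until_overflow(armed))
```

With a period of 2 this gives 0000fffffffffffe, which is the value
reported for fixed-PMC0 in the dump, so the counter is only two events
away from overflowing each time it is re-armed.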

The fuzzing also turned up a few other issues, and in the end after 2 days
it locked up the machine so hard that it also took out the ethernet switch
due to some sort of packet transmit storm, which is a failure mode I
have to admit I haven't encountered before.

Vince

[69213.252805] ------------[ cut here ]------------
[69213.260637] WARNING: CPU: 4 PID: 11343 at arch/x86/kernel/cpu/perf_event_intel.c:1373 intel_pmu_handle_irq+0x2a4/0x3c0()
[69213.276788] perfevents: irq loop stuck!
...
[69213.686561] CPU#4: ctrl: 0000000000000000
[69213.694352] CPU#4: status: 0000000000000000
[69213.701979] CPU#4: overflow: 0000000000000000
[69213.709599] CPU#4: fixed: 00000000000000b8
[69213.717172] CPU#4: pebs: 0000000000000000
[69213.724596] CPU#4: active: 0000000300000000
[69213.731939] CPU#4: gen-PMC0 ctrl: 000000000013412e
[69213.739877] CPU#4: gen-PMC0 count: 000000000000002c
[69213.747820] CPU#4: gen-PMC0 left: 0000ffffffffffd7
[69213.755657] CPU#4: gen-PMC1 ctrl: 0000000000138b40
[69213.763461] CPU#4: gen-PMC1 count: 00000000000086b3
[69213.771152] CPU#4: gen-PMC1 left: 0000ffffffff81c9
[69213.778742] CPU#4: gen-PMC2 ctrl: 000000000013024e
[69213.786271] CPU#4: gen-PMC2 count: 0000000000000001
[69213.793784] CPU#4: gen-PMC2 left: 0000ffffffffffff
[69213.801227] CPU#4: gen-PMC3 ctrl: 0000000000134f2e
[69213.808720] CPU#4: gen-PMC3 count: 00000000000009f9
[69213.816192] CPU#4: gen-PMC3 left: 0000fffffffff6de
[69213.823620] CPU#4: fixed-PMC0 count: 0000fffffffffffe
[69213.831035] CPU#4: fixed-PMC1 count: 0000ffffea2a90b2
[69213.838477] CPU#4: fixed-PMC2 count: 00000000051c5865
[69213.845792] perf_event_intel: clearing PMU state on CPU#4
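
In case it helps when reading the dump above, here is a small sketch
that splits the "fixed" and "gen-PMC ctrl" values into their bit fields.
The layouts are my reading of the architectural IA32_FIXED_CTR_CTRL and
IA32_PERFEVTSELx registers (4 bits per fixed counter; event select,
umask and flag bits for the general-purpose ones); the function names
are made up for this example, and the layout should be checked against
the SDM for the CPU in question.

```python
def decode_fixed_ctrl(value, num_counters=3):
    """Split a fixed-counter control value into one 4-bit field per counter."""
    fields = []
    for i in range(num_counters):
        nibble = (value >> (4 * i)) & 0xF
        fields.append({
            "counter": i,
            "os": bool(nibble & 0x1),         # count in ring 0
            "usr": bool(nibble & 0x2),        # count in ring > 0
            "anythread": bool(nibble & 0x4),
            "pmi": bool(nibble & 0x8),        # interrupt on overflow
        })
    return fields


def decode_evtsel(value):
    """Split a general-purpose event-select value into its main fields."""
    return {
        "event": value & 0xFF,
        "umask": (value >> 8) & 0xFF,
        "usr": bool(value & (1 << 16)),
        "os": bool(value & (1 << 17)),
        "edge": bool(value & (1 << 18)),
        "int": bool(value & (1 << 20)),       # interrupt on overflow
        "enable": bool(value & (1 << 22)),
    }


if __name__ == "__main__":
    for field in decode_fixed_ctrl(0xB8):     # "fixed" line from the dump
        print(field)
    print(decode_evtsel(0x13412E))            # "gen-PMC0 ctrl" from the dump
```

Read this way, fixed: b8 has only the PMI bit set for fixed counter 0,
while fixed counter 1 has OS, USR and PMI set, and gen-PMC0 decodes to
event 0x2e, umask 0x41 with the interrupt bit set and the enable bit
clear in this snapshot.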
