Re: perf/ftrace lockup on 3.12-rc6 with trigger code
From: Vince Weaver
Date: Fri Oct 25 2013 - 09:11:16 EST
On Fri, 25 Oct 2013, Steven Rostedt wrote:
> On Thu, 2013-10-24 at 14:25 -0400, Vince Weaver wrote:
> > On Thu, 24 Oct 2013, Vince Weaver wrote:
> > > after a month of trying I finally got a small test-case out of my
> > > perf_fuzzer suite that triggers a system lockup with just one syscall.
> > >
> > > Attached is the code that triggers it.
> >
> > And it turns out you can only trigger this specific problem if advanced
> > ftrace options are enabled.
> >
> > CONFIG_KPROBES_ON_FTRACE=y
> > CONFIG_FUNCTION_TRACER=y
> > CONFIG_FUNCTION_GRAPH_TRACER=y
> > CONFIG_STACK_TRACER=y
> > CONFIG_DYNAMIC_FTRACE=y
> > CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
> > CONFIG_FUNCTION_PROFILER=y
> > CONFIG_FTRACE_MCOUNT_RECORD=y
>
> The above STACK_TRACER, FTRACE_WITH_REGS and FUNCTION_PROFILER probably
> don't need to be set, as they are pretty much stand alone, and don't
> look to be involved in the stack traces that you (and Dave) posted.
>
> >
> > Urgh, I had turned those on to try to debug something and forgot to
> > disable. I feel like I saw this problem before I had those enabled so I
> > guess I have to start from scratch fuzzing to see if I can get a more
> > generally reproducible trace.
>
> Looks like something is incorrectly enabling function tracer within
> perf. Peter told me that there's some ref count bug that may use data
> after being freed on exit.
>
> I tried the program that you attached in your previous email, and was not
> able to hit the bug. Are you able to hit the bug with that code each
> time?
Yes. My poor core2 machine has been hard-reset (holding down the power
button; it's locked that hard) about 200 times in the past month while
trying to track down this problem.
I'm not sure how tracepoints work exactly, but the problem code is setting
pe[5].type=PERF_TYPE_TRACEPOINT;
pe[5].config=0x7fffffff00000001;
The config is being truncated to 32 bits by the perf/ftrace code, so I
think this means the tracepoint being enabled is
tracing/events/ftrace/function/id:1
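The truncation arithmetic can be checked in isolation. The following is a
minimal sketch, assuming the truncation behaves like a plain cast from a
64-bit value to a 32-bit one; the helper names are hypothetical and are not
taken from the kernel source.

```c
#include <stdint.h>

/* Hypothetical helper: models a 64-bit config value being stored
 * in a 32-bit field, which keeps only the low 32 bits. */
static uint32_t config_low32(uint64_t config)
{
    return (uint32_t)config;
}

/* Hypothetical helper: the upper 32 bits that such a cast discards. */
static uint32_t config_high32(uint64_t config)
{
    return (uint32_t)(config >> 32);
}
```

With config = 0x7fffffff00000001ULL, config_low32() yields 1 and
config_high32() yields 0x7fffffff, which matches the reading that the
event ends up with tracepoint id 1.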
The sample period is set to
pe[5].sample_period=0xffffffffff000000;
and the fd is set to generate a signal on overflow (the crash doesn't
happen unless a signal handler is set up).
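For readers unfamiliar with the mechanism, the sketch below shows one
common way to request a signal when a file descriptor becomes ready, which
is how overflow notification is usually requested on a perf event fd. It is
not the attached test case: the choice of SIGIO, the handler, and the
function names are assumptions made for illustration.

```c
#define _GNU_SOURCE   /* makes O_ASYNC visible under strict -std modes */
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Counts delivered signals; illustrative only. */
static volatile sig_atomic_t overflow_count;

static void on_overflow(int signum)
{
    (void)signum;
    overflow_count++;
}

/* Hypothetical helper: install a SIGIO handler and ask the kernel to
 * deliver SIGIO to this process when fd signals readiness.
 * Returns 0 on success, -1 on failure. */
static int arm_overflow_signal(int fd)
{
    struct sigaction sa;
    int flags;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_overflow;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) < 0)
        return -1;

    /* O_ASYNC enables signal-driven notification; SIGIO is the default. */
    flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_ASYNC) < 0)
        return -1;

    /* Direct the signal to the current process. */
    if (fcntl(fd, F_SETOWN, getpid()) < 0)
        return -1;

    return 0;
}
```

In a reproducer, arm_overflow_signal() would be called on the fd returned
by perf_event_open() before the event is enabled.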
If I must, I can probably start sprinkling printks around the code to try to
track things down in more detail, but I'd rather not if I can avoid that.
Vince