Re: [perf] more perf_fuzzer memory corruption

From: Peter Zijlstra
Date: Tue Apr 29 2014 - 15:01:17 EST


On Tue, Apr 29, 2014 at 02:21:56PM -0400, Vince Weaver wrote:
> On Tue, 29 Apr 2014, Peter Zijlstra wrote:
>
> > > Event #16 is a SW event created and running in the parent on CPU0.
> >
> > A regular software one, right? Not a timer one.
>
> Maybe. From traces I have it looks like it's a regular one (i.e. calls
> perf_swevent_add() ) but who knows at this point.
>
> When I actually got a trace with perf_event_open() instrumented to print
> some attr values it looked like things were being caused by
> PERF_COUNT_SW_TASK_CLOCK which makes no sense.
>
> > > CPU6 (child) shutting down.
> > > last user of event #16
> > > perf_release() called on event
> > > which eventually calls event_sched_out()
> > > which calls pmu->del which removes event from swevent_htable
> > > *but only on CPU6*
> >
> > So on fork() we'll also clone the counter; after which there's two. One
> > will run on each task.
>
> even if inherit isn't set?

Fair point, nope not in that case. If you can trigger this without ever
using .inherit=1 this would exclude a lot of funny code.

> > Because of a context switch optimization they can actually flip around
> > (the below patch disables that).
>
> ENOPATCH?

urgh.. fail.


diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5129b1201050..0d6a58950a3b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2293,6 +2291,7 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn,
 	if (!cpuctx->task_ctx)
 		return;

+#if 0
 	rcu_read_lock();
 	next_ctx = next->perf_event_ctxp[ctxn];
 	if (!next_ctx)
@@ -2335,6 +2334,7 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn,
 	}
 unlock:
 	rcu_read_unlock();
+#endif

 	if (do_switch) {
 		raw_spin_lock(&ctx->lock);
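
A userspace toy model of the optimization that the patch above compiles out, as a rough sketch only. When the outgoing and incoming tasks hold equivalent (cloned) contexts, the contexts are swapped between the tasks instead of being scheduled out and back in, which is how the two counters can "flip around". All names here (toy_ctx, toy_task, toy_switch, allow_swap) are hypothetical, and the equivalence test is deliberately simplified: the real check also involves generation counts and pin counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a perf event context. NOT kernel code. */
struct toy_ctx {
	const struct toy_ctx *parent;	/* context this one was cloned from, or NULL */
	int nr_sched_out;		/* how often the slow path scheduled it out */
};

struct toy_task {
	struct toy_ctx *ctx;
};

/* Simplified equivalence: one context is a clone of the other, or both
 * are clones of the same parent. Two parentless contexts never match. */
bool toy_ctx_equiv(const struct toy_ctx *a, const struct toy_ctx *b)
{
	if (!a || !b)
		return false;
	if (!a->parent && !b->parent)
		return false;
	return a->parent == b || b->parent == a || a->parent == b->parent;
}

/* Returns true if the fast path ran (pointers swapped, nothing scheduled
 * out). allow_swap = false models the "#if 0" in the patch. */
bool toy_switch(struct toy_task *prev, struct toy_task *next, bool allow_swap)
{
	assert(prev && next);

	if (allow_swap && toy_ctx_equiv(prev->ctx, next->ctx)) {
		struct toy_ctx *tmp = prev->ctx;

		prev->ctx = next->ctx;
		next->ctx = tmp;
		return true;
	}

	/* Slow path: actually schedule the outgoing context out. */
	if (prev->ctx)
		prev->ctx->nr_sched_out++;
	return false;
}
```

With allow_swap set, a parent and its clone trade context pointers on every switch between them; with it cleared, each task keeps its own context and always takes the slow path.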

> > quite the puzzle this one
>
> yes.
>
> I'm tediously working on trying to get a good trace of this happening.
>
> I have a random seed that will trigger the bug in the fuzzer around 1 time
> in 10.
>
> Unfortunately many of the times it crashes so hard/quickly there's no
> chance of getting the trace data (dump trace on oops never holds enough
> state, and often the fuzzing triggers its own random trace events that
> clutter those logs).
>
> Also trace-cmd is a pain to use. Any suggested events I should trace
> beyond the obvious?

I've never used trace-cmd :/ What I do in the crashing hard case is try
and make ftrace_dump_on_oops work, although capturing a full trace
buffer over serial is exceedingly painful -- maxcpus= might work if you
have too many CPUs, I forgot.
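
One way to arm the dump before a fuzzing run, sketched as a small C helper. The sysctl file /proc/sys/kernel/ftrace_dump_on_oops and the boot parameters ftrace_dump_on_oops and maxcpus= are existing kernel interfaces; the helper name write_knob is made up for illustration, and writing the real file needs root.

```c
#include <assert.h>
#include <stdio.h>

/* Write a short string to a procfs-style control file.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
int write_knob(const char *path, const char *value)
{
	assert(path && value);

	FILE *f = fopen(path, "w");
	if (!f)
		return -1;

	int ok = fputs(value, f) >= 0;

	/* fclose() flushes, so a failed close means the write may be lost. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Calling write_knob("/proc/sys/kernel/ftrace_dump_on_oops", "1") as root before starting the fuzzer would be the intended use; booting with ftrace_dump_on_oops on the kernel command line has the same effect from the start.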

Anyway, I can make the fuzzer do weird shit, but it doesn't look like
the thing you're seeing, but who knows.
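
For reference, a userspace toy model of the scenario quoted near the top of this mail: an event is added to the hash table of one CPU, the removal runs on a different CPU, and the original entry is left behind. This is only a sketch of the suspected failure mode, not of the actual kernel data structures; the flat per-CPU arrays and every toy_* name are hypothetical stand-ins for swevent_htable and pmu->del.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS	8
#define TOY_MAX_EVENTS	32

/* One table per CPU, standing in for the per-CPU swevent_htable. */
struct toy_htable {
	int ids[TOY_MAX_EVENTS];
	size_t count;
};

struct toy_htable toy_tables[TOY_NR_CPUS];

void toy_reset(void)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		toy_tables[cpu].count = 0;
}

/* Add an event id to the table of the given CPU. */
bool toy_add(int cpu, int id)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);

	struct toy_htable *t = &toy_tables[cpu];

	if (t->count == TOY_MAX_EVENTS)
		return false;
	t->ids[t->count++] = id;
	return true;
}

/* Remove an event id, looking ONLY at the table of the given CPU.
 * Returns false if that CPU's table did not contain the id. */
bool toy_del(int cpu, int id)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);

	struct toy_htable *t = &toy_tables[cpu];

	for (size_t i = 0; i < t->count; i++) {
		if (t->ids[i] == id) {
			t->ids[i] = t->ids[--t->count];
			return true;
		}
	}
	return false;
}

/* Return the first CPU whose table still holds the id, or -1 if none. */
int toy_find_stale(int id)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		for (size_t i = 0; i < toy_tables[cpu].count; i++)
			if (toy_tables[cpu].ids[i] == id)
				return cpu;
	return -1;
}
```

Adding id 16 on CPU 0 and then calling toy_del(6, 16) removes nothing, and toy_find_stale(16) still reports CPU 0. In the kernel the leftover entry would point at freed memory, which would fit the corruption being chased here.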