Re: [perf] more perf_fuzzer memory corruption

From: Peter Zijlstra
Date: Fri May 02 2014 - 07:16:17 EST


On Thu, May 01, 2014 at 02:49:01PM -0400, Vince Weaver wrote:
>
> OK, humor me a bit here.
>
> I'm looking at the buggy trace and comparing against a "good" trace where
> the bug doesn't happen.
>
> It is a race condition of sorts, because it's just a 10us or so
> interleaving of calls that causes the bug to happen or not.
>
> In the good trace:
>
> [parent] __perf_event_task_sched_out (and hence perf_swevent_del)
> [child] perf_release
>
> In the buggy trace:
>
> [child] perf_release
> [parent] __perf_event_task_sched_out (perf_swevent_del never happens)
>
>
> perf_swevent_del calls
>     hlist_del_rcu(event->hlist_entry)
> to remove the event from the swevent hlist.
>
> Now in theory perf_release() calls sw_perf_event_destroy() which you
> would think would also call the above. Instead it does
>     swevent_hlist_put_cpu(event, cpu);
> which does all kinds of weird hash stuff that I don't follow.
>
> Should the above two be equivalent? Is it reference counting in there
> with if (!--swhash->hlist_refcount) causing the issue?

  perf_release()
    put_event()
      perf_remove_from_context()
        __perf_remove_from_context()
          event_sched_out()
            ->del()

is the path that would call ->del() and hlist_del_rcu().

Now perf_remove_from_context() only calls __perf_remove_from_context()
when the task is active somewhere, otherwise it simply calls
list_del_event().
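
To make that branch concrete, here is a minimal user-space sketch. It is
not the kernel code: every model_* name is a hypothetical stand-in, the
two booleans only model list membership, and it assumes the active path
unlinks the event from the context after scheduling it out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are not the kernel structures. */
struct model_event {
    bool on_hlist;    /* models membership in the swevent hlist */
    bool on_ctx_list; /* models membership in the context's event list */
};

struct model_ctx {
    bool task_active; /* models "the task is active somewhere" */
};

/* Models ->del(): for a software event this is where hlist_del_rcu() runs. */
static void model_del(struct model_event *ev)
{
    ev->on_hlist = false;
}

/* Models list_del_event(): unlinks from the context list only. */
static void model_list_del_event(struct model_event *ev)
{
    ev->on_ctx_list = false;
}

/* Models __perf_remove_from_context(): schedule out, then unlink
 * (the unlink step here is an assumption of this sketch). */
static void model_remove_active(struct model_event *ev)
{
    model_del(ev);            /* event_sched_out() -> ->del() */
    model_list_del_event(ev);
}

/* Models the branch in perf_remove_from_context(). */
static void model_remove_from_context(struct model_ctx *ctx,
                                      struct model_event *ev)
{
    assert(ctx != NULL && ev != NULL);

    if (ctx->task_active)
        model_remove_active(ev);  /* ->del() is reached */
    else
        model_list_del_event(ev); /* ->del() is NOT reached on this path */
}
```

In this model the inactive branch never touches the hlist entry, so it
relies on an earlier sched-out having already called ->del().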

Both perf_remove_from_context() and perf_event_context_sched_out() (as
called from __perf_event_task_sched_out) hold ctx->lock, so they should
be serialized against each other.
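
If the two paths really are serialized by ctx->lock, only two orderings
are possible. The sketch below walks both in a simplified single-threaded
model. It is an illustration of the reasoning only: the ord_* names and
the state machine are hypothetical, and the real functions do much more
than this.

```c
#include <assert.h>
#include <stdbool.h>

enum ord_state { ORD_ACTIVE, ORD_INACTIVE };

/* Hypothetical stand-in for an event; not the kernel structure. */
struct ord_event {
    enum ord_state state; /* scheduled in or out */
    bool on_ctx_list;     /* still attached to the context */
    int del_calls;        /* how many times ->del() was modeled */
};

/* Models event_sched_out(): calls ->del() only for an active event. */
static void ord_event_sched_out(struct ord_event *ev)
{
    if (ev->state != ORD_ACTIVE)
        return;
    ev->del_calls++;
    ev->state = ORD_INACTIVE;
}

/* Models the sched-out path: only sees events still on the context. */
static void ord_ctx_sched_out(struct ord_event *ev)
{
    if (ev->on_ctx_list)
        ord_event_sched_out(ev);
}

/* Models the release path: schedule out if needed, then unlink. */
static void ord_remove(struct ord_event *ev)
{
    ord_event_sched_out(ev);
    ev->on_ctx_list = false;
}

/* Runs one of the two serialized orderings; returns the ->del() count. */
static int ord_run(bool release_first)
{
    struct ord_event ev = { ORD_ACTIVE, true, 0 };

    if (release_first) {
        ord_remove(&ev);
        ord_ctx_sched_out(&ev);
    } else {
        ord_ctx_sched_out(&ev);
        ord_remove(&ev);
    }
    return ev.del_calls;
}
```

Under these assumptions ->del() is modeled exactly once in either order,
so a trace where it never happens does not fit this simple picture.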

Clearly I'm missing something though, will go stare at the trace now.
