[tip:perf/urgent] perf: Fix cloning

From: tip-bot for Peter Zijlstra
Date: Thu Feb 25 2016 - 03:06:09 EST


Commit-ID: a69b0ca4ac3bf5427b571f11cbf33f0a32b728d5
Gitweb: http://git.kernel.org/tip/a69b0ca4ac3bf5427b571f11cbf33f0a32b728d5
Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
AuthorDate: Wed, 24 Feb 2016 18:45:44 +0100
Committer: Ingo Molnar <mingo@xxxxxxxxxx>
CommitDate: Thu, 25 Feb 2016 08:42:33 +0100

perf: Fix cloning

Alexander reported that when the 'original' context gets destroyed, no
new clones happen.

This can happen irrespective of the ctx switch optimization, any task
can die, even the parent, and we want to continue monitoring the task
hierarchy until we either close the event or no tasks are left in the
hierarchy.

perf_event_init_context() will attempt to pin the 'parent' context
during clone(). At that point current is the parent, and since current
cannot have exited while executing clone(), its context cannot have
passed through perf_event_exit_task_context(). Therefore
perf_pin_task_context() cannot observe ctx->task == TASK_TOMBSTONE.

However, since inherit_event() does:

if (parent_event->parent)
parent_event = parent_event->parent;

it looks at the 'original' event when it does: is_orphaned_event().
This can return true if the context that contains the this event has
passed through perf_event_exit_task_context(). And thus we'll fail to
clone the perf context.

Fix this by adding a new state: STATE_DEAD, which is set by
perf_release() to indicate that the filedesc (or kernel reference) is
dead and there are no observers for our data left.

Only for STATE_DEAD will is_orphaned_event() be true and inhibit
cloning.

STATE_EXIT is otherwise preserved such that is_event_hup() remains
functional and will report when the observed task hierarchy becomes
empty.

Reported-by: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
Tested-by: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Reviewed-by: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: dvyukov@xxxxxxxxxx
Cc: eranian@xxxxxxxxxx
Cc: oleg@xxxxxxxxxx
Cc: panand@xxxxxxxxxx
Cc: sasha.levin@xxxxxxxxxx
Cc: vince@xxxxxxxxxx
Fixes: c6e5b73242d2 ("perf: Synchronously clean up child events")
Link: http://lkml.kernel.org/r/20160224174947.919845295@xxxxxxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
include/linux/perf_event.h | 1 +
kernel/events/core.c | 29 ++++++++++++++---------------
2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index b35a61a..3915661 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -397,6 +397,7 @@ struct pmu {
* enum perf_event_active_state - the states of a event
*/
enum perf_event_active_state {
+ PERF_EVENT_STATE_DEAD = -4,
PERF_EVENT_STATE_EXIT = -3,
PERF_EVENT_STATE_ERROR = -2,
PERF_EVENT_STATE_OFF = -1,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 64698fb..92d6999 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1645,7 +1645,7 @@ out:

static bool is_orphaned_event(struct perf_event *event)
{
- return event->state == PERF_EVENT_STATE_EXIT;
+ return event->state == PERF_EVENT_STATE_DEAD;
}

static inline int pmu_filter_match(struct perf_event *event)
@@ -1732,7 +1732,6 @@ group_sched_out(struct perf_event *group_event,
}

#define DETACH_GROUP 0x01UL
-#define DETACH_STATE 0x02UL

/*
* Cross CPU call to remove a performance event
@@ -1752,8 +1751,6 @@ __perf_remove_from_context(struct perf_event *event,
if (flags & DETACH_GROUP)
perf_group_detach(event);
list_del_event(event, ctx);
- if (flags & DETACH_STATE)
- event->state = PERF_EVENT_STATE_EXIT;

if (!ctx->nr_events && ctx->is_active) {
ctx->is_active = 0;
@@ -3772,22 +3769,24 @@ int perf_event_release_kernel(struct perf_event *event)

ctx = perf_event_ctx_lock(event);
WARN_ON_ONCE(ctx->parent_ctx);
- perf_remove_from_context(event, DETACH_GROUP | DETACH_STATE);
- perf_event_ctx_unlock(event, ctx);
+ perf_remove_from_context(event, DETACH_GROUP);

+ raw_spin_lock_irq(&ctx->lock);
/*
- * At this point we must have event->state == PERF_EVENT_STATE_EXIT,
- * either from the above perf_remove_from_context() or through
- * perf_event_exit_event().
+ * Mark this even as STATE_DEAD, there is no external reference to it
+ * anymore.
*
- * Therefore, anybody acquiring event->child_mutex after the below
- * loop _must_ also see this, most importantly inherit_event() which
- * will avoid placing more children on the list.
+ * Anybody acquiring event->child_mutex after the below loop _must_
+ * also see this, most importantly inherit_event() which will avoid
+ * placing more children on the list.
*
* Thus this guarantees that we will in fact observe and kill _ALL_
* child events.
*/
- WARN_ON_ONCE(event->state != PERF_EVENT_STATE_EXIT);
+ event->state = PERF_EVENT_STATE_DEAD;
+ raw_spin_unlock_irq(&ctx->lock);
+
+ perf_event_ctx_unlock(event, ctx);

again:
mutex_lock(&event->child_mutex);
@@ -4000,7 +3999,7 @@ static bool is_event_hup(struct perf_event *event)
{
bool no_children;

- if (event->state != PERF_EVENT_STATE_EXIT)
+ if (event->state > PERF_EVENT_STATE_EXIT)
return false;

mutex_lock(&event->child_mutex);
@@ -8727,7 +8726,7 @@ perf_event_exit_event(struct perf_event *child_event,
if (parent_event)
perf_group_detach(child_event);
list_del_event(child_event, child_ctx);
- child_event->state = PERF_EVENT_STATE_EXIT; /* see perf_event_release_kernel() */
+ child_event->state = PERF_EVENT_STATE_EXIT; /* is_event_hup() */
raw_spin_unlock_irq(&child_ctx->lock);

/*