[PATCH] perf: Fix race in perf_event_exit_task_context

From: Peter Zijlstra
Date: Mon Jan 25 2016 - 08:10:10 EST



Subject: perf: Fix race in perf_event_exit_task_context
From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Date: Mon Jan 25 13:03:18 CET 2016

There is a race between perf_event_exit_task_context() and
orphans_remove_work() which results in a use-after-free.

We mark ctx->task with TASK_TOMBSTONE, under ctx->lock, to indicate that
a context is 'dead'. From that point on, event_function_call() on any
event of that context is a NOP.
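A minimal user-space sketch of the tombstone check follows. It assumes a
context reduced to an owning-task pointer and an event count; `struct ctx`,
`model_event_function_call()` and `remove_one_event()` are hypothetical
names, locking is left out, and the boolean return value is an addition of
this sketch so that the NOP can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; these are not the kernel's types. */
struct task { int pid; };

/* Sentinel for a dead context, modelled on TASK_TOMBSTONE: an invalid
 * pointer value that can never equal a real task's address. */
#define TASK_TOMBSTONE ((struct task *)-1L)

struct ctx {
	struct task *task;	/* owning task, or TASK_TOMBSTONE once dead */
	int nr_events;		/* number of events still attached */
};

typedef void (*ctx_fn)(struct ctx *ctx);

/* Model of event_function_call(): once the context is tombstoned the
 * call does nothing. Locking is omitted, and the bool return exists only
 * so that the NOP can be observed. */
static bool model_event_function_call(struct ctx *ctx, ctx_fn fn)
{
	if (ctx->task == TASK_TOMBSTONE)
		return false;	/* dead context: silently skip fn */
	fn(ctx);
	return true;
}

/* Example callback standing in for detaching one event. */
static void remove_one_event(struct ctx *ctx)
{
	ctx->nr_events--;
}
```

A caller that ignores the result cannot tell that its callback never ran,
which is the property the orphan-work race depends on.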

A concurrent orphans_remove_work() only holds ctx->mutex for the list
iteration and does not serialize against this. It is therefore possible
that orphans_remove_work()'s perf_remove_from_context() call fails,
because it is now a NOP, yet we still go on to free the event, leaving
freed memory linked on the context's lists.
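The interleaving can be replayed deterministically in a single-threaded
sketch. It assumes the event list is reduced to a singly linked list and
that a `dead` flag stands in for TASK_TOMBSTONE; rather than calling
free(), each event carries a `freed` flag so that the stale list entry can
be inspected without undefined behaviour. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: events on a singly linked list in a context. */
struct event {
	struct event *next;
	bool freed;		/* stands in for freeing the memory */
};

struct ctx {
	bool dead;		/* models ctx->task == TASK_TOMBSTONE */
	struct event *head;	/* models ctx->event_list */
};

/* Models perf_remove_from_context(): unlinks ev unless the context is
 * dead, in which case it silently does nothing and reports nothing. */
static void model_remove_from_context(struct ctx *ctx, struct event *ev)
{
	if (ctx->dead)
		return;
	for (struct event **pp = &ctx->head; *pp; pp = &(*pp)->next) {
		if (*pp == ev) {
			*pp = ev->next;
			ev->next = NULL;
			return;
		}
	}
}

/* Models the orphan work: remove, then free unconditionally. */
static void model_orphan_remove(struct ctx *ctx, struct event *ev)
{
	model_remove_from_context(ctx, ev);
	ev->freed = true;	/* freed even if the removal was a NOP */
}

/* Returns true if a freed event is still reachable from the list. */
static bool freed_event_on_list(const struct ctx *ctx)
{
	for (const struct event *ev = ctx->head; ev; ev = ev->next)
		if (ev->freed)
			return true;
	return false;
}
```

Tombstoning the context before the orphan work runs leaves the freed event
on the list; on a live context the same call unlinks it first.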

Once perf_event_exit_task_context() gets around to acquiring
ctx->mutex, it too iterates the event list, encounters the already
freed event and proceeds to free it _again_. This trips the WARN in
free_event().
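The WARN comes from a reference-count check. The sketch below is loosely
modelled on free_event(), assuming it only frees an event whose count can
be moved from 1 to 0; a repeat call finds 0, warns and leaks the event
rather than freeing it twice. `struct event`, `model_free_event()` and
`warn_count` are hypothetical, and C11 atomics stand in for the kernel's
own atomic helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of an event with a reference count. */
struct event {
	atomic_long refcount;	/* 1 while the owner holds the last ref */
};

static int warn_count;		/* how often the model's warning fired */

/* Drops the final reference. Returns true if the event was freed; on a
 * repeated call the count is already 0, so warn and leak instead. */
static bool model_free_event(struct event *ev)
{
	long expected = 1;

	if (!atomic_compare_exchange_strong(&ev->refcount, &expected, 0)) {
		/* on failure, expected now holds the current count */
		warn_count++;
		fprintf(stderr, "model: unexpected refcount %ld\n", expected);
		return false;	/* leak rather than free twice */
	}
	/* A real implementation would release the memory here. */
	return true;
}
```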

Plug the race by having perf_event_exit_task_context() hold ctx::mutex
over the whole tear-down, thereby 'naturally' serializing against all
other sites, including the orphan work.
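Why holding the mutex across the whole tear-down is enough can be shown
with another deterministic sketch. It assumes ctx::mutex is modelled by a
non-blocking try-lock (a failed attempt means the caller would have slept
and is retried later), and it keeps the orphan path unchanged: removal is
still a NOP on a dead context and the free is still unconditional. Every
name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_EVENTS 3

/* Hypothetical model of a context whose tear-down runs under ctx::mutex. */
struct ctx {
	bool mutex_held;		/* models ctx->mutex */
	bool dead;			/* models TASK_TOMBSTONE */
	bool on_list[NR_EVENTS];	/* models ctx->event_list membership */
	int frees[NR_EVENTS];		/* times each event was freed */
};

/* Non-blocking stand-in for mutex_lock(): false means "would sleep". */
static bool model_mutex_trylock(struct ctx *ctx)
{
	if (ctx->mutex_held)
		return false;
	ctx->mutex_held = true;
	return true;
}

static void model_mutex_unlock(struct ctx *ctx)
{
	ctx->mutex_held = false;
}

/* Exit path, step 1: take the mutex first, then tombstone the context. */
static bool exit_begin(struct ctx *ctx)
{
	if (!model_mutex_trylock(ctx))
		return false;
	ctx->dead = true;
	return true;
}

/* Exit path, step 2: free every event still listed, then unlock. */
static void exit_finish(struct ctx *ctx)
{
	for (int i = 0; i < NR_EVENTS; i++) {
		if (ctx->on_list[i]) {
			ctx->on_list[i] = false;
			ctx->frees[i]++;
		}
	}
	model_mutex_unlock(ctx);
}

/* Orphan work for event i, unchanged apart from taking the mutex.
 * Returns false if it would have blocked on ctx::mutex. */
static bool orphan_work(struct ctx *ctx, int i)
{
	if (!model_mutex_trylock(ctx))
		return false;
	if (ctx->on_list[i]) {
		if (!ctx->dead)
			ctx->on_list[i] = false;	/* removal succeeds */
		ctx->frees[i]++;			/* unconditional free */
	}
	model_mutex_unlock(ctx);
	return true;
}
```

Because the context only becomes dead while the exit path holds the mutex,
and every event is off the list before the mutex is released, the orphan
path never sees a dead context with events still listed, so each event is
freed exactly once whichever side runs first.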

Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
---
kernel/events/core.c | 50 +++++++++++++++++++++++++++++---------------------
1 file changed, 29 insertions(+), 21 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8748,14 +8748,40 @@ static void perf_event_exit_task_context
 {
 	struct perf_event_context *child_ctx, *clone_ctx = NULL;
 	struct perf_event *child_event, *next;
-	unsigned long flags;
 
 	WARN_ON_ONCE(child != current);
 
-	child_ctx = perf_lock_task_context(child, ctxn, &flags);
+	child_ctx = perf_pin_task_context(child, ctxn);
 	if (!child_ctx)
 		return;
 
+	/*
+	 * In order to reduce the amount of tricky in ctx tear-down, we hold
+	 * ctx::mutex over the entire thing. This serializes against almost
+	 * everything that wants to access the ctx.
+	 *
+	 * The exception is sys_perf_event_open() /
+	 * perf_event_create_kernel_count() which does find_get_context()
+	 * without ctx::mutex (it cannot because of the move_group double mutex
+	 * lock thing). See the comments in perf_install_in_context().
+	 *
+	 * We can recurse on the same lock type through:
+	 *
+	 *  __perf_event_exit_task()
+	 *    sync_child_event()
+	 *      put_event()
+	 *        mutex_lock(&ctx->mutex)
+	 *
+	 * But since its the parent context it won't be the same instance.
+	 */
+	mutex_lock(&child_ctx->mutex);
+
+	/*
+	 * In a single ctx::lock section, de-schedule the events and detach the
+	 * context from the task such that we cannot ever get it scheduled back
+	 * in.
+	 */
+	raw_spin_lock_irq(&child_ctx->lock);
 	task_ctx_sched_out(__get_cpu_context(child_ctx), child_ctx);
 
 	/*
@@ -8767,14 +8793,8 @@ static void perf_event_exit_task_context
 	WRITE_ONCE(child_ctx->task, TASK_TOMBSTONE);
 	put_task_struct(current); /* cannot be last */
 
-	/*
-	 * If this context is a clone; unclone it so it can't get
-	 * swapped to another process while we're removing all
-	 * the events from it.
-	 */
 	clone_ctx = unclone_ctx(child_ctx);
-	update_context_time(child_ctx);
-	raw_spin_unlock_irqrestore(&child_ctx->lock, flags);
+	raw_spin_unlock_irq(&child_ctx->lock);
 
 	if (clone_ctx)
 		put_ctx(clone_ctx);
@@ -8786,18 +8806,6 @@ static void perf_event_exit_task_context
 	 */
 	perf_event_task(child, child_ctx, 0);
 
-	/*
-	 * We can recurse on the same lock type through:
-	 *
-	 *  __perf_event_exit_task()
-	 *    sync_child_event()
-	 *      put_event()
-	 *        mutex_lock(&ctx->mutex)
-	 *
-	 * But since its the parent context it won't be the same instance.
-	 */
-	mutex_lock(&child_ctx->mutex);
-
 	list_for_each_entry_safe(child_event, next, &child_ctx->event_list, event_entry)
 		__perf_event_exit_task(child_event, child_ctx, child);