Re: perf: use-after-free in perf_release

From: Oleg Nesterov
Date: Wed Mar 15 2017 - 12:46:12 EST


On 03/14, Peter Zijlstra wrote:
>
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -10346,6 +10346,17 @@ void perf_event_free_task(struct task_struct *task)
> continue;
>
> mutex_lock(&ctx->mutex);
> + raw_spin_lock_irq(&ctx->lock);
> + /*
> + * Destroy the task <-> ctx relation and mark the context dead.
> + *
> + * This is important because even though the task hasn't been
> + * exposed yet the context has been (through child_list).
> + */
> + RCU_INIT_POINTER(task->perf_event_ctxp[ctxn], NULL);
> + WRITE_ONCE(ctx->task, TASK_TOMBSTONE);
> + put_task_struct(task); /* cannot be last */
> + raw_spin_unlock_irq(&ctx->lock);

Agreed, this is what I had in mind. Although you know, I spent 3
hours looking at your patch and I still can't convince myself I am
really sure it closes all races ;)

OK, I believe this is correct. And iiuc both RCU_INIT_POINTER(NULL)
and put_task_struct() are not strictly necessary? At least until we
add WARN_ON(tsk->usage != 2) before free_task() in copy process().


---------------------------------------------------------------------
This is off-topic, but to me list_for_each_entry(event->child_list)
in perf_event_release_kernel() looks very confusing and misleading.
And list_first_entry_or_null(), we do not really need NULL if list
is empty, tmp == child should be F even if we use list_first_entry().
And given that we already have list_is_last(), it would be nice to
add list_is_first() and cleanup perf_event_release_kernel() a bit:

--- x/kernel/events/core.c
+++ x/kernel/events/core.c
@@ -4152,7 +4152,7 @@ static void put_event(struct perf_event
int perf_event_release_kernel(struct perf_event *event)
{
struct perf_event_context *ctx = event->ctx;
- struct perf_event *child, *tmp;
+ struct perf_event *child;

/*
* If we got here through err_file: fput(event_file); we will not have
@@ -4190,8 +4190,9 @@ int perf_event_release_kernel(struct per

again:
mutex_lock(&event->child_mutex);
- list_for_each_entry(child, &event->child_list, child_list) {
-
+ if (!list_empty(&event->child_list)) {
+ child = list_first_entry(&event->child_list,
+ struct perf_event, child_list);
/*
* Cannot change, child events are not migrated, see the
* comment with perf_event_ctx_lock_nested().
@@ -4221,9 +4222,7 @@ again:
* state, if child is still the first entry, it didn't get freed
* and we can continue doing so.
*/
- tmp = list_first_entry_or_null(&event->child_list,
- struct perf_event, child_list);
- if (tmp == child) {
+ if (list_is_first(child, &event->child_list)) {
perf_remove_from_context(child, DETACH_GROUP);
list_del(&child->child_list);
free_event(child);

But we can't, because

static inline int list_is_first(const struct list_head *list,
const struct list_head *head)
{
return list->prev == head;
}

won't work, "child" can be freed so we can't dereference it, and

static inline int list_is_first(const struct list_head *list,
const struct list_head *head)
{
return head->next == list;
}

won't be symmetrical with list_is_last() we already have.

Oleg.