[tip:perf/core] perf/core: Fix a race between mmap_close() and set_output() of AUX events

From: tip-bot for Alexander Shishkin
Date: Sat Sep 10 2016 - 08:46:33 EST


Commit-ID: 767ae08678c2c796bcd7f582ee457aee20a28a1e
Gitweb: http://git.kernel.org/tip/767ae08678c2c796bcd7f582ee457aee20a28a1e
Author: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
AuthorDate: Tue, 6 Sep 2016 16:23:49 +0300
Committer: Ingo Molnar <mingo@xxxxxxxxxx>
CommitDate: Sat, 10 Sep 2016 11:15:36 +0200

perf/core: Fix a race between mmap_close() and set_output() of AUX events

In the mmap_close() path we need to stop all the AUX events that are
writing data to the AUX area that we are unmapping, before we can
safely free the pages. To determine if an event needs to be stopped,
we're comparing its ->rb against the one that's getting unmapped.
However, a SET_OUTPUT ioctl may turn up inside an AUX transaction
and swizzle event::rb to some other ring buffer, but the transaction
will keep writing data to the old ring buffer until the event gets
scheduled out. At this point, mmap_close() will skip over such an
event and will proceed to free the AUX area, while it's still being
used by this event, which will set off a warning in the mmap_close()
path and cause a memory corruption.

To avoid this, always stop an AUX event before its ->rb is updated;
this will release the (potentially) last reference on the AUX area
of the buffer. If the event gets restarted, its new ring buffer will
be used. If another SET_OUTPUT comes and switches it back to the
old ring buffer that's getting unmapped, it's also fine: this
ring buffer's aux_mmap_count will be zero and AUX transactions won't
start any more.

Reported-by: Vince Weaver <vincent.weaver@xxxxxxxxx>
Signed-off-by: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxxxxx>
Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Stephane Eranian <eranian@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: vince@xxxxxxxxxx
Link: http://lkml.kernel.org/r/20160906132353.19887-2-alexander.shishkin@xxxxxxxxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
kernel/events/core.c | 31 +++++++++++++++++++++++++------
1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 07ac859..a54f2c2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2496,11 +2496,11 @@ static int __perf_event_stop(void *info)
return 0;
}

-static int perf_event_restart(struct perf_event *event)
+static int perf_event_stop(struct perf_event *event, int restart)
{
struct stop_event_data sd = {
.event = event,
- .restart = 1,
+ .restart = restart,
};
int ret = 0;

@@ -4845,6 +4845,19 @@ static void ring_buffer_attach(struct perf_event *event,
spin_unlock_irqrestore(&rb->event_lock, flags);
}

+ /*
+ * Avoid racing with perf_mmap_close(AUX): stop the event
+ * before swizzling the event::rb pointer; if it's getting
+ * unmapped, its aux_mmap_count will be 0 and it won't
+ * restart. See the comment in __perf_pmu_output_stop().
+ *
+ * Data will inevitably be lost when set_output is done in
+ * mid-air, but then again, whoever does it like this is
+ * not in for the data anyway.
+ */
+ if (has_aux(event))
+ perf_event_stop(event, 0);
+
rcu_assign_pointer(event->rb, rb);

if (old_rb) {
@@ -6120,7 +6133,7 @@ static void perf_event_addr_filters_exec(struct perf_event *event, void *data)
raw_spin_unlock_irqrestore(&ifh->lock, flags);

if (restart)
- perf_event_restart(event);
+ perf_event_stop(event, 1);
}

void perf_event_exec(void)
@@ -6164,7 +6177,13 @@ static void __perf_event_output_stop(struct perf_event *event, void *data)

/*
* In case of inheritance, it will be the parent that links to the
- * ring-buffer, but it will be the child that's actually using it:
+ * ring-buffer, but it will be the child that's actually using it.
+ *
+ * We are using event::rb to determine if the event should be stopped,
+ * however this may race with ring_buffer_attach() (through set_output),
+ * which will make us skip the event that actually needs to be stopped.
+ * So ring_buffer_attach() has to stop an aux event before re-assigning
+ * its rb pointer.
*/
if (rcu_dereference(parent->rb) == rb)
ro->err = __perf_event_stop(&sd);
@@ -6678,7 +6697,7 @@ static void __perf_addr_filters_adjust(struct perf_event *event, void *data)
raw_spin_unlock_irqrestore(&ifh->lock, flags);

if (restart)
- perf_event_restart(event);
+ perf_event_stop(event, 1);
}

/*
@@ -7867,7 +7886,7 @@ static void perf_event_addr_filters_apply(struct perf_event *event)
mmput(mm);

restart:
- perf_event_restart(event);
+ perf_event_stop(event, 1);
}

/*