[PATCH 2/9] perf: Deny optimized switch for events read by PERF_SAMPLE_READ

From: Jiri Olsa
Date: Mon Aug 25 2014 - 10:49:31 EST


The optimized task context switch for cloned perf events just
swaps whole perf event contexts (of current and next process)
if it finds them suitable. Events from the 'current' context
will now measure data of the 'next' context and vice versa.

This is ok for cases where we are not directly interested in
the event->count value of separate child events, like:
- standard sampling, where we take 'period' value for the
event count
- counting, where we accumulate all events (children)
into a single count value

But in case we read event by using the PERF_SAMPLE_READ sample
type, we are interested in direct event->count value measured
in specific task. Switching events within tasks for this kind
of measurements corrupts data.

Fixing this by setting/unsetting pin_count for perf event context
once cloned event with PERF_SAMPLE_READ read is added/removed.
The pin_count value != 0 makes the context not suitable for
optimized switch.

Cc: Andi Kleen <andi@xxxxxxxxxxxxxx>
Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
Cc: Corey Ashford <cjashfor@xxxxxxxxxxxxxxxxxx>
Cc: David Ahern <dsahern@xxxxxxxxx>
Cc: Frederic Weisbecker <fweisbec@xxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Jen-Cheng(Tommy) Huang <tommy24@xxxxxxxxxx>
Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
Cc: Paul Mackerras <paulus@xxxxxxxxx>
Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Cc: Stephane Eranian <eranian@xxxxxxxxxx>
Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx>
---
kernel/events/core.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4ad4ba2bc106..ff6a17607ddb 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1117,6 +1117,12 @@ ctx_group_list(struct perf_event *event, struct perf_event_context *ctx)
return &ctx->flexible_groups;
}

+static bool is_clone_with_read(struct perf_event *event)
+{
+ return event->parent &&
+ (event->attr.sample_type & PERF_SAMPLE_READ);
+}
+
/*
* Add a event from the lists for its context.
* Must be called with ctx->mutex and ctx->lock held.
@@ -1148,6 +1154,9 @@ list_add_event(struct perf_event *event, struct perf_event_context *ctx)
if (has_branch_stack(event))
ctx->nr_branch_stack++;

+ if (is_clone_with_read(event))
+ ctx->pin_count++;
+
list_add_rcu(&event->event_entry, &ctx->event_list);
if (!ctx->nr_events)
perf_pmu_rotate_start(ctx->pmu);
@@ -1313,6 +1322,9 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
if (has_branch_stack(event))
ctx->nr_branch_stack--;

+ if (is_clone_with_read(event))
+ ctx->pin_count--;
+
ctx->nr_events--;
if (event->attr.inherit_stat)
ctx->nr_stat--;
--
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/