[RFC][PATCH 02/12] perf: Fix cgroup event scheduling

From: Peter Zijlstra
Date: Mon Jan 11 2016 - 11:36:22 EST


There appears to be a problem in __perf_event_task_sched_in() wrt
cgroup event scheduling.

The normal event scheduling order is:

CPU pinned
Task pinned
CPU flexible
Task flexible

And since perf_cgroup_sched*() only schedules the CPU context, it must
be called _before_ we schedule in the task events.
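
Purely as an illustration of that ordering (a user-space sketch with
made-up stub names, not the actual kernel code paths): the cgroup/CPU
side is switched in first, and only then the per-task contexts.

#include <stdio.h>
#include <stdbool.h>

#define NR_TASK_CONTEXTS 2			/* stand-in for perf_nr_task_contexts */

static bool cgroup_events_active = true;	/* stand-in for perf_cgroup_events */

static void cgroup_sched_in(void)
{
	/* cgroup events live in the CPU context (system-wide), so they go first */
	puts("switch in cgroup/CPU events");
}

static void task_ctx_sched_in(int ctxn)
{
	printf("switch in task context %d\n", ctxn);
}

int main(void)
{
	if (cgroup_events_active)
		cgroup_sched_in();

	/* only after the CPU side is in do we add the task contexts */
	for (int ctxn = 0; ctxn < NR_TASK_CONTEXTS; ctxn++)
		task_ctx_sched_in(ctxn);

	return 0;
}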

Note: double-check what happens with the ctx switch optimization, where
the task ctx isn't scheduled.

Cc: Stephane Eranian <eranian@xxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
---
kernel/events/core.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2804,6 +2804,16 @@ void __perf_event_task_sched_in(struct t
struct perf_event_context *ctx;
int ctxn;

+ /*
+ * If cgroup events exist on this CPU, then we need to check if we have
+ * to switch in PMU state; cgroup events are system-wide mode only.
+ *
+ * Since cgroup events are CPU events, we must schedule these in before
+ * we schedule in the task events.
+ */
+ if (atomic_read(this_cpu_ptr(&perf_cgroup_events)))
+ perf_cgroup_sched_in(prev, task);
+
for_each_task_context_nr(ctxn) {
ctx = task->perf_event_ctxp[ctxn];
if (likely(!ctx))
@@ -2811,13 +2821,6 @@ void __perf_event_task_sched_in(struct t

perf_event_context_sched_in(ctx, task);
}
- /*
- * if cgroup events exist on this CPU, then we need
- * to check if we have to switch in PMU state.
- * cgroup event are system-wide mode only
- */
- if (atomic_read(this_cpu_ptr(&perf_cgroup_events)))
- perf_cgroup_sched_in(prev, task);

if (atomic_read(&nr_switch_events))
perf_event_switch(task, prev, true);