Re: [RFC][PATCH 1/3] perf: Tighten (and fix) the grouping condition

From: Mark Rutland
Date: Fri Jan 23 2015 - 10:23:28 EST


On Fri, Jan 23, 2015 at 03:07:16PM +0000, Peter Zijlstra wrote:
> On Fri, Jan 23, 2015 at 03:02:12PM +0000, Mark Rutland wrote:
> > On Fri, Jan 23, 2015 at 12:52:00PM +0000, Peter Zijlstra wrote:
> > > The fix from 9fc81d87420d ("perf: Fix events installation during
> > > moving group") was incomplete in that it failed to recognise that
> > > creating a group with events for different CPUs is semantically
> > > broken -- they cannot be co-scheduled.
> > >
> > > Furthermore, it leads to real breakage where, when we create an event
> > > for CPU Y and then migrate it to form a group on CPU X, the code gets
> > > confused where the counter is programmed -- triggered by the fuzzer.
> > >
> > > Fix this by tightening the rules for creating groups. Only allow
> > > grouping of counters that can be co-scheduled in the same context.
> > > This means for the same task and/or the same cpu.
> >
> > It seems this would still allow you to group CPU-affine software and
> > uncore events, which also doesn't make sense: the software events will
> > count on a single CPU while the uncore events aren't really CPU-affine.
> >
> > Which isn't anything against this patch, but probably something we
> > should tighten up too.
>
> Indeed, that would need a wee bit of extra infrastructure though; as we
> cannot currently distinguish between regular cpuctx and uncore like
> things.

Isn't event->pmu->task_ctx_nr sufficient, in the same way we already
use it to identify software events?

Or am I making some completely bogus assumptions in the diff below?

Mark.

---->8----
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 664de5a..7b945d5 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -657,6 +657,15 @@ static inline int is_software_event(struct perf_event *event)
 	return event->pmu->task_ctx_nr == perf_sw_context;
 }

+/*
+ * Return 1 for an event which is associated with neither a particular
+ * CPU nor a particular task.
+ */
+static inline int is_system_event(struct perf_event *event)
+{
+	return event->pmu->task_ctx_nr == perf_invalid_context;
+}
+
 extern struct static_key perf_swevent_enabled[PERF_COUNT_SW_MAX];

 extern void __perf_sw_event(u32, u64, struct pt_regs *, u64);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2cb857d..50c42b6 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7525,6 +7525,18 @@ SYSCALL_DEFINE5(perf_event_open,
 	account_event(event);

 	/*
+	 * System-wide (A.K.A. "uncore") events cannot be associated with a
+	 * particular CPU, and hence cannot be associated with a particular
+	 * task either. It's non-sensical to group them with other event types,
+	 * which are CPU or task bound.
+	 */
+	if (group_leader &&
+	    (is_system_event(event) != is_system_event(group_leader))) {
+		err = -EINVAL;
+		goto err_alloc;
+	}
+
+	/*
 	 * Special case software events and allow them to be part of
 	 * any hardware group.
 	 */
