Re: [patch] perf wrong enabled time after attach/enable/enable

From: Peter Zijlstra
Date: Fri Aug 04 2017 - 07:23:06 EST


On Thu, Aug 03, 2017 at 02:07:53PM -0400, Vince Weaver wrote:
> On Wed, 2 Aug 2017, Peter Zijlstra wrote:
>
> > Playing with that test it really is the IOC_DISABLE while STOP'ed that
> > messes things up.
> >
> > Ah.. magic.. the below seems to fix things, hopefully it doesn't break
> > anything else.
>
> yes, I've tested this and it seems to fix things.

New version below.. it is a bit different and should handle another case.
I'll try to run more tests on it, because while it _SHOULD_ not trigger
that WARN_ON_ONCE(), you know how that goes..

> With both this and the other patch applied PAPI can finally pass all of
> its validation tests when running with RDPMC support enabled.

I get:

rdpmc# (for i in *.c ; do ./${i%.c}; done) | grep "^Testing"
Testing if rdpmc attach works... PASSED
Testing if minimized rdpmc papi-multi-attach works... PASSED
Testing if we can rdpmc in execed process... PASSED
Testing if we can rdpmc in execed process (PAPI) ... PASSED
Testing if rdpmc works with event groups... PASSED
-bash: ./rdpmc_lib: No such file or directory
Testing if rdpmc multi-attach works... PASSED
Testing if rdpmc papi-multi-attach works... PASSED
Testing if userspace rdpmc multiplexing works... PASSED
Testing if rdpmc with pthreads works... PASSED
Testing if rdpmc with pthreads works... PASSED
Testing if resetting while using rdpmc works... PASSED
Testing if userspace rdpmc reads are supported... NEW BEHAVIOR
Testing if rdpmc fallback works on sw events... PASSED
Testing if userspace rdpmc reads give expected results... PASSED

is that 'NEW BEHAVIOR' thing something I should worry about?

Full output for that test is:

rdpmc# ./rdpmc_support
This test checks if userspace rdpmc() style reads work.

total start/read/stop latency: 52668 cycles
Event 0 -- count: 63 running: 0
Testing if userspace rdpmc reads are supported... NEW BEHAVIOR

Unpatched kernel also does that.
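
(Editorial aside, for anyone reproducing this: the "userspace rdpmc reads"
these tests exercise are the lockless self-monitoring pattern documented in
include/uapi/linux/perf_event.h. A minimal x86 sketch of the read side,
assuming 'pc' points at the event's mmap()'ed perf_event_mmap_page and with
error handling elided:

	#include <linux/perf_event.h>

	/* compiler barrier, as in the kernel's documentation example */
	#define barrier() asm volatile("" ::: "memory")

	static unsigned long long
	mmap_read_self(volatile struct perf_event_mmap_page *pc)
	{
		unsigned long long count;
		unsigned int seq, idx;

		do {
			seq = pc->lock;		/* seqlock: kernel bumps this on update */
			barrier();
			idx = pc->index;	/* 0 => no rdpmc; fall back to read() */
			count = pc->offset;	/* kernel-accumulated part of the count */
			if (pc->cap_user_rdpmc && idx)
				count += __builtin_ia32_rdpmc(idx - 1);
			barrier();
		} while (pc->lock != seq);	/* retry if we raced with an update */

		return count;
	}

The seqlock retry loop is what lets the kernel update the page without
stopping readers; a torn read simply loops again.)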

---
Subject: perf: Fix time on IOC_ENABLE
From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Date: Thu Aug 3 15:42:09 CEST 2017

Vince reported that when we do IOC_ENABLE/IOC_DISABLE while the task
is in SIGSTOP'ed state, the timestamps go wobbly.

It turns out we indeed fail to correctly account time while in the
'OFF' state, and doing IOC_ENABLE without getting scheduled in exposes
the problem.
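
For illustration, a minimal reproducer sketch along those lines (not
Vince's actual test; the event choice and read_format are arbitrary and
all error checking is elided):

	#include <linux/perf_event.h>
	#include <sys/syscall.h>
	#include <sys/ioctl.h>
	#include <sys/wait.h>
	#include <signal.h>
	#include <unistd.h>
	#include <string.h>
	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		struct perf_event_attr attr;
		uint64_t buf[3];
		pid_t pid;
		int fd;

		pid = fork();
		if (!pid) {
			raise(SIGSTOP);			/* park the child in STOP */
			_exit(0);
		}
		waitpid(pid, NULL, WUNTRACED);		/* wait until it is stopped */

		memset(&attr, 0, sizeof(attr));
		attr.size = sizeof(attr);
		attr.type = PERF_TYPE_HARDWARE;
		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
		attr.disabled = 1;
		attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
				   PERF_FORMAT_TOTAL_TIME_RUNNING;

		fd = syscall(__NR_perf_event_open, &attr, pid, -1, -1, 0);

		/* toggle while the task never gets scheduled in */
		ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
		ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
		ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
		sleep(1);

		/* buf = { value, time_enabled, time_running } */
		read(fd, buf, sizeof(buf));
		printf("enabled %llu running %llu\n",
		       (unsigned long long)buf[1], (unsigned long long)buf[2]);

		kill(pid, SIGKILL);
		return 0;
	}

The task never runs between the ioctl()s, which is exactly the "IOC_ENABLE
without getting scheduled in" case described above.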

Thinking about this further, it occurred to me that we can suffer a
similar fate when we migrate an uncore event between CPUs. The
perf_event_install() on the 'new' CPU will do add_event_to_ctx(),
which will reset all the time stamps, causing a subsequent
update_event_times() to overwrite the total_time_* fields with smaller
values (the freshly reset tstamps make the event look like it was only
just enabled).

Reported-by: Vince Weaver <vincent.weaver@xxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
---
 kernel/events/core.c |   36 +++++++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2217,6 +2217,33 @@ static int group_can_go_on(struct perf_e
 	return can_add_hw;
 }
 
+/*
+ * Complement to update_event_times(). This computes the tstamp_* values to
+ * continue 'enabled' state from @now. And effectively discards the time
+ * between the prior tstamp_stopped and now (as we were in the OFF state, or
+ * just switched (context) time base).
+ *
+ * This further assumes '@event->state == INACTIVE' (we just came from OFF) and
+ * cannot have been scheduled in yet. And going into INACTIVE state means
+ * '@event->tstamp_stopped = @now'.
+ *
+ * Thus given the rules of update_event_times():
+ *
+ *   total_time_enabled = tstamp_stopped - tstamp_enabled
+ *   total_time_running = tstamp_stopped - tstamp_running
+ *
+ * We can insert 'tstamp_stopped == now' and reverse them to compute new
+ * tstamp_* values.
+ */
+static void __perf_event_enable_time(struct perf_event *event, u64 now)
+{
+	WARN_ON_ONCE(event->state != PERF_EVENT_STATE_INACTIVE);
+
+	event->tstamp_stopped = now;
+	event->tstamp_enabled = now - event->total_time_enabled;
+	event->tstamp_running = now - event->total_time_running;
+}
+
 static void add_event_to_ctx(struct perf_event *event,
 			     struct perf_event_context *ctx)
 {
@@ -2224,9 +2251,7 @@ static void add_event_to_ctx(struct perf
 
 	list_add_event(event, ctx);
 	perf_group_attach(event);
-	event->tstamp_enabled = tstamp;
-	event->tstamp_running = tstamp;
-	event->tstamp_stopped = tstamp;
+	__perf_event_enable_time(event, tstamp);
 }
 
 static void ctx_sched_out(struct perf_event_context *ctx,
@@ -2471,10 +2496,11 @@ static void __perf_event_mark_enabled(st
 	u64 tstamp = perf_event_time(event);
 
 	event->state = PERF_EVENT_STATE_INACTIVE;
-	event->tstamp_enabled = tstamp - event->total_time_enabled;
+	__perf_event_enable_time(event, tstamp);
 	list_for_each_entry(sub, &event->sibling_list, group_entry) {
+		/* XXX should not be > INACTIVE if event isn't */
 		if (sub->state >= PERF_EVENT_STATE_INACTIVE)
-			sub->tstamp_enabled = tstamp - sub->total_time_enabled;
+			__perf_event_enable_time(sub, tstamp);
 	}
 }
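
(Editorial note: to make the inversion in __perf_event_enable_time()
concrete, a worked example with made-up numbers. Suppose
perf_event_time() returns now = 1000 and the event has so far
accumulated total_time_enabled = 600 and total_time_running = 400.
The helper then sets:

	tstamp_stopped = 1000
	tstamp_enabled = 1000 - 600 = 400
	tstamp_running = 1000 - 400 = 600

If the event later stops at time 1100, the update_event_times() rules
quoted in the comment give:

	total_time_enabled = 1100 - 400 = 700
	total_time_running = 1100 - 600 = 500

i.e. the totals continue from their old values and the time spent in
the OFF/stopped state is discarded, as intended.)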