Re: [PATCH v6 2/3]: perf/core: use context tstamp_data for skipped events on mux interrupt

From: Peter Zijlstra
Date: Thu Aug 03 2017 - 10:00:35 EST


On Wed, Aug 02, 2017 at 11:15:39AM +0300, Alexey Budankov wrote:
> +struct perf_event_tstamp {
> +	/*
> +	 * These are timestamps used for computing total_time_enabled
> +	 * and total_time_running when the event is in INACTIVE or
> +	 * ACTIVE state, measured in nanoseconds from an arbitrary point
> +	 * in time.
> +	 * enabled: the notional time when the event was enabled
> +	 * running: the notional time when the event was scheduled on
> +	 * stopped: in INACTIVE state, the notional time when the
> +	 * event was scheduled off.
> +	 */
> +	u64 enabled;
> +	u64 running;
> +	u64 stopped;
> +};


So I have the below (untested) patch, also see:

https://lkml.kernel.org/r/20170802171051.zlq5rgx3jqkkxpg7@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

And I don't think I fully agree with your description of running.
Despite its name, tstamp_running is not in fact a time stamp afaict. It's
more like an accumulator of running time, but with an offset of stopped.

I'm always completely confused by the way this timekeeping is done.
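
To make the "accumulator with an offset" reading concrete, here is a toy
user-space model. It is not kernel code: the field names mirror the
kernel's, but the toy_* helpers are made up for illustration, and the
sched-in update (advance tstamp_running by the time spent off the PMU) is
an assumption about what event_sched_in() does.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only; the toy_* names are hypothetical, not kernel API. */
struct toy_event {
	uint64_t tstamp_enabled;
	uint64_t tstamp_running;
	uint64_t tstamp_stopped;
};

/* Event enabled at 'now': all three stamps start out equal. */
static void toy_enable(struct toy_event *e, uint64_t now)
{
	e->tstamp_enabled = e->tstamp_running = e->tstamp_stopped = now;
}

/* Assumed sched-in rule: skip over the interval spent scheduled out. */
static void toy_sched_in(struct toy_event *e, uint64_t now)
{
	e->tstamp_running += now - e->tstamp_stopped;
}

/* Sched-out records when the event left the PMU. */
static void toy_sched_out(struct toy_event *e, uint64_t now)
{
	e->tstamp_stopped = now;
}

/* Running time of an INACTIVE event: tstamp_stopped - tstamp_running. */
static uint64_t toy_time_running(const struct toy_event *e)
{
	return e->tstamp_stopped - e->tstamp_running;
}

/* Enabled at 100, on the PMU during [110,130) and [150,160). */
static void toy_demo(void)
{
	struct toy_event e;

	toy_enable(&e, 100);
	toy_sched_in(&e, 110);
	toy_sched_out(&e, 130);
	toy_sched_in(&e, 150);
	toy_sched_out(&e, 160);

	assert(toy_time_running(&e) == 30);	/* 20 + 10 */
	assert(e.tstamp_running == 130);	/* neither sched-in time */
}
```

In this model tstamp_running ends up at 130, which is neither of the two
sched-in times (110, 150); it is tstamp_stopped minus the accumulated
running time.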

---
Subject: perf: Fix time on IOC_ENABLE
From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Date: Thu Aug 3 15:42:09 CEST 2017

Vince reported that when we do IOC_ENABLE/IOC_DISABLE while the task
is in SIGSTOP'ed state the timestamps go wobbly.

It turns out we indeed fail to correctly account time while in 'OFF'
state, and doing IOC_ENABLE without getting scheduled in exposes the
problem.

Further thinking about this problem, it occurred to me that we can
suffer a similar fate when we migrate an uncore event between CPUs.
The perf_event_install() on the 'new' CPU will do add_event_to_ctx()
which will reset all the time stamps, causing a subsequent
update_event_times() to overwrite the total_time_* fields with smaller
values.

Reported-by: Vince Weaver <vincent.weaver@xxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
---
kernel/events/core.c | 36 +++++++++++++++++++++++++++++++-----
1 file changed, 31 insertions(+), 5 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2217,6 +2217,33 @@ static int group_can_go_on(struct perf_e
 	return can_add_hw;
 }

+/*
+ * Complement to update_event_times(). This computes the tstamp_* values to
+ * continue 'enabled' state from @now. And effectively discards the time
+ * between the prior tstamp_stopped and now (as we were in the OFF state, or
+ * just switched (context) time base).
+ *
+ * This further assumes '@event->state == INACTIVE' (we just came from OFF) and
+ * cannot have been scheduled in yet. And going into INACTIVE state means
+ * '@event->tstamp_stopped = @now'.
+ *
+ * Thus given the rules of update_event_times():
+ *
+ * total_time_enabled = tstamp_stopped - tstamp_enabled
+ * total_time_running = tstamp_stopped - tstamp_running
+ *
+ * We can insert 'tstamp_stopped == now' and reverse them to compute new
+ * tstamp_* values.
+ */
+static void __perf_event_enable_time(struct perf_event *event, u64 now)
+{
+	WARN_ON_ONCE(event->state != PERF_EVENT_STATE_INACTIVE);
+
+	event->tstamp_stopped = now;
+	event->tstamp_enabled = now - event->total_time_enabled;
+	event->tstamp_running = now - event->total_time_running;
+}
+
 static void add_event_to_ctx(struct perf_event *event,
 			     struct perf_event_context *ctx)
 {
@@ -2224,9 +2251,7 @@ static void add_event_to_ctx(struct perf

 	list_add_event(event, ctx);
 	perf_group_attach(event);
-	event->tstamp_enabled = tstamp;
-	event->tstamp_running = tstamp;
-	event->tstamp_stopped = tstamp;
+	__perf_event_enable_time(event, tstamp);
 }

 static void ctx_sched_out(struct perf_event_context *ctx,
@@ -2471,10 +2496,11 @@ static void __perf_event_mark_enabled(st
 	u64 tstamp = perf_event_time(event);

 	event->state = PERF_EVENT_STATE_INACTIVE;
-	event->tstamp_enabled = tstamp - event->total_time_enabled;
+	__perf_event_enable_time(event, tstamp);
 	list_for_each_entry(sub, &event->sibling_list, group_entry) {
+		/* XXX should not be > INACTIVE if event isn't */
 		if (sub->state >= PERF_EVENT_STATE_INACTIVE)
-			sub->tstamp_enabled = tstamp - sub->total_time_enabled;
+			__perf_event_enable_time(sub, tstamp);
 	}
 }
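
The inversion in __perf_event_enable_time() can be checked in isolation
with a small user-space sketch. It models only the two INACTIVE-state
rules quoted in the comment above; the struct and the toy_* helpers are
hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the relevant perf_event fields (hypothetical). */
struct toy_event {
	uint64_t total_time_enabled, total_time_running;
	uint64_t tstamp_enabled, tstamp_running, tstamp_stopped;
};

/* The two rules quoted in the patch comment, for an INACTIVE event. */
static void toy_update_times(struct toy_event *e)
{
	e->total_time_enabled = e->tstamp_stopped - e->tstamp_enabled;
	e->total_time_running = e->tstamp_stopped - e->tstamp_running;
}

/* Old add_event_to_ctx() behaviour: every stamp reset to 'now'. */
static void toy_reset_time(struct toy_event *e, uint64_t now)
{
	e->tstamp_enabled = e->tstamp_running = e->tstamp_stopped = now;
}

/* Same arithmetic as __perf_event_enable_time() in the patch. */
static void toy_enable_time(struct toy_event *e, uint64_t now)
{
	e->tstamp_stopped = now;
	e->tstamp_enabled = now - e->total_time_enabled;
	e->tstamp_running = now - e->total_time_running;
}

static void toy_demo(void)
{
	struct toy_event e = {
		.total_time_enabled = 60,
		.total_time_running = 30,
	};

	/* Inverted stamps: a later update reproduces the totals. */
	toy_enable_time(&e, 1000);
	toy_update_times(&e);
	assert(e.total_time_enabled == 60 && e.total_time_running == 30);

	/* Plain reset: a later update overwrites them with smaller values. */
	toy_reset_time(&e, 2000);
	toy_update_times(&e);
	assert(e.total_time_enabled == 0 && e.total_time_running == 0);
}
```

With the inverted stamps the accumulated totals survive a subsequent
update; with the plain reset they collapse to zero, which is the
"smaller values" overwrite described in the changelog.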