Re: perf: wrong event->count report (Was: perf basic-test-aarch64 failures)

From: Peter Zijlstra
Date: Wed Feb 17 2016 - 05:28:33 EST


On Wed, Feb 17, 2016 at 10:35:39AM +0100, Jiri Olsa wrote:
> On Wed, Feb 17, 2016 at 04:34:16AM +0100, Oleg Nesterov wrote:
> > Finally I reproduced... let me add CC's and reply to initial message. This has
> > nothing to do with arm/uprobes.
> >
> > I simply can't understand how perf calculates ->total_time_enabled/running.
> > At all. But the problem is that
> >
> > 1. perf_event_enable_on_exec() does enable first, then event_sched_in().
> >
> > After that tstamp_enabled < tstamp_running
> >
> > 2. This means that after the next update_event_times()
> > total_time_running < total_time_enabled
> >
> > again, I fail to understand these calculations, but this is what
> > perf_event_read_value() reports to user-space.
> >
> > 3. /usr/bin/perf calls perf_counts_values__scale() which does
> >
> > 	else if (count->run < count->ena) {
> > 		scaled = 1;
> > 		count->val = (u64)((double) count->val * count->ena / count->run + 0.5);
> > 	}
> >
> > and this is why you see the wrong number. count->val was correct but
> > wrongly updated because total_time_running < total_time_enabled.
> >
> > I leave this to Peter and Jiri ;)
>
> I did not notice the other conversation wasn't public, reposting ;-)
>
> jirka
>
> ---
> ouch, I tested with fedora kernel.. I can reproduce with 4.5
>
> Pratyush bisected this to the following commit:
> [3e349507d12de93b08b0aa814fc2aa0dee91c5ba] perf: Fix perf_enable_on_exec() event scheduling
>
> it seems the commit above introduced unwanted difference between
> counter's enabled and running times.. I'm checking on that ;-)

Does something like so work?

---

So prior to 3e349507d12d ("perf: Fix perf_enable_on_exec() event
scheduling") we used to call task_ctx_sched_out() before
event_enable_on_exec().

ctx_sched_out() will call update_context_time(), therefore
__perf_event_mark_enabled() would have an up-to-date ctx->time.

Now, not so much. So explicitly update the ctx time before calling
event_enable_on_exec().
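
To make the ordering problem concrete, here is a rough user-space model of
the timestamps involved. It is only a sketch: the model_* names and structs
are hypothetical, and the arithmetic is a simplified reading of how
tstamp_enabled/tstamp_running feed total_time_enabled/running, not the
kernel code itself.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the kernel's structures. */
struct model_ctx {
	uint64_t time;			/* last value the context time was updated to */
};

struct model_event {
	uint64_t tstamp_enabled;
	uint64_t tstamp_running;
	uint64_t total_time_enabled;
	uint64_t total_time_running;
};

/* Bring the context time up to date (assumed analogue of update_context_time()). */
void model_update_context_time(struct model_ctx *ctx, uint64_t now)
{
	ctx->time = now;
}

/* Enabling stamps the event with whatever ctx->time currently holds. */
void model_mark_enabled(struct model_event *e, const struct model_ctx *ctx)
{
	e->tstamp_enabled = ctx->time - e->total_time_enabled;
}

/* Scheduling in stamps the start of the running period the same way. */
void model_sched_in(struct model_event *e, const struct model_ctx *ctx)
{
	e->tstamp_running = ctx->time - e->total_time_running;
}

/* Both totals are measured from their respective stamps up to ctx->time. */
void model_update_event_times(struct model_event *e, const struct model_ctx *ctx)
{
	e->total_time_enabled = ctx->time - e->tstamp_enabled;
	e->total_time_running = ctx->time - e->tstamp_running;
}
```

In this model, marking the event enabled against a stale ctx->time and only
then updating the time before sched-in leaves tstamp_enabled behind
tstamp_running, so the next update reports running < enabled. Updating the
time first makes both stamps equal.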

ctx_resched() will again call update_context_time(), resulting in a
slight difference the other way (running > enabled), which doesn't make
any sense either, but that we can (and should) clip.
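
For illustration, a small stand-alone sketch of why the skew matters to
user space and what clipping would look like. The struct and function names
are hypothetical; scale_count() mirrors only the run < ena branch quoted
from perf_counts_values__scale() above (the run == 0 guard is added here
just to avoid dividing by zero), and clip_running() is one assumed way to
express the clip, not the actual patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the values user space reads back. */
struct model_counts {
	uint64_t val;	/* raw event count */
	uint64_t ena;	/* total_time_enabled */
	uint64_t run;	/* total_time_running */
};

/* An event cannot have run for longer than it was enabled, so clip. */
void clip_running(struct model_counts *c)
{
	if (c->run > c->ena)
		c->run = c->ena;
}

/*
 * Extrapolate the count when the event ran for only part of the time it
 * was enabled. Returns 1 if the value was scaled, 0 otherwise.
 */
int scale_count(struct model_counts *c)
{
	if (c->run == 0 || c->run >= c->ena)
		return 0;

	c->val = (uint64_t)((double)c->val * c->ena / c->run + 0.5);
	return 1;
}
```

With ena = 200 and run = 100 a correct count of 1000 gets reported as 2000,
which is the kind of inflation seen in the failing test. Once enabled and
running agree, the count is passed through untouched.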


--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3173,6 +3173,10 @@ static void perf_event_enable_on_exec(in

 	cpuctx = __get_cpu_context(ctx);
 	perf_ctx_lock(cpuctx, ctx);
+
+	update_context_time(ctx);
+	update_cgrp_time_from_cpuctx(cpuctx);
+
 	list_for_each_entry(event, &ctx->event_list, event_entry)
 		enabled |= event_enable_on_exec(event, ctx);