Re: [PATCH] perf_events: fix time tracking in samples
From: Stephane Eranian
Date: Wed Oct 20 2010 - 08:43:07 EST
On Wed, Oct 20, 2010 at 1:13 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Tue, 2010-10-19 at 21:03 +0200, Stephane Eranian wrote:
>
>> >> Ok, I missed that. But I don't understand why you need the lock to
>> >> udpate the time. The lower-level clock is lockless if I recall. Can't you
>> >> use an atomic ops in update_context_time()?
>> >
>> > atomic ops would slow down those code paths, also, I don't think you can
>> > fully get the ordering between ->tstamp_$foo and ->total_time_$foo just
>> > right.
>> >
>>
>> I don't get that. Could you give an example?
>
> Take update_context_time(), it has:
>
> now = perf_clock();
> ctx->time += now - ctx->timestamp;
> ctx->timestamp = now;
>
> If you interleave two of those you get:
>
> Âctx->timestamp = T0;
>
>
> Ânow = perf_clock(); /* T1 */
> Âctx->time += now - ctx->timestamp;
> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Ânow = perf_clock(); /* T2 */
> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Âctx->time += now - ctx->timestamp;
> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Âctx->timestamp = now;
> Âctx->timestamp = now;
>
>
> So at this point you would expect timestamp = T2 and time += T2-T0.
>
> Except that: time += T1 - T0 + T2 - T0 != T2 - T0 and
> Â Â Â Â Â Â timestamp = T1
>
> You can of course write it as something like x86_perf_event_update(),
> but then there's trying to keep total_time_running and
> total_time_enabled in sync.
>
I think it is more complicated than x86_perf_event_update() because, this
time, you have to update 2 fields, i.e., ctx->time, ctx->timestamp. The
difficulty is to backtrack after you've successfully and atomically set
the first one. You cannot just subtract/replace what you've changed
because it was visible by whoever interrupted you. So it is already
too late.
It may be better to try another approach just for PERF_SAMPLE_READ
with its own version of ctx->time. What about if on event_sched_in() you
were snapshotting ctx->time. Then in the perf_output_read_event(), you'd
have to compute the time delta and add it to this private version of
ctx->time and store that in the sample.
>
>> > Not sure, but barring 64bit atomics for all these, 32bit archs and NMI
>> > are going to be 'interesting'
>> >
>>
>> Every sample needs to be correct, otherwise you run the risk of introducing
>> bias.
>>
>> I think if the tradeoffs is correctness vs. speed, I'd choose correctness.
>
> Well, yes, but it sucks, esp. since its only relevant to
> PERF_SAMPLE_READ.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/