Re: I.5 - Mmaped count

From: Peter Zijlstra
Date: Tue Jun 23 2009 - 02:14:02 EST


On Tue, 2009-06-23 at 10:39 +1000, Paul Mackerras wrote:
> Peter Zijlstra writes:
>
> > I think we would have to add that to the data page, something like the
> > below?
> >
> > Paulus?
> >
> > ---
> > Index: linux-2.6/include/linux/perf_counter.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/perf_counter.h
> > +++ linux-2.6/include/linux/perf_counter.h
> > @@ -232,6 +232,10 @@ struct perf_counter_mmap_page {
> > __u32 lock; /* seqlock for synchronization */
> > __u32 index; /* hardware counter identifier */
> > __s64 offset; /* add to hardware counter value */
> > + __u64 total_time; /* total time counter active */
> > + __u64 running_time; /* time counter on cpu */
> > +
> > + __u64 __reserved[123]; /* align at 1k */
> >
> > /*
> > * Control data for the mmap() data buffer.
> > Index: linux-2.6/kernel/perf_counter.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/perf_counter.c
> > +++ linux-2.6/kernel/perf_counter.c
> > @@ -1782,6 +1782,12 @@ void perf_counter_update_userpage(struct
> > if (counter->state == PERF_COUNTER_STATE_ACTIVE)
> > userpg->offset -= atomic64_read(&counter->hw.prev_count);
> >
> > + userpg->total_time = counter->total_time_enabled +
> > + atomic64_read(&counter->child_total_time_enabled);
> > +
> > + userpg->running_time = counter->total_time_running +
> > + atomic64_read(&counter->child_total_time_running);
>
> Hmmm, when the counter is running, what you want is not so much the
> total time so far as a way to compute the total time so far from the
> current TSC/timebase value. So we would need to export tstamp_enabled
> and tstamp_running plus a scale/offset for converting the TSC/timebase
> value to nanoseconds consistent with ctx->time. On powerpc that's
> pretty straightforward because the timebases are synchronized, but on
> x86 I gather the offset and maybe also the scale would need to be
> per-cpu (which is OK, because all the values in the mmapped page are
> only useful on one specific CPU).
>
> How would we compute the scale and offset on x86, given the current
> TSC value and ctx->time?

With pain and suffering ;-)

The userpage would have to provide a multiplier and offset, and we'd
have to register a cpufreq notifier hook and iterate all active counters
and update these mult,offset bits when the cpu freq changes.
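
To make the mult/offset idea concrete, here is a rough userspace-side
sketch. Everything in it is an assumption for illustration: the struct
time_conv layout, the field names, the extra shift field and both helper
names are hypothetical and are not part of the patch above. It also
relies on the unsigned __int128 extension of GCC/Clang to keep the
multiply from overflowing.

```c
#include <stdint.h>

/*
 * Hypothetical per-cpu conversion parameters. None of these fields
 * exist in perf_counter_mmap_page; they only illustrate the idea.
 */
struct time_conv {
	uint32_t mult;   /* fixed-point multiplier */
	uint32_t shift;  /* right shift applied after the multiply */
	uint64_t offset; /* ns added so the result lines up with ctx->time */
};

/* ns = ((cycles * mult) >> shift) + offset */
static uint64_t cycles_to_ns(const struct time_conv *c, uint64_t cycles)
{
	/* 128-bit intermediate so large cycle counts do not overflow */
	unsigned __int128 prod = (unsigned __int128)cycles * c->mult;

	return (uint64_t)(prod >> c->shift) + c->offset;
}

/*
 * What a frequency-change hook would have to do: switch to the new
 * scale and pick the offset so that the converted time is continuous
 * at the moment of the switch. Unsigned wrap-around is intentional.
 */
static void time_conv_update(struct time_conv *c, uint32_t new_mult,
			     uint32_t new_shift, uint64_t cycles_now)
{
	uint64_t ns_now = cycles_to_ns(c, cycles_now);

	c->mult = new_mult;
	c->shift = new_shift;
	c->offset = 0;
	c->offset = ns_now - cycles_to_ns(c, cycles_now);
}
```

The update step is the painful part: it would have to run for every
active counter on the affected cpu each time the frequency changes.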

An alternative could be to simply ensure we update these timestamps at
least once per RR interval (tick); that way the times are more or less
recent and could still be used for scaling purposes.

The most important data in these timestamps is their ratio, not their
absolute value, therefore if we keep the ratio statistically significant
we're good enough.
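
For illustration, the scaling a reader of the page would do with the two
proposed fields could look roughly like this. It is only a sketch:
scale_count() is a made-up helper name, the seqlock retry loop around the
reads is left out, and unsigned __int128 is a GCC/Clang extension.

```c
#include <stdint.h>

/*
 * Estimate what the counter would have read had it been on the cpu for
 * the whole time it was enabled:
 *
 *	scaled = count * total_time / running_time
 *
 * Only the ratio total_time / running_time matters, so slightly stale
 * timestamps are fine as long as both were sampled together.
 */
static uint64_t scale_count(uint64_t count, uint64_t total_time,
			    uint64_t running_time)
{
	unsigned __int128 scaled;

	if (running_time == 0)
		return 0; /* never scheduled: nothing to extrapolate from */

	/* 128-bit intermediate so count * total_time cannot overflow */
	scaled = (unsigned __int128)count * total_time;

	return (uint64_t)(scaled / running_time);
}
```

A counter that ran for a quarter of its enabled time gets its raw count
multiplied by four; one that was never multiplexed is returned unchanged.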
