Re: [RFC][PATCH 0/6] perf: x86 RDPMC and RDTSC support

From: Vince Weaver
Date: Fri Dec 23 2011 - 15:12:41 EST


On Wed, 21 Dec 2011, Ingo Molnar wrote:

> Here's "pinned events" variant i've measured:
>
> static u64 mmap_read_self(void *addr)
> {
> 	struct perf_event_mmap_page *pc = addr;
> 	u32 seq, idx;
> 	u64 count;
>
> 	do {
> 		seq = pc->lock;
> 		barrier();
>
> 		idx = pc->index;
> 		count = pc->offset;
> 		if (idx)
> 			count += rdpmc(idx - 1);
>
> 		barrier();
> 	} while (pc->lock != seq);
>
> 	return count;
> }

Currently you need at least two rdpmc() calls for a start/read/stop
sequence (I use this as a benchmark because it's what PAPI code commonly
does).

This is because the pc->offset value isn't initialized to 0 on start,
but to max_period & cntrval_mask.

I'm not sure what perf_event can do about this short of having a separate
field in the mmap structure that doesn't have the overflow offset
considerations.
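
To make the cost concrete: because the value seen at start is not zero,
a start/read/stop measurement has to read once at start, once at stop, and
subtract. Below is a minimal sketch of that subtraction only. The helper
names counter_mask() and count_delta() are hypothetical, the raw values
are assumed to come from something like mmap_read_self() above, and the
counter width is a parameter because it differs between CPUs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask covering the low `width_bits` bits of a counter. */
static uint64_t counter_mask(unsigned width_bits)
{
	assert(width_bits >= 1 && width_bits <= 64);
	return width_bits == 64 ? UINT64_MAX
				: ((UINT64_C(1) << width_bits) - 1);
}

/*
 * Hypothetical helper: events counted between two raw reads.
 * The unsigned subtraction followed by the mask gives the right answer
 * even if the counter wrapped once between `start` and `stop`.
 */
static uint64_t count_delta(uint64_t start, uint64_t stop, unsigned width_bits)
{
	return (stop - start) & counter_mask(width_bits);
}
```

With an assumed 48-bit counter, count_delta(100, 250, 48) yields 150. If
the start value were guaranteed to be 0, the first read and the
subtraction could both be dropped.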


As an aside, I notice that the internal perf_event read() routine on x86
seems to use rdmsrl() instead of the equivalent rdpmc(). From what I
understand, at least through core2 (and maybe later) rdpmc() is faster
than the equivalent rdmsr() call. I'm not sure if it would be worth
replacing the calls though.

Vince



