Re: [RFC][PATCH] perf_events, x86: PEBS support

From: Stephane Eranian
Date: Wed Feb 03 2010 - 09:55:03 EST


On Wed, Feb 3, 2010 at 3:40 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Wed, 2010-02-03 at 15:30 +0100, Stephane Eranian wrote:
>> On Wed, Feb 3, 2010 at 3:19 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> > On Wed, 2010-02-03 at 15:07 +0100, Stephane Eranian wrote:
>> >> >> The only improvement that PEBS provides is that you get an IP and the
>> >> >> machine state at retirement of an instruction that caused the event to
>> >> >> increment. Thus, the IP points to the next dynamic instruction. The instruction
>> >> >> is not the one that caused the P-th occurrence of the event, if you set the
>> >> >> period to P. It is at P+N, where N cannot be predicted and varies depending
>> >> >> on the event and executed code. This introduces some bias in the samples.
>> >> >
>> >> > I'm not sure I follow, it records the next event after overflow, doesn't
>> >> > that make it P+1?
>> >> >
>> >> That is not what I wrote. I did not say it records at P+1. I said it records
>> >> at P+N, where N varies from sample to sample and cannot be predicted.
>> >> N is expressed in the unit of the sampling event.
>> >
>> > OK, so I'm confused.
>> >
>> > The manual says it arms the PEBS assist on overflow, and the PEBS thing
>> > will then record the next event. Which to me reads like P+1.
>> >
>> You are assuming arming is instantaneous.
>
> Yes I was, ok that stinks.
>
PEBS is still very useful because it guarantees the state you capture
is at retirement of an instruction which caused the event.
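
To make the P+N point from the quoted exchange concrete, here is a toy model
in C. It is only a sketch under stated assumptions, not a description of the
hardware: the arming latency is modelled as a number of further occurrences
of the sampling event (arming_delay, a hypothetical parameter), whereas on
real hardware that latency cannot be predicted. The function name is made up
for illustration.

```c
#include <stdint.h>

/* States of the toy model (hypothetical, for illustration only). */
enum pebs_state { COUNTING, ARMING, ARMED };

/*
 * Return the index (1-based) of the event occurrence that gets recorded.
 *
 * period       - sampling period P, counter overflows on the P-th event
 * arming_delay - ASSUMPTION: number of events that go by while the PEBS
 *                assist is being armed; 0 models instantaneous arming
 *
 * Returns 0 if period is 0 (no sampling configured).
 */
uint64_t pebs_sample_position(uint64_t period, uint64_t arming_delay)
{
	enum pebs_state state = COUNTING;
	uint64_t event = 0, count = 0, wait = 0;

	if (period == 0)
		return 0;

	for (;;) {
		event++;
		switch (state) {
		case COUNTING:
			/* Overflow on the P-th event starts the arming. */
			if (++count == period) {
				wait = arming_delay;
				state = wait ? ARMING : ARMED;
			}
			break;
		case ARMING:
			/* Events seen while arming are neither counted nor recorded. */
			if (--wait == 0)
				state = ARMED;
			break;
		case ARMED:
			/* First event after arming completes is the one recorded. */
			return event;
		}
	}
}
```

With arming_delay = 0 this gives P+1, which is the reading Peter had in
mind. With any non-zero delay the recorded event is at P+N with
N = arming_delay + 1, and since the real delay varies from sample to
sample, so does N.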

PEBS also gets way more interesting on Nehalem because of the
ability to capture where cache misses occur. That's the load latency
feature. You need to support that.

I believe you would need to abstract this in a generic fashion so it
could be used on other architectures, such as AMD with IBS.

On Nehalem, it requires the following:

- It only works if you sample on MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD.

- The threshold must be programmed into a dedicated MSR. The extra
difficulty is that this MSR is shared between the logical CPUs when HT is on.
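
One possible way to deal with the shared MSR, sketched in C below. This is
not what the patch does and not an existing kernel interface: the struct and
function names are hypothetical, and the policy (the first user picks the
threshold, a sibling may join only with the same value) is just one
assumption about how the sharing could be arbitrated. Locking and the actual
MSR write are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for the one threshold MSR shared by HT siblings. */
struct shared_ld_lat {
	uint64_t threshold;	/* value currently programmed */
	unsigned int users;	/* how many sibling threads rely on it */
};

/*
 * Try to claim the shared threshold for one logical CPU.
 * Succeeds if the register is free, or already holds the same value.
 * Real code would need to hold a per-core lock around this.
 */
bool ld_lat_get(struct shared_ld_lat *reg, uint64_t threshold)
{
	if (reg->users == 0) {
		/* Free: program it (the MSR write would go here). */
		reg->threshold = threshold;
		reg->users = 1;
		return true;
	}
	if (reg->threshold == threshold) {
		/* Sibling wants the same value: share it. */
		reg->users++;
		return true;
	}
	/* Conflicting threshold: the second event cannot be scheduled. */
	return false;
}

/* Drop one reference; the register becomes free when users reaches 0. */
void ld_lat_put(struct shared_ld_lat *reg)
{
	if (reg->users > 0)
		reg->users--;
}
```

With this policy, two siblings both asking for a threshold of 32 succeed,
while a sibling asking for 64 is refused until the others have released the
register.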


> If only they would reset the counter on overflow instead of on record,
> that would solve quite a few issues I imagine.
>
> Then add IP to the actual instruction and you've got yourself a useful
> tool :-)
>
>