Re: [rfc 1/3] perf, x86: P4 PMU - describe config format

From: Stephane Eranian
Date: Fri Nov 26 2010 - 07:46:50 EST


On Fri, Nov 26, 2010 at 12:58 PM, Cyrill Gorcunov <gorcunov@xxxxxxxxxx> wrote:
> On 11/26/10, Stephane Eranian <eranian@xxxxxxxxxx> wrote:
>> On Fri, Nov 26, 2010 at 12:32 PM, Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
>> wrote:
>>> On Fri, Nov 26, 2010 at 2:14 PM, Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
>>> wrote:
>>>> Stephane, this is a misprint, I'll update these comments on the format
>>>> (good catch btw!). In reality the low 32 bits are treated as the CCCR in HT
>>>> mode. Wait a bit, I'll post an update.
>>>>
>>>> On 11/26/10, Stephane Eranian <eranian@xxxxxxxxxx> wrote:
>>> ...
>>>>>> + *    Low 32 bits
>>>>>> + *    -----------
>>>>>> + *      0-6: P4_PEBS_METRIC enum
>>>>>> + *     7-11:                    reserved
>>>>>> + *       12: Active thread
>>>>>
>>>>> I don't understand bit 12. In the actual register, it
>>>>> corresponds to the enable bit. Seems you're overriding
>>>>> its usage. Do I interpret this as saying: 0 = enable when
>>>>> running on thread0, 1=monitoring when running on thread1?
>>>>> And if I don't care?
>>> ...
>>> I believe it simply escaped quilt refresh somehow. Here is the 'refreshed'
>>> copy (note the low bits 12-19 updated here).
>>> ---
>>> perf, x86: P4 PMU - describe config format v2
>>>
>>> Add a description of .config for the sake of RAW events.
>>> At least this should shed some light for those who
>>> will be reading this code.
>>>
>>> Signed-off-by: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
>>> CC: Lin Ming <ming.m.lin@xxxxxxxxx>
>>> CC: Stephane Eranian <eranian@xxxxxxxxxx>
>>> CC: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>>> ---
>>>  arch/x86/include/asm/perf_event_p4.h |   63 ++++++++++++++++++++++++++++++-----
>>>  1 file changed, 55 insertions(+), 8 deletions(-)
>>>
>>> Index: linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
>>> ===================================================================
>>> --- linux-2.6.tip.orig/arch/x86/include/asm/perf_event_p4.h
>>> +++ linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
>>> @@ -744,14 +744,6 @@ enum P4_ESCR_EMASKS {
>>>  };
>>>
>>>  /*
>>> - * P4 PEBS specifics (Replay Event only)
>>> - *
>>> - * Format (bits):
>>> - *   0-6: metric from P4_PEBS_METRIC enum
>>> - *    7 : reserved
>>> - *    8 : reserved
>>> - * 9-11 : reserved
>>> - *
>>>   * Note we have UOP and PEBS bits reserved for now
>>>   * just in case if we will need them once
>>>   */
>>> @@ -788,5 +780,60 @@ enum P4_PEBS_METRIC {
>>>          P4_PEBS_METRIC__max
>>>  };
>>>
>>> +/*
>>> + * Notes on internal configuration of ESCR+CCCR tuples
>>> + *
>>> + * Since P4 has quite a different architecture of
>>> + * performance registers compared with the "architectural"
>>> + * ones and we have only 64 bits to keep the configuration
>>> + * of a performance event, the following trick is used.
>>> + *
>>> + * 1) Since both ESCR and CCCR registers have only low
>>> + *    32 bits valuable, we pack them into a single 64 bit
>>> + *    configuration. Low 32 bits of such config correspond
>>> + *    to low 32 bits of CCCR register and high 32 bits
>>> + *    correspond to low 32 bits of ESCR register.
>>> + *
>>> + * 2) The meaning of every bit of such config field can
>>> + *    be found in Intel SDM but it should be noted that
>>> + *    we "borrow" some reserved bits for own usage and
>>> + *    clean them or set to a proper value when we do
>>> + *    a real write to hardware registers.
>>> + *
>>> + * 3) The format of bits of config is the following
>>> + *    and should be either 0 or set to some predefined
>>> + *    values:
>>> + *
>>> + *    Low 32 bits
>>> + *    -----------
>>> + *      0-6: P4_PEBS_METRIC enum
>>> + *     7-11:                    reserved
>>> + *       12:                    reserved (Enable)
>>> + *    13-15:                    reserved (ESCR select)
>>> + *    16-17: Active Thread
>>
>> HW has the active thread bits reserved to 0x3.
>> what about you? If not, then explain what they
>> mean.
>>
> hm, not sure i follow, hw allows you to pass any of 4 values for that
> field, so i simply pass it to kernel and then propagate to real cccr
> register. if machine is not ht capable it might be a problem, but i
> left it to a caller to set proper thread value here. i believe that
> you read cccr spec for non-ht machine while ht machine has a bit more
> flags to set.
>
You're right, I missed Figure-30.29. So you honor the field. The counter
won't count anything if the task is not running on the corresponding
HT thread, then.

The only custom fields are then: 0-6, 25-30. I think that's simple enough.
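
For anyone reading along, here is a rough userspace sketch of the packing as I understand it from the comment above: ESCR low 32 bits in the high half of the config, CCCR low 32 bits in the low half, with bits 0-6 of the low half borrowed for the PEBS metric and bits 16-17 passed through as Active Thread. All names (p4_sketch_*, SKETCH_*) are made up for illustration and are not taken from the patch. I assume the borrowed bits 0-6 are simply cleared before a real CCCR write; the patch only says they are cleaned or set to a proper value. Bits 25-30 are left out because the quoted excerpt does not show their layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; none of these come from the patch. */
#define SKETCH_PEBS_METRIC_MASK    0x7fu  /* config bits 0-6 (borrowed) */
#define SKETCH_ACTIVE_THREAD_SHIFT 16
#define SKETCH_ACTIVE_THREAD_MASK  (0x3u << SKETCH_ACTIVE_THREAD_SHIFT) /* bits 16-17 */

/* ESCR low 32 bits go to the high half, CCCR low 32 bits to the low half. */
static inline uint64_t p4_sketch_pack(uint32_t escr, uint32_t cccr)
{
	return ((uint64_t)escr << 32) | cccr;
}

static inline uint32_t p4_sketch_escr(uint64_t config)
{
	return (uint32_t)(config >> 32);
}

static inline uint32_t p4_sketch_cccr(uint64_t config)
{
	return (uint32_t)config;
}

/* Borrowed field: PEBS metric index kept in bits 0-6 of the low half. */
static inline uint32_t p4_sketch_pebs_metric(uint64_t config)
{
	return p4_sketch_cccr(config) & SKETCH_PEBS_METRIC_MASK;
}

/* Real hardware field: passed through to the CCCR unchanged. */
static inline uint32_t p4_sketch_active_thread(uint64_t config)
{
	return (p4_sketch_cccr(config) & SKETCH_ACTIVE_THREAD_MASK)
		>> SKETCH_ACTIVE_THREAD_SHIFT;
}

/* Assumption: borrowed bits 0-6 are cleared before the hardware write. */
static inline uint32_t p4_sketch_cccr_hw(uint64_t config)
{
	return p4_sketch_cccr(config) & ~SKETCH_PEBS_METRIC_MASK;
}

/* Round-trip sanity check: both halves survive packing. */
static inline void p4_sketch_selfcheck(void)
{
	uint64_t config = p4_sketch_pack(0x12345678u, 0x00030005u);

	assert(p4_sketch_escr(config) == 0x12345678u);
	assert(p4_sketch_cccr(config) == 0x00030005u);
}
```

A caller building a RAW event would pass the result of p4_sketch_pack() as the 64-bit config; the kernel side would then split it again and strip the borrowed bits before touching the MSRs.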

Thanks.