Re: [PATCH] perf_events: fix bug in hw_perf_enable()

From: Stephane Eranian
Date: Mon Feb 01 2010 - 11:13:00 EST


On Mon, Feb 1, 2010 at 4:35 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Mon, 2010-02-01 at 14:50 +0200, Stephane Eranian wrote:
>>      We cannot assume that because hwc->idx == assign[i], we
>>      can avoid reprogramming the counter in hw_perf_enable().
>>
>>      The event may have been scheduled out and another event
>>      may have been programmed into this counter. Thus, we need
>>      a more robust way of verifying if the counter still
>>      contains config/data related to an event.
>>
>>      This patch adds a generation number to each counter on each
>>      CPU. Using this mechanism we can reliably verify whether the
>>      content of a counter corresponds to an event.
>>
>>      Signed-off-by: Stephane Eranian <eranian@xxxxxxxxxx>
>
> Thanks, got it.
>
> btw, I've also added the below. From what I can make of the docs, fixed
> counter 2 is identical to arch perf event 0x013c, as per tables A-1 and
> A-7. Both are called CPU_CLK_UNHALTED.REF, except on Core2, where
> 0x013c is called CPU_CLK_UNHALTED.BUS.
>

If you measure 0x013c in a generic counter or in fixed counter 2, it
will count the same thing, but not at the same rate. This is true on
Core2, Atom, Nehalem and Westmere. The ratio is the clock/bus ratio.
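
To make the arithmetic concrete, here is a toy illustration (my own
sketch, not code from either patch, and the ratio value is purely
hypothetical): once the fixed clock/bus ratio of a part is known, a
count taken at one rate can be rescaled into the other counter's
units, which is essentially what the TODO in the Core2 constraint
table below asks for:

	#include <stdint.h>

	/*
	 * Toy helper: rescale a cycle count taken at the slower rate
	 * into the units of the faster-ticking counter, given the
	 * fixed clock/bus ratio of the part (e.g. with a hypothetical
	 * ratio of 6, a count of 1000 maps to roughly 6000).
	 */
	static inline uint64_t rescale_cycles(uint64_t count, uint64_t ratio)
	{
		return count * ratio;
	}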

This goes back to an earlier discussion where I asked about the
meaning of the generic PMU events, and in particular
PERF_COUNT_HW_CPU_CYCLES: which of the 3 distinct cycle
events (unhalted_core_cycles, unhalted_reference_cycles, bus_cycles)
does it correspond to?



> ---
> Subject: perf_events, x86: Fixup fixed counter constraints
> From: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> Date: Mon Feb 01 15:36:30 CET 2010
>
> Patch 1da53e0230 ("perf_events, x86: Improve x86 event scheduling")
> lost us one of the fixed purpose counters and then ed8777fc13
> ("perf_events, x86: Fix event constraint masks") broke it even
> further.
>
> Widen the fixed event mask to event+umask and specify the full config
> for each of the 3 fixed purpose counters. Then let the init code fill
> out the placement for the GP regs based on the cpuid info.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> LKML-Reference: <new-submission>
> ---
>  arch/x86/include/asm/perf_event.h |    2 +-
>  arch/x86/kernel/cpu/perf_event.c  |   38 ++++++++++++++++++++++++++++++--------
>  2 files changed, 31 insertions(+), 9 deletions(-)
>
> Index: linux-2.6/arch/x86/include/asm/perf_event.h
> ===================================================================
> --- linux-2.6.orig/arch/x86/include/asm/perf_event.h
> +++ linux-2.6/arch/x86/include/asm/perf_event.h
> @@ -50,7 +50,7 @@
>         INTEL_ARCH_INV_MASK| \
>         INTEL_ARCH_EDGE_MASK|\
>         INTEL_ARCH_UNIT_MASK|\
> -       INTEL_ARCH_EVTSEL_MASK)
> +       INTEL_ARCH_EVENT_MASK)
>
>  #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL          0x3c
>  #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK        (0x00 << 8)
> Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
> +++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
> @@ -243,8 +243,18 @@ static struct event_constraint intel_cor
>
>  static struct event_constraint intel_core2_event_constraints[] =
>  {
> -       FIXED_EVENT_CONSTRAINT(0xc0, (0x3|(1ULL<<32))), /* INSTRUCTIONS_RETIRED */
> -       FIXED_EVENT_CONSTRAINT(0x3c, (0x3|(1ULL<<33))), /* UNHALTED_CORE_CYCLES */
> +       FIXED_EVENT_CONSTRAINT(0x00c0, 1ULL << 32), /* INST_RETIRED.ANY */
> +       FIXED_EVENT_CONSTRAINT(0x003c, 1ULL << 33), /* CPU_CLK_UNHALTED.CORE */
> +       /*
> +        * FIXED_EVENT_CONSTRAINT(0x013c, 1ULL << 34),  CPU_CLK_UNHALTED.REF
> +        *
> +        * Core2 has Fixed Counter 2 listed as CPU_CLK_UNHALTED.REF and event
> +        * 0x013c as CPU_CLK_UNHALTED.BUS and specifies there is a fixed
> +        * ratio between these counters.
> +        *
> +        * TODO: find/measure the fixed ratio and apply it so that we can
> +        * enable this fixed purpose counter in a transparent way.
> +        */
>         INTEL_EVENT_CONSTRAINT(0x10, 0x1), /* FP_COMP_OPS_EXE */
>         INTEL_EVENT_CONSTRAINT(0x11, 0x2), /* FP_ASSIST */
>         INTEL_EVENT_CONSTRAINT(0x12, 0x2), /* MUL */
> @@ -259,8 +269,9 @@ static struct event_constraint intel_cor
>
>  static struct event_constraint intel_nehalem_event_constraints[] =
>  {
> -       FIXED_EVENT_CONSTRAINT(0xc0, (0xf|(1ULL<<32))), /* INSTRUCTIONS_RETIRED */
> -       FIXED_EVENT_CONSTRAINT(0x3c, (0xf|(1ULL<<33))), /* UNHALTED_CORE_CYCLES */
> +       FIXED_EVENT_CONSTRAINT(0x00c0, 1ULL << 32), /* INST_RETIRED.ANY */
> +       FIXED_EVENT_CONSTRAINT(0x003c, 1ULL << 33), /* CPU_CLK_UNHALTED.CORE */
> +       FIXED_EVENT_CONSTRAINT(0x013c, 1ULL << 34), /* CPU_CLK_UNHALTED.REF */
>         INTEL_EVENT_CONSTRAINT(0x40, 0x3), /* L1D_CACHE_LD */
>         INTEL_EVENT_CONSTRAINT(0x41, 0x3), /* L1D_CACHE_ST */
>         INTEL_EVENT_CONSTRAINT(0x42, 0x3), /* L1D_CACHE_LOCK */
> @@ -274,8 +285,9 @@ static struct event_constraint intel_neh
>
>  static struct event_constraint intel_westmere_event_constraints[] =
>  {
> -       FIXED_EVENT_CONSTRAINT(0xc0, (0xf|(1ULL<<32))), /* INSTRUCTIONS_RETIRED */
> -       FIXED_EVENT_CONSTRAINT(0x3c, (0xf|(1ULL<<33))), /* UNHALTED_CORE_CYCLES */
> +       FIXED_EVENT_CONSTRAINT(0x00c0, 1ULL << 32), /* INST_RETIRED.ANY */
> +       FIXED_EVENT_CONSTRAINT(0x003c, 1ULL << 33), /* CPU_CLK_UNHALTED.CORE */
> +       FIXED_EVENT_CONSTRAINT(0x013c, 1ULL << 34), /* CPU_CLK_UNHALTED.REF */
>         INTEL_EVENT_CONSTRAINT(0x51, 0x3), /* L1D */
>         INTEL_EVENT_CONSTRAINT(0x60, 0x1), /* OFFCORE_REQUESTS_OUTSTANDING */
>         INTEL_EVENT_CONSTRAINT(0x63, 0x3), /* CACHE_LOCK_CYCLES */
> @@ -284,8 +296,9 @@ static struct event_constraint intel_wes
>
>  static struct event_constraint intel_gen_event_constraints[] =
>  {
> -       FIXED_EVENT_CONSTRAINT(0xc0, (0x3|(1ULL<<32))), /* INSTRUCTIONS_RETIRED */
> -       FIXED_EVENT_CONSTRAINT(0x3c, (0x3|(1ULL<<33))), /* UNHALTED_CORE_CYCLES */
> +       FIXED_EVENT_CONSTRAINT(0x00c0, 1ULL << 32), /* INST_RETIRED.ANY */
> +       FIXED_EVENT_CONSTRAINT(0x003c, 1ULL << 33), /* CPU_CLK_UNHALTED.CORE */
> +       FIXED_EVENT_CONSTRAINT(0x013c, 1ULL << 34), /* CPU_CLK_UNHALTED.REF */
>         EVENT_CONSTRAINT_END
>  };
>
> @@ -2602,6 +2615,7 @@ static void __init pmu_check_apic(void)
>
>  void __init init_hw_perf_events(void)
>  {
> +       struct event_constraint *c;
>         int err;
>
>         pr_info("Performance Events: ");
> @@ -2650,6 +2664,14 @@ void __init init_hw_perf_events(void)
>                 __EVENT_CONSTRAINT(0, (1ULL << x86_pmu.num_events) - 1,
>                                    0, x86_pmu.num_events);
>
> +       for_each_event_constraint(c, x86_pmu.event_constraints) {
> +               if (c->cmask != INTEL_ARCH_FIXED_MASK)
> +                       continue;
> +
> +               c->idxmsk64[0] |= (1ULL << x86_pmu.num_events) - 1;
> +               c->weight += x86_pmu.num_events;
> +       }
> +
>         pr_info("... version:                 %d\n",     x86_pmu.version);
>         pr_info("... bit width:               %d\n",     x86_pmu.event_bits);
>         pr_info("... generic registers:       %d\n",     x86_pmu.num_events);
>
>
>
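
One remark on the last hunk, mainly to check my reading: after this
loop runs, each fixed-counter constraint also admits all the generic
counters, so the event can fall back to a GP counter when the fixed
one is taken. A throwaway user-space snippet of the resulting mask
arithmetic (assuming 4 generic counters, e.g. Nehalem; illustration
only, not kernel code):

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		/* fixed counter 0 bit for INST_RETIRED.ANY, as in the patch */
		uint64_t idxmsk = 1ULL << 32;
		int num_events = 4;	/* assumed number of generic counters */

		/* what the new init loop ORs in: all generic counter bits */
		idxmsk |= (1ULL << num_events) - 1;

		/* prints idxmsk = 0x10000000f */
		printf("idxmsk = %#llx\n", (unsigned long long)idxmsk);
		return 0;
	}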



--
Stephane Eranian | EMEA Software Engineering
Google France | 38 avenue de l'Opéra | 75002 Paris
Tel : +33 (0) 1 42 68 53 00