Re: [DRAFT PATCH 2/3] perf: Implement Nehalem uncore pmu

From: Stephane Eranian
Date: Tue Nov 02 2010 - 12:59:41 EST


On Tue, Nov 2, 2010 at 4:33 PM, Lin Ming <ming.m.lin@xxxxxxxxx> wrote:
> On Tue, 2010-11-02 at 22:53 +0800, Stephane Eranian wrote:
>> Lin,
>>
>> On Tue, Nov 2, 2010 at 8:28 AM, Lin Ming <ming.m.lin@xxxxxxxxx> wrote:
>> > For the background of Nehalem uncore pmu, see Intel SDM Volume 3B
>> > "30.6.2 Performance Monitoring Facility in the Uncore"
>> >
>> > 1. data structure
>> >
>> > struct node_hw_events {
>> >         struct perf_event *events[UNCORE_NUM_COUNTERS];
>> >         int n_events;
>> >         struct spinlock lock;
>> >         int enabled;
>> > };
>> >
>> > struct node_hw_events is the per-node structure.
>> > "lock" protects adding/removing events to/from the uncore pmu.
>> >
>> > struct uncore_cpu_hw_events {
>> >         unsigned long active_mask[BITS_TO_LONGS(UNCORE_NUM_COUNTERS)];
>> > };
>> >
>> > struct uncore_cpu_hw_events is the per-logical-cpu structure.
>> > "active_mask" represents the counters used by the cpu.
>> > For example, if bits 3 and 6 are set for cpuX, it means uncore counters
>> > 3 and 6 are used by cpuX.
>> >
>> I would advise you to allocate your uncore_events[] table dynamically
>> using kmalloc_node(). That way you avoid unnecessary remote
>> memory accesses.
>
> Good point. Will do this.
>
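For the dynamic allocation, something along these lines should do (untested
sketch; the helper name is made up, error handling for partial failure is
omitted, and the users then go through a pointer, i.e.
uncore_events[node]->lock instead of uncore_events[node].lock):

static struct node_hw_events *uncore_events[MAX_NUMNODES];

static int __init uncore_alloc_node_events(void)
{
        int node;

        for_each_online_node(node) {
                struct node_hw_events *h;

                /* keep each node's bookkeeping in that node's local memory */
                h = kmalloc_node(sizeof(*h), GFP_KERNEL | __GFP_ZERO, node);
                if (!h)
                        return -ENOMEM;

                spin_lock_init(&h->lock);
                uncore_events[node] = h;
        }
        return 0;
}
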
>>
>> Furthermore, the patch is missing support for the fixed uncore counter. It is
>> very useful as it allows measuring some reference cycles at the socket
>
> Yes, I'll add the fixed uncore counter and possibly also the uncore
> address/opcode match support.
>
Good.

>> level. You have 8+1 counters in total. You need to define some encoding
>> for UNC_CPU_CLK.
>
> Could you explain a bit more? What's the encoding for UNC_CPU_CLK?
>
Well, there is no hardware encoding, given that it is a fixed counter. This is
similar to the issue that exists today with the core PMU and
unhalted_reference_cycles vs. unhalted_core_cycles for the fixed counters.
Unfortunately, today you cannot request unhalted_reference_cycles by name.
I'd like to have it because in many situations it is better than
unhalted_core_cycles.

You need to pick an encoding for this event so it can be named and
passed in attr.config.
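
For instance (purely illustrative: the pseudo-encoding value and the names
below are made up, and the actual programming of MSR_UNCORE_FIXED_CTR_CTRL
is left aside):

/* hypothetical pseudo-encoding for the uncore fixed counter (UNC_CLK_UNHALTED) */
#define UNCORE_FIXED_EVENT_UNC_CLK     0xffff  /* must not clash with a real event/umask pair */

        /* in uncore_pmu_event_init(), roughly: */
        if ((event->attr.config & UNCORE_RAW_EVENT_MASK) == UNCORE_FIXED_EVENT_UNC_CLK) {
                hwc->idx         = UNCORE_NUM_COUNTERS;        /* slot 8, the fixed counter */
                hwc->config_base = MSR_UNCORE_FIXED_CTR_CTRL;
                hwc->event_base  = MSR_UNCORE_FIXED_CTR0;
        } else {
                hwc->config      = (event->attr.config & UNCORE_RAW_EVENT_MASK) | UNCORE_EVENTSEL_PMI;
                hwc->config_base = MSR_UNCORE_PERFEVTSEL0;
                hwc->event_base  = MSR_UNCORE_PMC0;
        }

The tool side can then map a symbolic name, say unc_clk_unhalted, to that
config value.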

>> > 2. Uncore pmu NMI handling
>> >
>> > Every core in the socket can be programmed to receive the uncore counter
>> > overflow interrupt.
>> >
>> > In this draft implementation, each core handles the overflow interrupt
>> > caused by the counters whose bits are set in its "active_mask".
>> >
>> Seems like in your model, interrupting all cores is the only solution given
>> that you can program uncore events from any core on the socket.
>
> Do you see some potential problem with this model?
>
You are interrupting other CPUs for potentially nothing, so you incur
some overhead.

But it may also be interesting to snapshot the IP across all CPUs
to determine where they all are. In other words, use the uncore PMU
to get a global view of the cores, and that's where UNC_CPU_CLK
comes in handy.


> And do you have some ideas about the issue I mentioned in PATCH 0/3?
> I copy it here.
>
> 4. Issues
>
> How to eliminate the duplicate counter values accumulated by multiple child
> processes on the same socket?

I think using the uncore PMU to measure in per-thread mode is pretty much
useless. Maybe it should not even be allowed. There is no way to correlate
the counts you are getting to a place in your program. Put differently,
sampling in per-thread mode with the uncore is useless.
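
If we decide not to allow it, a simple check in uncore_pmu_event_init()
would probably be enough, e.g. (sketch):

        /* uncore counters count at the socket level: refuse per-thread events */
        if (event->cpu < 0)
                return -EINVAL;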


Given how perf works in system-wide mode, i.e., automatic aggregation,
you'll have the problem regardless of the mode. I think for the uncore you
need to measure from only one core per socket. That's the only way to
get meaningful counts. You can do this explicitly using the -C mode.
I believe the tool should print a warning, or refuse to run, if you just
pass -a (system-wide mode).
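
For example, on a hypothetical 2-socket machine where cpu0 sits on socket 0
and cpu8 on socket 1, something like:

   perf stat -e ru0101 -C 0 -- make -j4     (socket 0 only)
   perf stat -e ru0101 -C 8 -- make -j4     (socket 1 only)

The CPU numbers are purely illustrative and depend on the actual topology.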

>
> perf stat -e ru0101 -- make -j4
>
> Assume the 4 "make" child processes are running on the same socket and
> counting uncore raw event "0101", and the counter values read by them are
> val0, val1, val2, val3.
>
> Then the final counter result given by "perf stat" will be "val0 + val1
> + val2 + val3".
>
> But this is obviously wrong, because the uncore counter is shared by all
> cores in the socket, so the final result should not be accumulated.
>
> Many thanks,
> Lin Ming
>
>>
>>
>> > ---
>> >  arch/x86/include/asm/msr-index.h              |    1 +
>> >  arch/x86/kernel/cpu/perf_event.c              |    7 +-
>> >  arch/x86/kernel/cpu/perf_event_intel_uncore.c |  280 +++++++++++++++++++++++++
>> >  arch/x86/kernel/cpu/perf_event_intel_uncore.h |   80 +++++++
>> >  4 files changed, 367 insertions(+), 1 deletions(-)
>> >  create mode 100644 arch/x86/kernel/cpu/perf_event_intel_uncore.c
>> >  create mode 100644 arch/x86/kernel/cpu/perf_event_intel_uncore.h
>> >
>> > diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> > index 3ea3dc4..816fb4b 100644
>> > --- a/arch/x86/include/asm/msr-index.h
>> > +++ b/arch/x86/include/asm/msr-index.h
>> > @@ -81,6 +81,7 @@
>> >  #define DEBUGCTLMSR_BTS_OFF_OS         (1UL <<  9)
>> >  #define DEBUGCTLMSR_BTS_OFF_USR        (1UL << 10)
>> >  #define DEBUGCTLMSR_FREEZE_LBRS_ON_PMI (1UL << 11)
>> > +#define DEBUGCTLMSR_ENABLE_UNCORE_PMI  (1UL << 13)
>> >
>> >  #define MSR_IA32_MC0_CTL               0x00000400
>> >  #define MSR_IA32_MC0_STATUS            0x00000401
>> > diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
>> > index 7cea0f4..cca07b4 100644
>> > --- a/arch/x86/kernel/cpu/perf_event.c
>> > +++ b/arch/x86/kernel/cpu/perf_event.c
>> > @@ -1215,6 +1215,8 @@ struct pmu_nmi_state {
>> >
>> >  static DEFINE_PER_CPU(struct pmu_nmi_state, pmu_nmi);
>> >
>> > +static int uncore_pmu_handle_irq(struct pt_regs *regs);
>> > +
>> >  static int __kprobes
>> >  perf_event_nmi_handler(struct notifier_block *self,
>> >                         unsigned long cmd, void *__args)
>> > @@ -1249,7 +1251,8 @@ perf_event_nmi_handler(struct notifier_block *self,
>> >
>> >         apic_write(APIC_LVTPC, APIC_DM_NMI);
>> >
>> > -       handled = x86_pmu.handle_irq(args->regs);
>> > +       handled = uncore_pmu_handle_irq(args->regs);
>> > +       handled += x86_pmu.handle_irq(args->regs);
>> >         if (!handled)
>> >                 return NOTIFY_DONE;
>> >
>> > @@ -1305,6 +1308,7 @@ x86_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
>> >  #include "perf_event_intel_lbr.c"
>> >  #include "perf_event_intel_ds.c"
>> >  #include "perf_event_intel.c"
>> > +#include "perf_event_intel_uncore.c"
>> >
>> >  static int __cpuinit
>> >  x86_pmu_notifier(struct notifier_block *self, unsigned long action, void *hcpu)
>> > @@ -1360,6 +1364,7 @@ void __init init_hw_perf_events(void)
>> >
>> >         switch (boot_cpu_data.x86_vendor) {
>> >         case X86_VENDOR_INTEL:
>> > +               init_uncore_pmu();
>> >                 err = intel_pmu_init();
>> >                 break;
>> >         case X86_VENDOR_AMD:
>> > diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
>> > new file mode 100644
>> > index 0000000..fafa0f9
>> > --- /dev/null
>> > +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
>> > @@ -0,0 +1,280 @@
>> > +#include "perf_event_intel_uncore.h"
>> > +
>> > +static struct node_hw_events uncore_events[MAX_NUMNODES];
>> > +static DEFINE_PER_CPU(struct uncore_cpu_hw_events, uncore_cpu_hw_events);
>> > +static bool uncore_pmu_initialized;
>> > +
>> > +static void uncore_pmu_enable_event(struct perf_event *event)
>> > +{
>> > +       struct hw_perf_event *hwc = &event->hw;
>> > +
>> > +       wrmsrl(hwc->config_base + hwc->idx, hwc->config | UNCORE_EVENTSEL_ENABLE);
>> > +}
>> > +
>> > +static void uncore_pmu_disable_event(struct perf_event *event)
>> > +{
>> > +       struct hw_perf_event *hwc = &event->hw;
>> > +
>> > +       wrmsrl(hwc->config_base + hwc->idx, hwc->config);
>> > +}
>> > +
>> > +static void uncore_pmu_disable_events(void)
>> > +{
>> > +       struct uncore_cpu_hw_events *cpuc = &__get_cpu_var(uncore_cpu_hw_events);
>> > +       int node = numa_node_id();
>> > +       int bit;
>> > +
>> > +       for_each_set_bit(bit, cpuc->active_mask, UNCORE_NUM_COUNTERS)
>> > +               uncore_pmu_disable_event(uncore_events[node].events[bit]);
>> > +}
>> > +
>> > +static void uncore_pmu_enable_events(void)
>> > +{
>> > +       struct uncore_cpu_hw_events *cpuc = &__get_cpu_var(uncore_cpu_hw_events);
>> > +       int node = numa_node_id();
>> > +       int bit;
>> > +
>> > +       for_each_set_bit(bit, cpuc->active_mask, UNCORE_NUM_COUNTERS)
>> > +               uncore_pmu_enable_event(uncore_events[node].events[bit]);
>> > +}
>> > +
>> > +static void uncore_pmu_global_enable(void)
>> > +{
>> > +       u64 ctrl;
>> > +
>> > +       /* (0xFULL << 48): all 4 cores will receive NMI */
>> > +       ctrl = ((1 << UNCORE_NUM_COUNTERS) - 1) | (0xFULL << 48);
>> > +
>> > +       wrmsrl(MSR_UNCORE_PERF_GLOBAL_CTRL, ctrl);
>> > +}
>> > +
>> > +static void uncore_perf_event_destroy(struct perf_event *event)
>> > +{
>> > +       atomic_dec(&active_events);
>> > +}
>> > +
>> > +static int uncore_pmu_event_init(struct perf_event *event)
>> > +{
>> > +       struct hw_perf_event *hwc = &event->hw;
>> > +
>> > +       if (!uncore_pmu_initialized)
>> > +               return -ENOENT;
>> > +
>> > +       switch (event->attr.type) {
>> > +       case PERF_TYPE_UNCORE:
>> > +               break;
>> > +
>> > +       default:
>> > +               return -ENOENT;
>> > +       }
>> > +
>> > +       atomic_inc(&active_events);
>> > +
>> > +       event->destroy = uncore_perf_event_destroy;
>> > +
>> > +       hwc->idx = -1;
>> > +       hwc->config = (event->attr.config & UNCORE_RAW_EVENT_MASK) | UNCORE_EVENTSEL_PMI;
>> > +       hwc->config_base = MSR_UNCORE_PERFEVTSEL0;
>> > +       hwc->event_base = MSR_UNCORE_PMC0;
>> > +
>> > +       return 0;
>> > +}
>> > +
>> > +static void uncore_pmu_start(struct perf_event *event, int flags)
>> > +{
>> > +       if (flags & PERF_EF_RELOAD)
>> > +               x86_perf_event_set_period(event);
>> > +
>> > +       uncore_pmu_enable_event(event);
>> > +
>> > +       perf_event_update_userpage(event);
>> > +}
>> > +
>> > +static void uncore_pmu_stop(struct perf_event *event, int flags)
>> > +{
>> > +       struct uncore_cpu_hw_events *cpuc = &__get_cpu_var(uncore_cpu_hw_events);
>> > +       struct hw_perf_event *hwc = &event->hw;
>> > +
>> > +       if (__test_and_clear_bit(hwc->idx, cpuc->active_mask))
>> > +               uncore_pmu_disable_event(event);
>> > +
>> > +       if (flags & PERF_EF_UPDATE)
>> > +               x86_perf_event_update(event, UNCORE_CNTVAL_BITS);
>> > +}
>> > +
>> > +static int uncore_pmu_add(struct perf_event *event, int flags)
>> > +{
>> > +       struct uncore_cpu_hw_events *cpuc = &__get_cpu_var(uncore_cpu_hw_events);
>> > +       int node = numa_node_id();
>> > +       int ret = 1;
>> > +       int i;
>> > +       u64 ctrl;
>> > +
>> > +       spin_lock(&uncore_events[node].lock);
>> > +
>> > +       for (i = 0; i < UNCORE_NUM_COUNTERS; i++) {
>> > +               if (!uncore_events[node].events[i]) {
>> > +                       uncore_events[node].events[i] = event;
>> > +                       uncore_events[node].n_events++;
>> > +
>> > +                       event->hw.idx = i;
>> > +                       __set_bit(i, cpuc->active_mask);
>> > +                       if (flags & PERF_EF_START)
>> > +                               uncore_pmu_start(event, PERF_EF_RELOAD);
>> > +                       ret = 0;
>> > +                       break;
>> > +               }
>> > +       }
>> > +
>> > +       /*
>> > +        * PMI delivery due to an uncore counter overflow is enabled by
>> > +        * setting IA32_DEBUG_CTL.Offcore_PMI_EN to 1.
>> > +        */
>> > +       if (uncore_events[node].n_events == 1) {
>> > +               rdmsrl(MSR_IA32_DEBUGCTLMSR, ctrl);
>> > +               wrmsrl(MSR_IA32_DEBUGCTLMSR, ctrl | DEBUGCTLMSR_ENABLE_UNCORE_PMI);
>> > +       }
>> > +
>> > +       if (unlikely(!uncore_events[node].enabled)) {
>> > +               uncore_pmu_global_enable();
>> > +               uncore_events[node].enabled = 1;
>> > +       }
>> > +
>> > +       spin_unlock(&uncore_events[node].lock);
>> > +
>> > +       return ret;
>> > +}
>> > +
>> > +static void uncore_pmu_del(struct perf_event *event, int flags)
>> > +{
>> > +       int node = numa_node_id();
>> > +       struct hw_perf_event *hwc = &event->hw;
>> > +       int i;
>> > +
>> > +       spin_lock(&uncore_events[node].lock);
>> > +
>> > +       for (i = 0; i < UNCORE_NUM_COUNTERS; i++) {
>> > +               if (uncore_events[node].events[i] == event) {
>> > +                       uncore_events[node].events[hwc->idx] = NULL;
>> > +                       uncore_events[node].n_events--;
>> > +
>> > +                       uncore_pmu_stop(event, PERF_EF_UPDATE);
>> > +                       break;
>> > +               }
>> > +       }
>> > +
>> > +       spin_unlock(&uncore_events[node].lock);
>> > +}
>> > +
>> > +static void uncore_pmu_read(struct perf_event *event)
>> > +{
>> > +       x86_perf_event_update(event, UNCORE_CNTVAL_BITS);
>> > +}
>> > +
>> > +static struct pmu uncore_pmu = {
>> > +       .event_init     = uncore_pmu_event_init,
>> > +       .add            = uncore_pmu_add,
>> > +       .del            = uncore_pmu_del,
>> > +       .start          = uncore_pmu_start,
>> > +       .stop           = uncore_pmu_stop,
>> > +       .read           = uncore_pmu_read,
>> > +};
>> > +
>> > +
>> > +static inline u64 uncore_pmu_get_status(void)
>> > +{
>> > +       struct uncore_cpu_hw_events *cpuc = &__get_cpu_var(uncore_cpu_hw_events);
>> > +       u64 status;
>> > +
>> > +       rdmsrl(MSR_UNCORE_PERF_GLOBAL_STATUS, status);
>> > +
>> > +       return status & (*(u64 *)cpuc->active_mask |
>> > +               MSR_UNCORE_PERF_GLOBAL_STATUS_OVF_PMI | MSR_UNCORE_PERF_GLOBAL_STATUS_CHG);
>> > +}
>> > +
>> > +static inline void uncore_pmu_ack_status(u64 ack)
>> > +{
>> > +       wrmsrl(MSR_UNCORE_PERF_GLOBAL_OVF_CTRL, ack);
>> > +}
>> > +
>> > +static int uncore_pmu_save_and_restart(struct perf_event *event)
>> > +{
>> > +       x86_perf_event_update(event, UNCORE_CNTVAL_BITS);
>> > +       return x86_perf_event_set_period(event);
>> > +}
>> > +
>> > +int uncore_pmu_handle_irq(struct pt_regs *regs)
>> > +{
>> > +       struct perf_sample_data data;
>> > +       struct node_hw_events *uncore_node;
>> > +       int node;
>> > +       int bit;
>> > +       u64 status;
>> > +       int handled = 0;
>> > +
>> > +       perf_sample_data_init(&data, 0);
>> > +
>> > +       node = numa_node_id();
>> > +       uncore_node = &uncore_events[node];
>> > +
>> > +       status = uncore_pmu_get_status();
>> > +       if (!status) {
>> > +               apic_write(APIC_LVTPC, APIC_DM_NMI);
>> > +
>> > +               return 1;
>> > +       }
>> > +
>> > +       uncore_pmu_disable_events();
>> > +again:
>> > +       uncore_pmu_ack_status(status);
>> > +
>> > +       for_each_set_bit(bit, (unsigned long *)&status, UNCORE_NUM_COUNTERS) {
>> > +               struct perf_event *event = uncore_node->events[bit];
>> > +
>> > +               handled++;
>> > +
>> > +               if (!uncore_pmu_save_and_restart(event))
>> > +                       continue;
>> > +
>> > +               data.period = event->hw.last_period;
>> > +
>> > +               if (perf_event_overflow(event, 1, &data, regs))
>> > +                       uncore_pmu_stop(event, 0);
>> > +       }
>> > +
>> > +       /*
>> > +        * Repeat if there is more work to be done:
>> > +        */
>> > +       status = uncore_pmu_get_status();
>> > +       if (status)
>> > +               goto again;
>> > +
>> > +       uncore_pmu_enable_events();
>> > +       return handled;
>> > +}
>> > +
>> > +void __init init_uncore_pmu(void)
>> > +{
>> > +       union cpuid01_eax eax;
>> > +       unsigned int unused;
>> > +       unsigned int model;
>> > +       int i;
>> > +
>> > +       if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
>> > +               return;
>> > +
>> > +       cpuid(1, &eax.full, &unused, &unused, &unused);
>> > +
>> > +       /* Check CPUID signatures: 06_1AH, 06_1EH, 06_1FH */
>> > +       model = eax.split.model | (eax.split.ext_model << 4);
>> > +       if (eax.split.family != 6 || (model != 0x1A && model != 0x1E && model != 0x1F))
>> > +               return;
>> > +
>> > +       pr_cont("Nehalem uncore pmu, \n");
>> > +
>> > +       for (i = 0; i < MAX_NUMNODES; i++)
>> > +               spin_lock_init(&uncore_events[i].lock);
>> > +
>> > +       perf_pmu_register(&uncore_pmu);
>> > +       uncore_pmu_initialized = true;
>> > +}
>> > diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
>> > new file mode 100644
>> > index 0000000..33b9b5e
>> > --- /dev/null
>> > +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
>> > @@ -0,0 +1,80 @@
>> > +#include <linux/perf_event.h>
>> > +#include <linux/capability.h>
>> > +#include <linux/notifier.h>
>> > +#include <linux/hardirq.h>
>> > +#include <linux/kprobes.h>
>> > +#include <linux/module.h>
>> > +#include <linux/kdebug.h>
>> > +#include <linux/sched.h>
>> > +#include <linux/uaccess.h>
>> > +#include <linux/slab.h>
>> > +#include <linux/highmem.h>
>> > +#include <linux/cpu.h>
>> > +#include <linux/bitops.h>
>> > +
>> > +#include <asm/apic.h>
>> > +#include <asm/stacktrace.h>
>> > +#include <asm/nmi.h>
>> > +#include <asm/compat.h>
>> > +
>> > +#define MSR_UNCORE_PERF_GLOBAL_CTRL            0x391
>> > +#define MSR_UNCORE_PERF_GLOBAL_STATUS          0x392
>> > +#define MSR_UNCORE_PERF_GLOBAL_OVF_CTRL        0x393
>> > +#define MSR_UNCORE_FIXED_CTR0                  0x394
>> > +#define MSR_UNCORE_FIXED_CTR_CTRL              0x395
>> > +#define MSR_UNCORE_ADDR_OPCODE_MATCH           0x396
>> > +
>> > +#define MSR_UNCORE_PERF_GLOBAL_CTRL_PMI_CORE0  (1ULL << 48)
>> > +#define MSR_UNCORE_PERF_GLOBAL_CTRL_PMI_FRZ    (1ULL << 63)
>> > +
>> > +#define MSR_UNCORE_PERF_GLOBAL_STATUS_OVF_PMI  (1ULL << 61)
>> > +#define MSR_UNCORE_PERF_GLOBAL_STATUS_CHG      (1ULL << 63)
>> > +
>> > +#define MSR_UNCORE_PMC0                        0x3b0
>> > +
>> > +#define MSR_UNCORE_PERFEVTSEL0                 0x3c0
>> > +
>> > +#define UNCORE_EVENTSEL_EVENT                  0x000000FFULL
>> > +#define UNCORE_EVENTSEL_UMASK                  0x0000FF00ULL
>> > +#define UNCORE_EVENTSEL_OCC_CTR_RST            (1ULL << 17)
>> > +#define UNCORE_EVENTSEL_EDGE                   (1ULL << 18)
>> > +#define UNCORE_EVENTSEL_PMI                    (1ULL << 20)
>> > +#define UNCORE_EVENTSEL_ENABLE                 (1ULL << 22)
>> > +#define UNCORE_EVENTSEL_INV                    (1ULL << 23)
>> > +#define UNCORE_EVENTSEL_CMASK                  0xFF000000ULL
>> > +
>> > +#define UNCORE_RAW_EVENT_MASK          \
>> > +       (UNCORE_EVENTSEL_EVENT |        \
>> > +        UNCORE_EVENTSEL_UMASK |        \
>> > +        UNCORE_EVENTSEL_EDGE  |        \
>> > +        UNCORE_EVENTSEL_INV   |        \
>> > +        UNCORE_EVENTSEL_CMASK)
>> > +
>> > +#define UNCORE_CNTVAL_BITS             48
>> > +
>> > +#define UNCORE_NUM_COUNTERS 8
>> > +
>> > +union cpuid01_eax {
>> > +       struct {
>> > +               unsigned int stepping:4;
>> > +               unsigned int model:4;
>> > +               unsigned int family:4;
>> > +               unsigned int type:2;
>> > +               unsigned int reserve:2;
>> > +               unsigned int ext_model:4;
>> > +               unsigned int ext_family:4;
>> > +       } split;
>> > +       unsigned int full;
>> > +};
>> > +
>> > +struct node_hw_events {
>> > +       struct perf_event *events[UNCORE_NUM_COUNTERS]; /* in counter order */
>> > +       int n_events;
>> > +       struct spinlock lock;
>> > +       int enabled;
>> > +};
>> > +
>> > +struct uncore_cpu_hw_events {
>> > +       unsigned long active_mask[BITS_TO_LONGS(UNCORE_NUM_COUNTERS)];
>> > +};
>> > +
>> > --
>> > 1.7.1
>> >
>> >
>> >
>> >
>> >
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/