[tip:perf/core] perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86

From: tip-bot for Andi Kleen
Date: Tue Jun 14 2016 - 07:28:37 EST


Commit-ID: 2c95afc1e83d93fac3be6923465e1753c2c53b0a
Gitweb: http://git.kernel.org/tip/2c95afc1e83d93fac3be6923465e1753c2c53b0a
Author: Andi Kleen <ak@xxxxxxxxxxxxxxx>
AuthorDate: Thu, 9 Jun 2016 06:14:38 -0700
Committer: Ingo Molnar <mingo@xxxxxxxxxx>
CommitDate: Tue, 14 Jun 2016 11:16:59 +0200

perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86

The NMI watchdog uses either the fixed cycles counter or a generic
counter programmed to count cycles. This causes a lot of conflicts with
PMU users who want to run a full event group that includes the fixed
cycles counter, for example the --topdown support recently added to
perf stat. Such users then have to fall back to not using groups, which
can cause measurement inaccuracy due to multiplexing errors.

This patch switches the NMI watchdog to use reference cycles
on Intel systems. This is actually more accurate than cycles,
because cycles can tick faster than the measured CPU frequency
due to Turbo mode, so a period computed from that frequency can
elapse sooner than intended.

Ref cycles always tick at their nominal (reference) frequency, or
slower when the system is idling. That means the NMI watchdog can
never expire too early, unlike with cycles.

The reference cycles tick roughly at the frequency of the TSC,
so the same period computation can be used.
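
For illustration, here is a minimal user-space sketch of that period
arithmetic (illustration only, not part of the patch; the 2.4 GHz TSC
and the default 10 second threshold are assumed example numbers, and in
the patch the computation lives in hw_nmi_get_sample_period()):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t cpu_khz = 2400000;	/* assumed: 2.4 GHz TSC */
	int watchdog_thresh = 10;	/* default threshold in seconds */

	/* Same formula as hw_nmi_get_sample_period() in the patch below. */
	uint64_t period = cpu_khz * 1000 * watchdog_thresh;

	/* 24,000,000,000 ref cycles ~= 10 s at 2.4 GHz, longer when idle. */
	printf("sample_period = %llu\n", (unsigned long long)period);
	return 0;
}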

Signed-off-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Stephane Eranian <eranian@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Vince Weaver <vincent.weaver@xxxxxxxxx>
Link: http://lkml.kernel.org/r/1465478079-19993-1-git-send-email-andi@xxxxxxxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
 arch/x86/kernel/apic/hw_nmi.c | 8 ++++++++
 include/linux/nmi.h           | 1 +
 kernel/watchdog.c             | 7 +++++++
 3 files changed, 16 insertions(+)

diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 7788ce6..016f426 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -18,8 +18,16 @@
#include <linux/nmi.h>
#include <linux/module.h>
#include <linux/delay.h>
+#include <linux/perf_event.h>

#ifdef CONFIG_HARDLOCKUP_DETECTOR
+int hw_nmi_get_event(void)
+{
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+		return PERF_COUNT_HW_REF_CPU_CYCLES;
+	return PERF_COUNT_HW_CPU_CYCLES;
+}
+
u64 hw_nmi_get_sample_period(int watchdog_thresh)
{
	return (u64)(cpu_khz) * 1000 * watchdog_thresh;
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 4630eea..79858af 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -66,6 +66,7 @@ static inline bool trigger_allbutself_cpu_backtrace(void)

#ifdef CONFIG_LOCKUP_DETECTOR
u64 hw_nmi_get_sample_period(int watchdog_thresh);
+int hw_nmi_get_event(void);
extern int nmi_watchdog_enabled;
extern int soft_watchdog_enabled;
extern int watchdog_user_enabled;
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 9acb29f..8dd30fcd 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -315,6 +315,12 @@ static int is_softlockup(unsigned long touch_ts)

#ifdef CONFIG_HARDLOCKUP_DETECTOR

+/* Can be overridden by architecture */
+__weak int hw_nmi_get_event(void)
+{
+	return PERF_COUNT_HW_CPU_CYCLES;
+}
+
static struct perf_event_attr wd_hw_attr = {
	.type = PERF_TYPE_HARDWARE,
	.config = PERF_COUNT_HW_CPU_CYCLES,
@@ -604,6 +610,7 @@ static int watchdog_nmi_enable(unsigned int cpu)

	wd_attr = &wd_hw_attr;
	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
+	wd_attr->config = hw_nmi_get_event();

	/* Try to register using hardware perf events */
	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);
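
For reference, a rough user-space analogue of the counter the watchdog
now requests on Intel (illustration only; the kernel path uses
perf_event_create_kernel_counter() with watchdog_overflow_callback as
the overflow handler rather than perf_event_open(), and the 2.4 GHz
figure is an assumed example):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

int main(void)
{
	struct perf_event_attr attr;
	long fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_REF_CPU_CYCLES;	/* hw_nmi_get_event() on Intel */
	attr.sample_period = 2400000ULL * 1000 * 10;	/* assumed 2.4 GHz TSC, 10 s */
	attr.disabled = 1;

	/* pid = 0, cpu = -1: count this task on any CPU. */
	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}
	printf("opened PERF_COUNT_HW_REF_CPU_CYCLES counter, fd = %ld\n", fd);
	close(fd);
	return 0;
}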