Re: [PATCH 2/4] nmi_backtrace: generate one-line reports for idle cpus
From: Chris Metcalf
Date: Tue Mar 01 2016 - 11:02:03 EST
(+PeterZ, Rafael, and Daniel Lezcano for cpuidle and scheduler)
On 03/01/2016 09:23 AM, Daniel Thompson wrote:
On 29/02/16 21:40, Chris Metcalf wrote:
When doing an nmi backtrace of many cores, and most of them are idle,
the output is a little overwhelming and very uninformative. Suppress
messages for cpus that are idling when they are interrupted and
just emit one line, "NMI backtrace for N skipped: idle".
I can see this makes the logs more attractive but this is code for
emergency situations.
The idle task is responsible for certain power management activities.
How can you be sure the system is wedged because of bugs in that code?
It's a fair point, but as core count increases, you really run the risk
of losing the valuable data in a sea of data that isn't. For example,
for the architecture I maintain, we have the TILE-Gx72, which is a
72-core chip. If each core's register dump and backtrace is 40 lines,
we're up to around 3,000 lines of console output. Filtering that down by
a factor of 10x or more (if only a handful of cores happen to be active,
which is not uncommon) is a substantial usability improvement.
That said, it's true that the original solution I offered (examining
just is_idle_task() plus interrupt nesting) is imprecise. It is
relatively straightforward to add a bit of per-cpu state that is set at
the same moment we currently do stop/start_critical_timings(), which
would indicate much more specifically that the cpu was running the
idling code itself, and not anything more complex. In that case if the
flag was set, you would know you were either sitting on a
processor-specific idle instruction in arch_cpu_idle(), or else polling
one or two memory locations in a tight loop in cpu_idle_poll(), which
presumably would offer sufficient precision to feel safe.
Here's an alternative version of the patch which incorporates this
idea. Do you think this is preferable? Thanks!
commit 5b6dca9bad908ae66fa764025c4e6046a6cc0262
Author: Chris Metcalf <cmetcalf@xxxxxxxxxx>
Date: Mon Feb 29 11:56:32 2016 -0500
nmi_backtrace: generate one-line reports for idle cpus
When doing an nmi backtrace of many cores, and most of them are idle,
the output is a little overwhelming and very uninformative. Suppress
messages for cpus that are idling when they are interrupted and
just emit one line, "NMI backtrace for N skipped: idle".
Signed-off-by: Chris Metcalf <cmetcalf@xxxxxxxxxx>
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 786ad32631a6..b8c3c4cf88ad 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -206,6 +206,7 @@ static inline int cpuidle_enter_freeze(struct cpuidle_driver *drv,
/* kernel/sched/idle.c */
extern void sched_idle_set_state(struct cpuidle_state *idle_state);
extern void default_idle_call(void);
+extern bool in_cpu_idle(void);
#ifdef CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED
void cpuidle_coupled_parallel_barrier(struct cpuidle_device *dev, atomic_t *a);
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 544a7133cbd1..9aff315f278b 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -52,15 +52,25 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
__setup("hlt", cpu_idle_nopoll_setup);
#endif
+static DEFINE_PER_CPU(bool, cpu_idling);
+
+/* Was the cpu was in the low-level idle code when interrupted? */
+bool in_cpu_idle(void)
+{
+ return this_cpu_read(cpu_idling);
+}
+
static inline int cpu_idle_poll(void)
{
rcu_idle_enter();
trace_cpu_idle_rcuidle(0, smp_processor_id());
local_irq_enable();
stop_critical_timings();
+ this_cpu_write(cpu_idling, true);
while (!tif_need_resched() &&
(cpu_idle_force_poll || tick_check_broadcast_expired()))
cpu_relax();
+ this_cpu_write(cpu_idling, false);
start_critical_timings();
trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
rcu_idle_exit();
@@ -89,7 +99,9 @@ void default_idle_call(void)
local_irq_enable();
} else {
stop_critical_timings();
+ this_cpu_write(cpu_idling, true);
arch_cpu_idle();
+ this_cpu_write(cpu_idling, false);
start_critical_timings();
}
}
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index db63ac75eba0..75b5eacaa5d3 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -17,6 +17,7 @@
#include <linux/kprobes.h>
#include <linux/nmi.h>
#include <linux/seq_buf.h>
+#include <linux/cpuidle.h>
#ifdef arch_trigger_cpumask_backtrace
/* For reliability, we're prepared to waste bits here. */
@@ -151,11 +152,16 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
/* Replace printk to write into the NMI seq */
this_cpu_write(printk_func, nmi_vprintk);
- pr_warn("NMI backtrace for cpu %d\n", cpu);
- if (regs)
- show_regs(regs);
- else
- dump_stack();
+ if (in_cpu_idle()) {
+ pr_warn("NMI backtrace for cpu %d skipped: idle\n",
+ cpu);
+ } else {
+ pr_warn("NMI backtrace for cpu %d\n", cpu);
+ if (regs)
+ show_regs(regs);
+ else
+ dump_stack();
+ }
this_cpu_write(printk_func, printk_func_save);
cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com