Re: [PATCH 2/4] nmi_backtrace: generate one-line reports for idle cpus

From: Daniel Thompson
Date: Mon Mar 07 2016 - 03:26:59 EST


On 01/03/16 23:01, Chris Metcalf wrote:
(+PeterZ, Rafael, and Daniel Lezcano for cpuidle and scheduler)

On 03/01/2016 09:23 AM, Daniel Thompson wrote:
On 29/02/16 21:40, Chris Metcalf wrote:
When doing an nmi backtrace of many cores, and most of them are idle,
the output is a little overwhelming and very uninformative. Suppress
messages for cpus that are idling when they are interrupted and
just emit one line, "NMI backtrace for cpu N skipped: idle".

I can see this makes the logs more attractive, but this is code for
emergency situations.

The idle task is responsible for certain power management activities.
How can you be sure the system isn't wedged because of bugs in that
code?

It's a fair point, but as core count increases, you really run the risk
of losing the valuable data in a sea of data that isn't. For example,
for the architecture I maintain, we have the TILE-Gx72, a 72-core chip.
If each core's register dump and backtrace runs to 40 lines, we're up to
around 3,000 lines of console output. Filtering that down by a factor
of 10x or more (if only a handful of cores happen to be active, which is
not uncommon) is a substantial usability improvement.
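
To illustrate (cpu numbers invented; the message text matches what the
patch below prints), the output for the mostly-idle cpus collapses from
a full dump each to something like:

NMI backtrace for cpu 3
[... ~40 lines of registers and call trace for the one busy cpu ...]
NMI backtrace for cpu 4 skipped: idle
NMI backtrace for cpu 5 skipped: idle
NMI backtrace for cpu 6 skipped: idle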

No objections to your use case. The output feels very verbose even with "only" eight cores.


That said, it's true that the original solution I offered (examining
just is_idle_task() plus interrupt nesting) is imprecise. It is
relatively straightforward to add a bit of per-cpu state that is set at
the same moment we currently do stop/start_critical_timings(), which
would indicate much more specifically that the cpu was running the
idling code itself, and not anything more complex. In that case, if the
flag was set when the NMI arrived, you would know the cpu was either
sitting on a processor-specific idle instruction in arch_cpu_idle(), or
else polling one or two memory locations in a tight loop in
cpu_idle_poll(), which presumably would offer sufficient precision to
feel safe.

Here's an alternative version of the patch which incorporates this
idea. Do you think this is preferable? Thanks!

I prefer the approach taken by the new patch, although I think the implementation might be buggy...


commit 5b6dca9bad908ae66fa764025c4e6046a6cc0262
Author: Chris Metcalf <cmetcalf@xxxxxxxxxx>
Date: Mon Feb 29 11:56:32 2016 -0500

    nmi_backtrace: generate one-line reports for idle cpus

    When doing an nmi backtrace of many cores, and most of them are idle,
    the output is a little overwhelming and very uninformative. Suppress
    messages for cpus that are idling when they are interrupted and
    just emit one line, "NMI backtrace for cpu N skipped: idle".

    Signed-off-by: Chris Metcalf <cmetcalf@xxxxxxxxxx>

diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 786ad32631a6..b8c3c4cf88ad 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -206,6 +206,7 @@ static inline int cpuidle_enter_freeze(struct cpuidle_driver *drv,
/* kernel/sched/idle.c */
extern void sched_idle_set_state(struct cpuidle_state *idle_state);
extern void default_idle_call(void);
+extern bool in_cpu_idle(void);

#ifdef CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED
void cpuidle_coupled_parallel_barrier(struct cpuidle_device *dev, atomic_t *a);
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 544a7133cbd1..9aff315f278b 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -52,15 +52,25 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
__setup("hlt", cpu_idle_nopoll_setup);
#endif

+static DEFINE_PER_CPU(bool, cpu_idling);
+
+/* Was the cpu in the low-level idle code when interrupted? */
+bool in_cpu_idle(void)
+{
+	return this_cpu_read(cpu_idling);

I think we continue to need the code to identify a core that is running an interrupt handler. Interrupts are not masked at the point we set cpu_idling to false, meaning we can easily be interrupted before we clear the flag; a backtrace NMI that lands inside such an interrupt handler would then wrongly report the cpu as idle.
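
A minimal sketch of the kind of combined check I mean (hypothetical helper name; this assumes the backtrace handler runs in true NMI context entered via nmi_enter(), which adds HARDIRQ_OFFSET as well as NMI_OFFSET to the preempt count, so architectures that deliver the backtrace IPI as a normal interrupt would need something else):

#include <linux/preempt.h>

/*
 * Treat the cpu as idle only if the flag is set AND the NMI did not
 * land inside a hardirq handler.  Inside the NMI handler the hardirq
 * count is already one level deep courtesy of nmi_enter(), so any
 * nesting beyond that means a regular interrupt handler was running
 * when the NMI arrived.
 */
static bool cpu_really_in_idle(void)
{
	return in_cpu_idle() &&
		(preempt_count() & HARDIRQ_MASK) <= HARDIRQ_OFFSET;
}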


+}
+
static inline int cpu_idle_poll(void)
{
	rcu_idle_enter();
	trace_cpu_idle_rcuidle(0, smp_processor_id());
	local_irq_enable();
	stop_critical_timings();
+	this_cpu_write(cpu_idling, true);
	while (!tif_need_resched() &&
		(cpu_idle_force_poll || tick_check_broadcast_expired()))
		cpu_relax();
+	this_cpu_write(cpu_idling, false);
	start_critical_timings();
	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
	rcu_idle_exit();
@@ -89,7 +99,9 @@ void default_idle_call(void)
		local_irq_enable();
	} else {
		stop_critical_timings();
+		this_cpu_write(cpu_idling, true);
		arch_cpu_idle();
+		this_cpu_write(cpu_idling, false);
		start_critical_timings();
	}
}
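
The same window exists in both paths above: on x86, for instance, arch_cpu_idle() boils down to sti; hlt and the cpu wakes with interrupts enabled, so a pending handler can run while cpu_idling is still true. Roughly (illustrative comments added to the patched code):

	stop_critical_timings();
	this_cpu_write(cpu_idling, true);
	arch_cpu_idle();	/* e.g. sti; hlt on x86; wakes with irqs on */
	/* an interrupt handler can run anywhere in this window */
	this_cpu_write(cpu_idling, false);
	start_critical_timings();
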
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index db63ac75eba0..75b5eacaa5d3 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -17,6 +17,7 @@
#include <linux/kprobes.h>
#include <linux/nmi.h>
#include <linux/seq_buf.h>
+#include <linux/cpuidle.h>

#ifdef arch_trigger_cpumask_backtrace
/* For reliability, we're prepared to waste bits here. */
@@ -151,11 +152,16 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)

		/* Replace printk to write into the NMI seq */
		this_cpu_write(printk_func, nmi_vprintk);
-		pr_warn("NMI backtrace for cpu %d\n", cpu);
-		if (regs)
-			show_regs(regs);
-		else
-			dump_stack();
+		if (in_cpu_idle()) {
+			pr_warn("NMI backtrace for cpu %d skipped: idle\n",
+				cpu);
+		} else {
+			pr_warn("NMI backtrace for cpu %d\n", cpu);
+			if (regs)
+				show_regs(regs);
+			else
+				dump_stack();
+		}
		this_cpu_write(printk_func, printk_func_save);

		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));