RE: [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers

From: Elliott, Robert (Server Storage)
Date: Wed May 07 2014 - 15:52:15 EST


Don Zickus <dzickus@xxxxxxxxxx> wrote:
> The main reason for this patch is because I have a hard time knowing
> what NMI handlers are registered on the system when debugging NMI issues.
>
> This info is provided in /proc/interrupts for interrupt handlers, so I
> added support for NMI stuff too. As a bonus it provides stat breakdowns
> much like the interrupts.

/proc/interrupts only shows online CPUs, while /proc/softirqs shows
all possible CPUs. Is there any value in this information for all
possible CPUs? Perhaps a /proc/hardirqs could be created alongside.

> The only ugly issue is how to label NMI subtypes using only 3 letters
> and still make it obvious it is part of the NMI. Adding a /proc/nmi
> seemed overkill, so I choose to indent things by one space.

The list only shows the currently registered handlers, which may
differ from the handlers that were registered when the counted NMIs
occurred. You might want to describe these new rows, and note that
caveat, in Documentation/filesystems/proc.txt and the proc(5)
manpage.

> Sample output is below:
>
> [root@dhcp71-248 ~]# cat /proc/interrupts
> CPU0 CPU1 CPU2 CPU3
> 0: 29 0 0 0 IR-IO-APIC-edge timer
> <snip>
> NMI: 20 774 10986 4227 Non-maskable interrupts
> LOC: 21 775 10987 4228 Local PMI, arch_bt
> EXT: 0 0 0 0 External plat
> UNK: 0 0 0 0 Unknown
> SWA: 0 0 0 0 Swallowed

Adding the list of NMI handlers in /proc/interrupts is a bit
inconsistent with the other interrupts, which don't describe their
handlers. It would be helpful to distinguish between a handler
list being present, being present but empty, or not being present.

Maybe use parentheses like this (using Ingo's suggested format):
NMI: 20 774 10986 4227 Non-maskable interrupts
NLC: 21 775 10987 4228 NMI: Local (PMI, arch_bt)
NXT: 0 0 0 0 NMI: External (plat)
NUN: 0 0 0 0 NMI: Unknown ()
NSW: 0 0 0 0 NMI: Swallowed
LOC: 30374 24749 20795 15095 Local timer interrupts

> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> index d99f31d..520359c 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
...
> +void nmi_show_interrupts(struct seq_file *p, int prec)
> +{
> + int j;
> + int indent = prec + 1;
> +
> +#define get_nmi_stats(j) (&per_cpu(nmi_stats, j))
> +
> + seq_printf(p, "%*s: ", indent, "LOC");
> + for_each_online_cpu(j)
> + seq_printf(p, "%10u ", get_nmi_stats(j)->normal);
> + seq_printf(p, " %-8s", "Local");
> +
> + print_nmi_action_name(p, NMI_LOCAL);
> +
> + seq_printf(p, "%*s: ", indent, "EXT");
> + for_each_online_cpu(j)
> + seq_printf(p, "%10u ", get_nmi_stats(j)->external);
> + seq_printf(p, " %-8s", "External");
> +
> + print_nmi_action_name(p, NMI_EXT);
> +
> + seq_printf(p, "%*s: ", indent, "UNK");
> + for_each_online_cpu(j)
> + seq_printf(p, "%10u ", get_nmi_stats(j)->unknown);
> + seq_printf(p, " %-8s", "Unknown");
> +
> + print_nmi_action_name(p, NMI_UNKNOWN);
> +

The NMI handler types are in arch/x86/include/asm/nmi.h:
enum {
        NMI_LOCAL=0,
        NMI_UNKNOWN,
        NMI_SERR,
        NMI_IO_CHECK,
        NMI_MAX
};

The new code only prints the registered handlers for NMI_LOCAL,
NMI_UNKNOWN, and the new NMI_EXT. Consider adding counters
for NMI_SERR and NMI_IO_CHECK and printing their handlers too.

drivers/watchdog/hpwdt.c is the only code currently in
the kernel registering handlers for them.
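
One way to avoid missing a type is to drive the output from a table
indexed by the enum, so every entry below NMI_MAX gets a row. The
following is only a userspace sketch under several assumptions: the
"NSE" and "NIO" labels, the nmi_rows table, and format_nmi_row() are
all invented for illustration, the counts are passed in as a plain
array rather than read from per-cpu data, and NMI_EXT is left out
because its position in the enum is not shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the enum quoted above from asm/nmi.h. */
enum {
	NMI_LOCAL = 0,
	NMI_UNKNOWN,
	NMI_SERR,
	NMI_IO_CHECK,
	NMI_MAX
};

struct nmi_row {
	const char *label;	/* 3-letter tag in the first column */
	const char *desc;	/* human-readable description */
};

/* One entry per handler type; the SERR and IO check labels are hypothetical. */
static const struct nmi_row nmi_rows[NMI_MAX] = {
	[NMI_LOCAL]    = { "NLC", "Local" },
	[NMI_UNKNOWN]  = { "NUN", "Unknown" },
	[NMI_SERR]     = { "NSE", "SERR" },
	[NMI_IO_CHECK] = { "NIO", "IO check" },
};

/*
 * Format one row: label, one count per CPU, then the description.
 * Returns 0 on success, -1 on an invalid type or a too-small buffer.
 */
static int format_nmi_row(char *buf, size_t len, int type,
			  const unsigned int *counts, size_t ncpus)
{
	size_t used;
	size_t cpu;
	int n;

	assert(buf != NULL && len > 0);
	if (type < 0 || type >= NMI_MAX)
		return -1;

	n = snprintf(buf, len, "%4s: ", nmi_rows[type].label);
	if (n < 0 || (size_t)n >= len)
		return -1;
	used = (size_t)n;

	for (cpu = 0; cpu < ncpus; cpu++) {
		n = snprintf(buf + used, len - used, "%10u ", counts[cpu]);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}

	n = snprintf(buf + used, len - used, " NMI: %s", nmi_rows[type].desc);
	if (n < 0 || (size_t)n >= len - used)
		return -1;
	return 0;
}
```

A caller would loop type from 0 to NMI_MAX - 1 and emit one row each,
so adding a new enum value only needs a new table entry and counter
rather than another copy of the print block.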

---
Rob Elliott HP Server Storage


