Re: [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers

From: Don Zickus
Date: Wed May 07 2014 - 21:28:46 EST


On Wed, May 07, 2014 at 07:50:48PM +0000, Elliott, Robert (Server Storage) wrote:
> Don Zickus <dzickus@xxxxxxxxxx> wrote:
> > The main reason for this patch is that I have a hard time knowing
> > what NMI handlers are registered on the system when debugging NMI issues.
> >
> > This info is provided in /proc/interrupts for interrupt handlers, so I
> > added support for NMI stuff too. As a bonus it provides stat breakdowns
> > much like the interrupts.
>
> /proc/interrupts only shows online CPUs, while /proc/softirqs shows
> all possible CPUs. Is there any value in this information for all
> possible CPUs? Perhaps a /proc/hardirqs could be created alongside.

Well, if they are not online, they probably won't be generating NMIs, so I
am not sure there is much value there.
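
To make the online-versus-possible distinction concrete, here is a small user-space sketch. It is not kernel code: NCPUS, the plain bitmask, and format_counts() are made-up stand-ins for the kernel's cpumasks and for_each_online_cpu(). It only shows that the same per-CPU counters produce fewer columns when filtered through an "online" mask than through a "possible" mask.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NCPUS 4	/* hypothetical number of possible CPUs */

/*
 * Append one "%10u " column for every CPU whose bit is set in mask.
 * Returns the number of columns written. A column that does not fit
 * is dropped entirely so buf never ends in a partial number.
 */
static int format_counts(char *buf, size_t size, const unsigned int *counts,
			 unsigned int mask)
{
	size_t len = 0;
	int cols = 0;

	assert(buf && counts && size > 0);
	buf[0] = '\0';
	for (int cpu = 0; cpu < NCPUS; cpu++) {
		if (!(mask & (1u << cpu)))
			continue;	/* CPU not in the mask: no column */
		int n = snprintf(buf + len, size - len, "%10u ", counts[cpu]);
		if (n < 0 || (size_t)n >= size - len) {
			buf[len] = '\0';	/* discard the partial column */
			break;
		}
		len += (size_t)n;
		cols++;
	}
	return cols;
}
```

With a mask of 0xF (all four CPUs possible) the row has four columns; with 0x5 (only CPU0 and CPU2 online) it has two, and the counters of the offline CPUs are simply not shown.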

>
> > The only ugly issue is how to label NMI subtypes using only 3 letters
> > and still make it obvious it is part of the NMI. Adding a /proc/nmi
> > seemed overkill, so I chose to indent things by one space.
>
> The list only shows the currently registered handlers, which may
> differ from the ones that were registered when the NMIs whose counts
> are being displayed occurred. You might want to describe these new
> rows and mention that in Documentation/filesystems/proc.txt and
> the proc(5) manpage.

Ok, but that is a /proc/interrupts problem, not one specific to NMI, no?

>
> > Sample output is below:
> >
> > [root@dhcp71-248 ~]# cat /proc/interrupts
> >            CPU0       CPU1       CPU2       CPU3
> >   0:         29          0          0          0  IR-IO-APIC-edge      timer
> > <snip>
> > NMI:         20        774      10986       4227   Non-maskable interrupts
> >  LOC:         21        775      10987       4228   Local    PMI, arch_bt
> >  EXT:          0          0          0          0   External plat
> >  UNK:          0          0          0          0   Unknown
> >  SWA:          0          0          0          0   Swallowed
>
> Adding the list of NMI handlers in /proc/interrupts is a bit
> inconsistent with the other interrupts, which don't describe their
> handlers. It would be helpful to distinguish between a handler
> list being present, being present but empty, or not being present.
>
> Maybe use parentheses like this (using Ingo's suggested format):
> NMI:         20        774      10986       4227   Non-maskable interrupts
> NLC:         21        775      10987       4228   NMI: Local (PMI, arch_bt)
> NXT:          0          0          0          0   NMI: External (plat)
> NUN:          0          0          0          0   NMI: Unknown ()
> NSW:          0          0          0          0   NMI: Swallowed
> LOC:      30374      24749      20795      15095   Local timer interrupts
>

Hmm, looking at /proc/interrupts I see

  1:     858014      29054      23191       9337   IO-APIC-edge      i8042
  8:          3         24         10          2   IO-APIC-edge      rtc0
  9:     387555       9219       8308       7944   IO-APIC-fasteoi   acpi
 12:    9251360     163811     158846     141916   IO-APIC-edge      i8042
 16:          0          0          0          0   IO-APIC-fasteoi   mmc0
 17:         14          5          7         10   IO-APIC-fasteoi
 19:       6892        367         13         10   IO-APIC-fasteoi   ehci_hcd:usb2, ips, firewire_ohci
 23:    1363281        753         94         94   IO-APIC-fasteoi   ehci_hcd:usb1

Those may not be specific handlers, but they are registered irq names, no?
That basically matches what I was trying to accomplish with NMI.
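
As a rough illustration of where a trailing column like "ehci_hcd:usb2, ips, firewire_ohci" comes from, here is a user-space model of walking a chain of registered names and joining them with ", ". The struct and join_action_names() are simplified stand-ins invented for this sketch, not the kernel's irqaction or its seq_file code; as far as I understand it, the generic code walks the action chain in a similar way.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for a chained handler registration (hypothetical). */
struct action {
	const char *name;
	struct action *next;
};

/*
 * Write the names of all chained actions into buf, separated by ", ".
 * An empty chain yields an empty string, like IRQ 17 in the listing above.
 * Returns 0 on success, -1 if buf is too small.
 */
static int join_action_names(char *buf, size_t size, const struct action *a)
{
	size_t len = 0;

	assert(buf && size > 0);
	buf[0] = '\0';
	for (const char *sep = ""; a; a = a->next, sep = ", ") {
		int n = snprintf(buf + len, size - len, "%s%s", sep, a->name);
		if (n < 0 || (size_t)n >= size - len)
			return -1;	/* name list did not fit */
		len += (size_t)n;
	}
	return 0;
}
```

The NMI rows in the patch aim for the same effect: the names printed are whatever is registered at the moment the file is read.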

I guess I don't see how what I did is much different than what already
exists.
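
For reference, the three cases raised in the quoted suggestion (a handler list that is present, present but empty, or not applicable) could be told apart roughly as follows. This is only a user-space sketch of the parenthesized format; format_nmi_desc() and its NULL-versus-empty-string convention are assumptions made for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a description such as "NMI: Local (PMI, arch_bt)".
 *   handlers == NULL : the row has no handler list, so no parentheses
 *   handlers == ""   : a list exists but is empty, printed as "()"
 * Returns 0 on success, -1 if buf is too small.
 */
static int format_nmi_desc(char *buf, size_t size, const char *type,
			   const char *handlers)
{
	int n;

	assert(buf && type && size > 0);
	if (handlers)
		n = snprintf(buf, size, "NMI: %s (%s)", type, handlers);
	else
		n = snprintf(buf, size, "NMI: %s", type);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Calling it with ("Unknown", "") gives "NMI: Unknown ()", while ("Swallowed", NULL) gives "NMI: Swallowed" with no parentheses at all.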


> > diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> > index d99f31d..520359c 100644
> > --- a/arch/x86/kernel/irq.c
> > +++ b/arch/x86/kernel/irq.c
> ...
> > +void nmi_show_interrupts(struct seq_file *p, int prec)
> > +{
> > +	int j;
> > +	int indent = prec + 1;
> > +
> > +#define get_nmi_stats(j) (&per_cpu(nmi_stats, j))
> > +
> > +	seq_printf(p, "%*s: ", indent, "LOC");
> > +	for_each_online_cpu(j)
> > +		seq_printf(p, "%10u ", get_nmi_stats(j)->normal);
> > +	seq_printf(p, " %-8s", "Local");
> > +
> > +	print_nmi_action_name(p, NMI_LOCAL);
> > +
> > +	seq_printf(p, "%*s: ", indent, "EXT");
> > +	for_each_online_cpu(j)
> > +		seq_printf(p, "%10u ", get_nmi_stats(j)->external);
> > +	seq_printf(p, " %-8s", "External");
> > +
> > +	print_nmi_action_name(p, NMI_EXT);
> > +
> > +	seq_printf(p, "%*s: ", indent, "UNK");
> > +	for_each_online_cpu(j)
> > +		seq_printf(p, "%10u ", get_nmi_stats(j)->unknown);
> > +	seq_printf(p, " %-8s", "Unknown");
> > +
> > +	print_nmi_action_name(p, NMI_UNKNOWN);
> > +
>
> The NMI handler types are in arch/x86/include/asm/nmi.h:
> enum {
> 	NMI_LOCAL=0,
> 	NMI_UNKNOWN,
> 	NMI_SERR,
> 	NMI_IO_CHECK,
> 	NMI_MAX
> };
>
> The new code only prints the registered handlers for NMI_LOCAL,
> NMI_UNKNOWN, and the new NMI_EXT. Consider adding counters
> for NMI_SERR and NMI_IO_CHECK and printing their handlers too.
>
> drivers/watchdog/hpwdt.c is the only code currently in
> the kernel registering handlers for them.

Yeah, I guess I was trying to remove NMI_SERR and NMI_IO_CHECK. I forget
whether I accomplished that with this patch set or not. The idea was to
have hpwdt do the ioport read directly rather than having default_do_nmi
do it. I can look at it again.
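
To illustrate what "doing the ioport read directly" amounts to, here is a user-space sketch that only classifies a reason byte once it has been read; on x86 that byte conventionally comes from I/O port 0x61. The bit values are as I recall them from arch/x86/include/asm/mach_traps.h and should be checked against the header. The enum and classify_nmi_reason() are hypothetical names for this sketch and are not taken from hpwdt or the patch set.

```c
#include <assert.h>

/*
 * Reason-byte bits, assumed from arch/x86/include/asm/mach_traps.h.
 * Verify against the header before relying on them.
 */
#define NMI_REASON_SERR		0x80u
#define NMI_REASON_IOCHK	0x40u
#define NMI_REASON_MASK		(NMI_REASON_SERR | NMI_REASON_IOCHK)

/* Hypothetical classification result for this sketch. */
enum ext_nmi_kind { EXT_NMI_NONE, EXT_NMI_SERR, EXT_NMI_IOCHK };

/* Classify a reason byte that the caller has already read from the port. */
static enum ext_nmi_kind classify_nmi_reason(unsigned int reason)
{
	assert(reason <= 0xffu);	/* a port read yields a single byte */

	if (!(reason & NMI_REASON_MASK))
		return EXT_NMI_NONE;	/* not an external SERR/IOCHK NMI */
	if (reason & NMI_REASON_SERR)
		return EXT_NMI_SERR;	/* SERR is tested first (assumed ordering) */
	return EXT_NMI_IOCHK;
}
```

A driver that reads the port itself would make this decision in its own handler instead of relying on separate NMI_SERR and NMI_IO_CHECK handler lists.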

Cheers,
Don