Re: [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers
From: Ingo Molnar
Date: Thu May 08 2014 - 02:04:26 EST
* Don Zickus <dzickus@xxxxxxxxxx> wrote:
> On Wed, May 07, 2014 at 07:50:48PM +0000, Elliott, Robert (Server Storage) wrote:
> > Don Zickus <dzickus@xxxxxxxxxx> wrote:
> > > The main reason for this patch is because I have a hard time knowing
> > > what NMI handlers are registered on the system when debugging NMI issues.
> > >
> > > This info is provided in /proc/interrupts for interrupt handlers, so I
> > > added support for NMI stuff too. As a bonus it provides stat breakdowns
> > > much like the interrupts.
> >
> > /proc/interrupts only shows online CPUs, while /proc/softirqs shows
> > all possible CPUs. Is there any value in this information for all
> > possible CPUs? Perhaps a /proc/hardirqs could be created alongside.
>
> Well if they are not online, they probably won't be generating NMIs, so I
> am not sure there is much value there.
>
> >
> > > The only ugly issue is how to label NMI subtypes using only 3 letters
> > > and still make it obvious it is part of the NMI. Adding a /proc/nmi
> > > seemed overkill, so I chose to indent things by one space.
> >
> > The list only shows the currently registered handlers, which may
> > differ from the ones that were registered when the NMIs whose counts
> > are being displayed occurred. You might want to describe these new
> > rows and mention that in Documentation/filesystems/proc.txt and
> > the proc(5) manpage.
>
> Ok, but that is a /proc/interrupts problem not one specific to NMI, no?
>
> >
> > > Sample output is below:
> > >
> > > [root@dhcp71-248 ~]# cat /proc/interrupts
> > > CPU0 CPU1 CPU2 CPU3
> > > 0: 29 0 0 0 IR-IO-APIC-edge timer
> > > <snip>
> > > NMI: 20 774 10986 4227 Non-maskable interrupts
> > > LOC: 21 775 10987 4228 Local PMI, arch_bt
> > > EXT: 0 0 0 0 External plat
> > > UNK: 0 0 0 0 Unknown
> > > SWA: 0 0 0 0 Swallowed
> >
> > Adding the list of NMI handlers in /proc/interrupts is a bit
> > inconsistent with the other interrupts, which don't describe their
> > handlers. It would be helpful to distinguish between a handler
> > list being present, being present but empty, or not being present.
> >
> > Maybe use parentheses like this (using Ingo's suggested format):
> > NMI: 20 774 10986 4227 Non-maskable interrupts
> > NLC: 21 775 10987 4228 NMI: Local (PMI, arch_bt)
> > NXT: 0 0 0 0 NMI: External (plat)
> > NUN: 0 0 0 0 NMI: Unknown ()
> > NSW: 0 0 0 0 NMI: Swallowed
> > LOC: 30374 24749 20795 15095 Local timer interrupts
> >
>
> Hmm, looking at /proc/interrupts I see
>
> 1: 858014 29054 23191 9337 IO-APIC-edge i8042
> 8: 3 24 10 2 IO-APIC-edge rtc0
> 9: 387555 9219 8308 7944 IO-APIC-fasteoi acpi
> 12: 9251360 163811 158846 141916 IO-APIC-edge i8042
> 16: 0 0 0 0 IO-APIC-fasteoi mmc0
> 17: 14 5 7 10 IO-APIC-fasteoi
> 19: 6892 367 13 10 IO-APIC-fasteoi ehci_hcd:usb2, ips, firewire_ohci
> 23: 1363281 753 94 94 IO-APIC-fasteoi ehci_hcd:usb1
>
> Those may not be specific handlers, but they are registered irq
> names, no? That basically matches what I was trying to accomplish
> with NMI.
>
> I guess I don't see how what I did is much different than what
> already exists.
The parentheses make the output more readable, especially as with the
NMI format it's not quite clear what is 'irq type' and what is
'handler'.

It might make sense to add parentheses for regular irq handlers as well,
for consistency and readability.
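
As a rough illustration of the parenthesized format quoted above, here
is a minimal userspace sketch of how the description column could be
built. It is not taken from the patch: format_nmi_desc() and its
parameters are hypothetical names. It assumes a NULL handler array
means "no handler list for this row" and a zero count means "list
present but empty", which yields "()".

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch under discussion.
 * Writes "NMI: <type>" followed by an optional parenthesized,
 * comma-separated handler list into buf.
 *
 *   names == NULL -> no handler list at all: "NMI: Swallowed"
 *   n == 0        -> list present but empty:  "NMI: Unknown ()"
 *
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int format_nmi_desc(char *buf, size_t len, const char *type,
                           const char *const *names, size_t n)
{
    size_t off, i;
    int ret;

    ret = snprintf(buf, len, "NMI: %s", type);
    if (ret < 0 || (size_t)ret >= len)
        return -1;
    off = (size_t)ret;

    if (!names)
        return (int)off;

    ret = snprintf(buf + off, len - off, " (");
    if (ret < 0 || (size_t)ret >= len - off)
        return -1;
    off += (size_t)ret;

    for (i = 0; i < n; i++) {
        /* Separate every handler after the first with ", ". */
        ret = snprintf(buf + off, len - off, "%s%s",
                       i ? ", " : "", names[i]);
        if (ret < 0 || (size_t)ret >= len - off)
            return -1;
        off += (size_t)ret;
    }

    ret = snprintf(buf + off, len - off, ")");
    if (ret < 0 || (size_t)ret >= len - off)
        return -1;
    off += (size_t)ret;

    return (int)off;
}
```

Called with type "Local" and the handlers "PMI" and "arch_bt", this
produces "NMI: Local (PMI, arch_bt)", so a reader can tell the type
apart from the handler names at a glance.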
Thanks,
Ingo