[PATCH] x86/kernel: Add generic handler for NMI events

From: Adrien Mahieux
Date: Sat May 27 2017 - 11:03:48 EST


The kernel allows drivers to register NMI handlers, or to use sysctls like
unknown_nmi_panic to control the behavior of the system when an NMI is
received by the kernel.

This patch adds a generic handler that lets sysadmins specify the behaviour
to adopt for each NMI event code. The list of events is provided at module
load or on the kernel cmdline, so a kdump can also be generated on a boot
error. For each event, we can choose to:
- ignore: silently let it pass through to the other handlers, without logging.
- drop: log it and stop its propagation to the other handlers.
- panic: make the kernel panic (and generate a kdump if one is configured).
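
The three behaviours are checked in a fixed order (ignore, then drop, then
panic), and anything unmatched is passed on to the other handlers. Below is a
minimal userspace sketch of that ordering. The names nmi_action, in_list()
and classify() are made up for this illustration and are not part of the
patch, which returns NMI_DONE / NMI_HANDLED or panics instead. The sketch
assumes each list keeps its entry count in slot 0, which is the layout
get_options() produces.

```c
/* Hypothetical result codes, for illustration only. */
enum nmi_action { ACT_IGNORE, ACT_DROP, ACT_PANIC, ACT_PASS };

/* list[0] is the number of entries, list[1..n] are the event codes. */
static int in_list(const int *list, int event)
{
	int i;

	for (i = 1; i <= list[0]; i++)
		if (list[i] == event)
			return 1;
	return 0;
}

/* Same precedence as the handler: ignore first, then drop, then panic. */
static enum nmi_action classify(int event, const int *ignore,
				const int *drop, const int *panic_list)
{
	if (in_list(ignore, event))
		return ACT_IGNORE;	/* pass on silently */
	if (in_list(drop, event))
		return ACT_DROP;	/* log and stop propagation */
	if (in_list(panic_list, event))
		return ACT_PANIC;	/* would trigger the panic */
	return ACT_PASS;		/* unmanaged: let it pass */
}
```

With ignore = {1, 44} and drop = {2, 10, 44}, event 44 is classified as
ignored rather than dropped, because the ignore list is consulted first.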

A common use-case on servers is to make the kernel panic in order to get a
crash dump through the kdump/kexec facility. The drop parameter prevents a
running kdump from being interrupted by another NMI, which would leave the
vmcore unusable.

Manufacturers don't provide kernel support for this, and even HP's hpwdt
driver panics on any NMI.
This is an issue when other tools or hardware generate NMIs for non-critical
events (failing power supply, FPGA, watchdog...).


Against the warnings of checkpatch.pl, I've kept the kernel-version macros,
as they make the code easier to integrate into the current (and old) kernels
of production servers.
If you think backporting features is the distribution maintainer's job,
please let me know and I'll replace them with a link to a repo carrying the
backport.

Signed-off-by: Adrien Mahieux <adrien.mahieux@xxxxxxxxx>
---
Documentation/nmimgr.txt | 134 +++++++++++++++++
arch/Kconfig | 18 +++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/nmimgr.c | 374 +++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 527 insertions(+)
create mode 100644 Documentation/nmimgr.txt
create mode 100644 arch/x86/kernel/nmimgr.c
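
For reference, the LIST syntax accepted by the events_* parameters (for
example 0,1,2-8,10) can be illustrated with a small userspace parser. This is
a sketch only, not the kernel's get_options(): parse_event_list() is a
hypothetical helper written for this note. The one property it deliberately
mirrors is that slot 0 of the output array receives the number of parsed
values, with the values themselves stored from slot 1 onwards.

```c
#include <stdlib.h>

/*
 * Parse a list such as "0,1,2-8,10" into out[1..n] and store n in out[0].
 * nmax is the total size of out[], including the count slot.
 * Returns 0 on success, -1 on malformed input or if out[] is too small.
 */
static int parse_event_list(const char *str, int *out, int nmax)
{
	int n = 0;

	out[0] = 0;
	while (*str) {
		char *end;
		long lo = strtol(str, &end, 10);
		long hi = lo;
		long v;

		if (end == str)
			return -1;	/* no digits where a number was expected */
		if (*end == '-') {	/* range: parse the upper bound */
			str = end + 1;
			hi = strtol(str, &end, 10);
			if (end == str || hi < lo)
				return -1;
		}
		for (v = lo; v <= hi; v++) {
			if (n + 1 >= nmax)
				return -1;	/* count slot plus values must fit */
			out[++n] = (int)v;
		}
		if (*end == ',')
			end++;		/* skip the separator */
		else if (*end != '\0')
			return -1;	/* unexpected character */
		str = end;
	}
	out[0] = n;
	return 0;
}
```

Keeping the count in slot 0 is what allows a lookup loop to stop at the last
configured event instead of scanning the unused, zero-filled tail of the
array.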

diff --git a/Documentation/nmimgr.txt b/Documentation/nmimgr.txt
new file mode 100644
index 000000000000..dbda626387c3
--- /dev/null
+++ b/Documentation/nmimgr.txt
@@ -0,0 +1,134 @@
+NMI Generic Handler
+===================
+
+This module allows you to Panic or Ignore specific NMI events.
+
+
+Author: Adrien Mahieux <adrien.mahieux@xxxxxxxxx>
+Tools: https://github.com/saruspete/kdumptools
+
+
+Description
+-----------
+
+Manage NMI events in a more fine-grained manner than unknown_nmi_panic sysctl.
+
+
+When a host is unresponsive, we'd like to take a vmcore for offline analysis.
+If kdump is correctly set up, we need to crash/panic the system for it to start.
+
+This is usually done by sending an NMI to the system (as no userland process
+is responding anymore) through the BMC.
+
+If no handler registers the vendor-specific NMI event to trigger a crash,
+the kernel logs a "Dazed and confused, but trying to continue" message and
+the server remains unresponsive.
+
+
+Why not just use the "nmi_panic" sysctls?
+-----------------------------------------
+
+There are 3 sysctls that allow administrators to generate a panic:
+- panic_on_io_nmi
+- panic_on_unrecovered_nmi
+- unknown_nmi_panic
+
+These sysctls are too coarse, as NMIs can also be generated for non-critical
+events such as:
+- Software debugging, like perf on Pentium processors
+- External cards, like FPGAs, using NMIs to communicate
+- Motherboard alerts about a dying power supply
+
+Moreover, the NMI event code differs between vendors and/or hardware
+revisions, so a catch-all setting cannot tell a requested crash dump apart
+from, say, a dead redundant power-supply unit or a stuck FPGA card.
+
+If you are fine with the current unknown_nmi_panic settings, this module can
+also be used to ignore other NMIs during the dump process, even those that
+already have a kernel module handling them. This keeps the dump process from
+being interrupted, which would leave the vmcore unusable.
+
+
+Parameters
+----------
+
+Available parameters are (in processing order):
+
+- events_ignore=LIST Events to ignore silently, let them pass to other handlers
+- events_drop=LIST Events to drop, so no other handler can process them
+- events_panic=LIST Events to make the kernel panic
+
+LIST is a standard kernel integer list, which can be composed of:
+- simple lists: 0,13,16,44,10
+- ranges: 10-100
+- Mix of both: 0,1,2-8,10
+
+
+As with any module, you can also set these on the kernel command line by
+prefixing each parameter with the module name and a dot:
+ nmimgr.events_ignore=... nmimgr.events_drop=... nmimgr.events_panic=...
+
+
+
+
+Discover your hardware NMI events
+---------------------------------
+
+When trying new hardware, you'll have to discover the NMI events assigned
+to the external generator.
+
+To ease this task, you can use the setup.sh script located at:
+ https://raw.githubusercontent.com/Saruspete/nmimgr/master/setup.sh
+
+
+If you want to do it manually, or if your hardware is not referenced yet:
+
+- ensure all NMI sysctls are disabled (sysctl -a | grep _nmi)
+- load the module without parameters
+- generate an NMI from your BMC
+- check dmesg for lines containing "nmimgr:", specifically "Handling new NMI"
+- the code you are interested in (the event) is the decimal value between ( )
+
+If you see this log: "Handling new NMI type:1 event:0x10 (16)"
+Then the event code generated is 16.
+To make the system panic:
+ # modprobe nmimgr events_panic=16
+To ignore it and disable messages:
+ # modprobe nmimgr events_ignore=16
+
+
+Configure it permanently
+------------------------
+
+1) Set the parameters.
+ If built as a module (this lets you reconfigure it later):
+ # echo "options nmimgr events_panic=16 events_drop=10" > \
+ /etc/modprobe.d/nmimgr.conf
+ If built-in, add the parameters to the kernel command line instead.
+
+2) Reload it (module only):
+ # rmmod nmimgr; modprobe nmimgr
+3) Check the parameters are correctly set (you should see your value)
+ # cat /sys/module/nmimgr/parameters/events_panic
+
+
+
+Generate an NMI
+---------------
+
+- ipmitool chassis power diag
+- vboxmanage debugvm "VMName" injectnmi
+- virsh inject-nmi "VMName"
+
+
+Kernel Revision history
+-----------------------
+
+2.6.32: Using notifier_block structs
+3.2 : Moved NMI descriptions to an enum: LOCAL, UNKNOWN, MAX
+ https://lwn.net/Articles/461215/
+ https://lkml.org/lkml/2012/3/8/386
+3.5 : Moved "register_nmi_handler" to a macro + static struct nmiaction fn##_na
+ This broke the loop logic used between 3.2 and 3.5
+
+
diff --git a/arch/Kconfig b/arch/Kconfig
index 6c00e5b00f8b..339a5eeb0591 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -12,6 +12,24 @@ config KEXEC_CORE
config HAVE_IMA_KEXEC
bool

+config NMI_MGR
+ tristate "NMI Generic Handler"
+ depends on HAVE_NMI
+ depends on X86
+ help
+ NMI Generic handler allows admins to specify NMI events to
+ drop, ignore or make the system panic.
+ This is needed for servers with a BMC (HP iLO, Dell iDRAC,
+ AMI ASMB...) to target specific events instead of using the
+ catch-all sysctl "unknown_nmi_panic".
+
+ It can be used to avoid kdump being interrupted by a second NMI.
+
+ See <file:Documentation/nmimgr.txt>
+
+ If unsure, say N.
+
+
config OPROFILE
tristate "OProfile system profiling"
depends on PROFILING
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 4b994232cb57..a37fdfe31849 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -42,6 +42,7 @@ obj-y := process_$(BITS).o signal.o
obj-$(CONFIG_COMPAT) += signal_compat.o
obj-y += traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
obj-y += time.o ioport.o dumpstack.o nmi.o
+obj-$(CONFIG_NMI_MGR) += nmimgr.o
obj-$(CONFIG_MODIFY_LDT_SYSCALL) += ldt.o
obj-y += setup.o x86_init.o i8259.o irqinit.o jump_label.o
obj-$(CONFIG_IRQ_WORK) += irq_work.o
diff --git a/arch/x86/kernel/nmimgr.c b/arch/x86/kernel/nmimgr.c
new file mode 100644
index 000000000000..b36168043615
--- /dev/null
+++ b/arch/x86/kernel/nmimgr.c
@@ -0,0 +1,374 @@
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2, or (at your option) any
+ * later version
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details
+ *
+ */
+
+/***************************************************************************
+ * NMI Generic Handler
+ *
+ * Manage NMI events in a more fine-grained manner than "unknown_nmi_panic".
+ * Allows you to Panic or Ignore specific NMI events.
+ *
+ * Written by: Adrien Mahieux
+ * See also: https://fr.slideshare.net/Saruspete/kernel-crashdump-53496836
+ * :> https://github.com/saruspete/kdumptools
+ */
+
+/*
+ * Kernel Revision history:
+ * 2.6.32: Using notifier_block structs
+ * 3.2 : Moved NMI descriptions to an enum: LOCAL, UNKNOWN, MAX
+ * https://lwn.net/Articles/461215/
+ * https://lkml.org/lkml/2012/3/8/386
+ * 3.5 : Moved register_nmi_handler to macro+static struct nmiaction fn##_na
+ */
+
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/version.h>
+#include <linux/nmi.h>
+
+#include <asm/nmi.h>
+
+/* Compatibility management */
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3, 2, 0)
+#include <linux/notifier.h>
+#include <linux/kdebug.h> /* For NMI_DIE and NMI_DIE_IPI */
+
+/* NOTIFY_STOP ends the die chain (drop), NOTIFY_DONE continues it */
+#define NMI_HANDLED NOTIFY_STOP
+#define NMI_DONE NOTIFY_DONE
+#endif
+
+
+#define NMIMGR_VERSION "0.4"
+#define NMIMGR_NAME "nmimgr"
+#define NMIMGR_NBMAX 256
+
+static int events_panic_list[NMIMGR_NBMAX];
+static int events_drop_list[NMIMGR_NBMAX];
+static int events_ignore_list[NMIMGR_NBMAX];
+static char *events_panic;
+static char *events_drop;
+static char *events_ignore;
+
+
+
+/**
+ * Handler
+ */
+static int __nmimgr_handle(unsigned int type, unsigned char reason,
+ struct pt_regs *regs)
+{
+
+ int i;
+
+ /* Check for ignored NMI (list[0] is the get_options() count) */
+ for (i = 1; i <= events_ignore_list[0]; i++) {
+ if (reason == events_ignore_list[i])
+ return NMI_DONE;
+ }
+
+
+ pr_notice(NMIMGR_NAME": Handling new NMI type:%u event:0x%02x (%d)\n",
+ type, reason, reason);
+
+ /* Check for dropped NMI */
+ for (i = 1; i <= events_drop_list[0]; i++) {
+
+ if (reason == events_drop_list[i]) {
+ pr_notice(NMIMGR_NAME": Drop NMI event:0x%02x (%d)\n",
+ reason, reason);
+ return NMI_HANDLED;
+ }
+ }
+
+ /* Check for Panic NMI */
+ for (i = 1; i <= events_panic_list[0]; i++) {
+
+ if (reason == events_panic_list[i]) {
+ pr_emerg(NMIMGR_NAME": Panic on Event:0x%02x(%d)\n",
+ reason, reason);
+
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 5, 0)
+ nmi_panic(regs, NMIMGR_NAME": Hit explicit panic");
+#else
+ panic(NMIMGR_NAME": Hit explicit panic");
+#endif
+
+ }
+ }
+
+ /* Still there: unmanaged NMI Code. Send to other handlers */
+ pr_notice(NMIMGR_NAME": Unmanaged NMI event:0x%02x (%d), let it pass\n",
+ reason, reason);
+
+ return NMI_DONE;
+}
+
+
+
+
+/***** Kernel < 3.2 **********************************************************/
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3, 2, 0)
+
+static int nmimgr_handle(struct notifier_block *nb, unsigned long val,
+ void *data)
+{
+ struct die_args *args = (struct die_args *)data;
+ unsigned char reason = args->err;
+
+ /* Only process NMI cases */
+ switch (val) {
+ case DIE_NMI:
+ case DIE_NMIWATCHDOG:
+ case DIE_NMI_IPI:
+ case DIE_NMIUNKNOWN:
+ return __nmimgr_handle(1, reason, args->regs);
+
+ default:
+ break;
+ }
+
+ return NOTIFY_DONE;
+}
+
+
+static struct notifier_block nmimgr_notifier = {
+ .notifier_call = nmimgr_handle,
+ .priority = 0x7FFFFFFF
+};
+
+
+/**
+ * Handler registration
+ */
+static int nmimgr_register(void)
+{
+ int ret = register_die_notifier(&nmimgr_notifier);
+
+ if (ret) {
+ pr_warn(NMIMGR_NAME": Unable to register NMI handler\n");
+ return ret;
+ }
+
+ pr_notice(NMIMGR_NAME": Registered handler\n");
+
+ return 0;
+}
+
+/**
+ * Handler unregistration
+ */
+static void nmimgr_unregister(void)
+{
+ unregister_die_notifier(&nmimgr_notifier);
+}
+
+/***** Kernel 3.2+ ***********************************************************/
+#else
+
+static int nmimgr_handle(unsigned int type, struct pt_regs *regs)
+{
+ return __nmimgr_handle(type, x86_platform.get_nmi_reason(), regs);
+}
+
+
+
+/**
+ * handler registration
+ */
+static int nmimgr_register(void)
+{
+ int ret = 0;
+ int i;
+
+ /* A loop won't work: register_nmi_handler() is a macro that defines
+ * one static struct per call site (https://lkml.org/lkml/2012/3/8/386)
+ */
+
+ /* We register our handler first, as we only manage a specific list */
+ ret = register_nmi_handler(
+ NMI_UNKNOWN, nmimgr_handle, NMI_FLAG_FIRST, NMIMGR_NAME);
+ if (ret) {
+ pr_warn(NMIMGR_NAME ": Unable to register NMI_UNKNOWN\n");
+ i = NMI_UNKNOWN-1;
+ goto err;
+ }
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 5, 0)
+ ret = register_nmi_handler(
+ NMI_SERR, nmimgr_handle, NMI_FLAG_FIRST, NMIMGR_NAME);
+ if (ret) {
+ pr_warn(NMIMGR_NAME ": Unable to register NMI_SERR\n");
+ i = NMI_SERR-1;
+ goto err;
+ }
+ ret = register_nmi_handler(
+ NMI_IO_CHECK, nmimgr_handle, NMI_FLAG_FIRST, NMIMGR_NAME);
+ if (ret) {
+ pr_warn(NMIMGR_NAME ": Unable to register NMI_IO_CHECK\n");
+ i = NMI_IO_CHECK-1;
+ goto err;
+ }
+#endif
+
+
+ return 0;
+
+err:
+ for (; i > 0; i--)
+ unregister_nmi_handler(i, NMIMGR_NAME);
+
+ return ret;
+}
+
+/**
+ * Handler unregistration
+ */
+static void nmimgr_unregister(void)
+{
+ int i;
+
+ for (i = NMI_MAX-1; i > NMI_LOCAL; i--)
+ unregister_nmi_handler(i, NMIMGR_NAME);
+
+}
+
+#endif
+
+/*****************************************************************************/
+
+
+
+/**
+ * Parse the events_panic list
+ */
+static int __init nmimgr_setup_panic(char *str)
+{
+ char *ret = NULL;
+
+ if (!str)
+ return 1;
+
+ pr_info(NMIMGR_NAME ": panic events: %s\n", str);
+
+ /* lib/cmdline.c: Extract int list from str into events_panic_list[] */
+ ret = get_options(str, ARRAY_SIZE(events_panic_list),
+ events_panic_list);
+ if (ret && *ret != 0) {
+ pr_err(NMIMGR_NAME": Invalid events_panic, ret:%s\n", ret);
+ return 0;
+ }
+ return 1;
+}
+__setup("nmimgr.events_panic=", nmimgr_setup_panic);
+
+
+/**
+ * Parse the events_ignore list
+ */
+static int __init nmimgr_setup_ignore(char *str)
+{
+ char *ret = NULL;
+
+ if (!str)
+ return 1;
+
+ pr_info(NMIMGR_NAME": ignore events: %s\n", str);
+
+ ret = get_options(str, ARRAY_SIZE(events_ignore_list),
+ events_ignore_list);
+ if (ret && *ret != 0) {
+ pr_err(NMIMGR_NAME": Invalid events_ignore, ret:%s\n", ret);
+ return 0;
+ }
+ return 1;
+}
+__setup("nmimgr.events_ignore=", nmimgr_setup_ignore);
+
+/**
+ * Parse the events_drop list
+ */
+static int __init nmimgr_setup_drop(char *str)
+{
+ char *ret = NULL;
+
+ if (!str)
+ return 1;
+
+ pr_info(NMIMGR_NAME": drop events: %s\n", str);
+
+ ret = get_options(str, ARRAY_SIZE(events_drop_list),
+ events_drop_list);
+ if (ret && *ret != 0) {
+ pr_err(NMIMGR_NAME": Invalid events_drop, ret:%s\n", ret);
+ return 0;
+ }
+ return 1;
+}
+__setup("nmimgr.events_drop=", nmimgr_setup_drop);
+
+
+
+/**
+ * Module initialization
+ */
+static int __init nmimgr_init(void)
+{
+ int err;
+
+ pr_notice(NMIMGR_NAME ": Loaded module v%s\n", NMIMGR_VERSION);
+
+ nmimgr_setup_panic(events_panic);
+ nmimgr_setup_ignore(events_ignore);
+ nmimgr_setup_drop(events_drop);
+
+ err = nmimgr_register();
+ if (err) {
+ pr_warn(NMIMGR_NAME": NMI Management not available\n");
+ return err;
+ }
+ return 0;
+}
+module_init(nmimgr_init);
+
+/**
+ * Module unloading
+ */
+static void __exit clean_module(void)
+{
+ nmimgr_unregister();
+ pr_notice(NMIMGR_NAME": unloaded module\n");
+}
+
+module_exit(clean_module);
+
+
+
+
+MODULE_AUTHOR("Adrien Mahieux <adrien.mahieux@xxxxxxxxx>");
+MODULE_DESCRIPTION("Remap specified NMI codes to generate a Panic\n"
+ "or drops specific events (self-test or while kdump'ing)\n"
+ "Also reads kernel parameter events_panic= upon loading");
+MODULE_LICENSE("GPL");
+MODULE_VERSION(NMIMGR_VERSION);
+
+/* Parameters */
+module_param(events_panic, charp, 0444);
+MODULE_PARM_DESC(events_panic, "List of NMIs to panic upon receiving");
+
+module_param(events_ignore, charp, 0444);
+MODULE_PARM_DESC(events_ignore, "List of NMIs to ignore silently");
+
+module_param(events_drop, charp, 0444);
+MODULE_PARM_DESC(events_drop, "List of NMIs to hide from other handlers");
--
2.5.5