Re: [PATCH] NMI watch dog notify patch

From: George Anzinger
Date: Mon Aug 01 2005 - 18:02:28 EST


Keith Owens wrote:
On Fri, 29 Jul 2005 13:55:23 -0700, George Anzinger <george@xxxxxxxxxx> wrote:

This patch adds a notify to the die_nmi notify that the system
is about to be taken down. If the notify is handled with a
NOTIFY_STOP return, the system is given a new lease on life.

void die_nmi (struct pt_regs *regs, const char *msg)
{
+ if (notify_die(DIE_NMIWATCHDOG, "nmi_watchdog", regs, + 0, 0, SIGINT) == NOTIFY_STOP)
+ return;
+
spin_lock(&nmi_print_lock);
/*
* We are in trouble anyway, lets at least try


Minor nitpick. die_nmi() already gets a message passed in to
distinguish between different types of nmi. Pass that message to
notify_die(), on the off chance that the notified routines can use that
difference.

Excellent idea!

Also your patch adds a trailing whitespace on the call to notify_die().

Fixed.

This should do it.
-
George Anzinger george@xxxxxxxxxx
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzinger <george@xxxxxxxxxx>
Type: Enhancement
Description:

This patch adds a notify to the die_nmi notify that the system
is about to be taken down. If the notify is handled with a
NOTIFY_STOP return, the system is given a new lease on life.

We also change the nmi watchdog to carry on if die_nmi returns.

This give debug code a chance to a) catch watchdog timeouts and
b) possibly allow the system to continue, realizing that
the time out may be due to debugger activities such as single
stepping which is usually done with "other" cpus held.

Signed-off-by: George Anzinger<george@xxxxxxxxxx>

nmi.c | 5 ++++-
traps.c | 4 ++++
2 files changed, 8 insertions(+), 1 deletion(-)

Index: linux-2.6.13-rc/arch/i386/kernel/nmi.c
===================================================================
--- linux-2.6.13-rc.orig/arch/i386/kernel/nmi.c
+++ linux-2.6.13-rc/arch/i386/kernel/nmi.c
@@ -495,8 +495,11 @@ void nmi_watchdog_tick (struct pt_regs *
*/
alert_counter[cpu]++;
if (alert_counter[cpu] == 5*nmi_hz)
+ /*
+ * die_nmi will return ONLY if NOTIFY_STOP happens..
+ */
die_nmi(regs, "NMI Watchdog detected LOCKUP");
- } else {
+
last_irq_sums[cpu] = sum;
alert_counter[cpu] = 0;
}
Index: linux-2.6.13-rc/arch/i386/kernel/traps.c
===================================================================
--- linux-2.6.13-rc.orig/arch/i386/kernel/traps.c
+++ linux-2.6.13-rc/arch/i386/kernel/traps.c
@@ -555,6 +555,10 @@ static DEFINE_SPINLOCK(nmi_print_lock);

void die_nmi (struct pt_regs *regs, const char *msg)
{
+ if (notify_die(DIE_NMIWATCHDOG, msg, regs, 0, 0, SIGINT) ==
+ NOTIFY_STOP)
+ return;
+
spin_lock(&nmi_print_lock);
/*
* We are in trouble anyway, lets at least try