Re: [PATCH] Allow users to force a panic on NMI

From: Pavel Machek
Date: Wed May 24 2006 - 10:05:08 EST


Hi!

> To quote Alan Cox:
>
> The default Linux behaviour on an NMI of either memory or unknown is to
> continue operation. For many environments such as scientific computing
> it is preferable that the box is taken out and the error dealt with than
> an uncorrected parity/ECC error get propogated.

> This is just a refreshed post of Alan's original patch
> <http://www.ussg.iu.edu/hypermail/linux/kernel/0510.2/1208.html>, with
> hopes this time it sticks. :)
>
> It applies cleanly on top of my other nmi patches.

> Index: linux-don/arch/i386/kernel/traps.c
> ===================================================================
> --- linux-don.orig/arch/i386/kernel/traps.c
> +++ linux-don/arch/i386/kernel/traps.c
> @@ -602,6 +602,8 @@ static void mem_parity_error(unsigned ch
> "to continue\n");
> printk(KERN_EMERG "You probably have a hardware problem with your RAM "
> "chips\n");
> + if (panic_on_unrecovered_nmi)
> + panic("NMI: Not continuing");
>
> /* Clear and disable the memory parity error line. */
> clear_mem_error(reason);
> @@ -637,6 +639,10 @@ static void unknown_nmi_error(unsigned c
> reason, smp_processor_id());
> printk("Dazed and confused, but trying to continue\n");

'Trying to contninue'

> printk("Do you have a strange power saving mode enabled?\n");
> +
> + if (panic_on_unrecovered_nmi)
> + panic("NMI: Not continuing");
> +

'not really'. Move printks around so it makes sense...

> Index: linux-don/arch/x86_64/kernel/traps.c
> ===================================================================
> --- linux-don.orig/arch/x86_64/kernel/traps.c
> +++ linux-don/arch/x86_64/kernel/traps.c
> @@ -608,6 +608,8 @@ mem_parity_error(unsigned char reason, s
> {
> printk("Uhhuh. NMI received. Dazed and confused, but trying to continue\n");
> printk("You probably have a hardware problem with your RAM chips\n");
> + if (panic_on_unrecovered_nmi)
> + panic("NMI: Not continuing");
>
> /* Clear and disable the memory parity error line. */

same here.

> @@ -633,6 +635,10 @@ unknown_nmi_error(unsigned char reason,
> { printk("Uhhuh. NMI received for unknown reason %02x.\n", reason);
> printk("Dazed and confused, but trying to continue\n");
> printk("Do you have a strange power saving mode enabled?\n");
> +
> + if (panic_on_unrecovered_nmi)
> + panic("NMI: Not continuing");
> +
> }
>

and here.

--
Thanks for all the (sleeping) penguins.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/