Re: Dazed and Confused

From: Alan Cox (alan@lxorguk.ukuu.org.uk)
Date: Fri Dec 06 2002 - 11:05:19 EST


On Fri, 2002-12-06 at 14:55, Greg Boyce wrote:
> I work in a company with a large number of Linux machines deployed all
> around the country, and in some of the machines we've been seeing the
> following error:
>
> Uhhuh. NMI received. Dazed and confused, but trying to continue
> You probably have a hardware problem with your RAM chips

An NMI can have several causes depending on the system. Hardware
failure is one; some systems raise one for things like PCI errors, and on
a few boxes (notably old 486s) you see them on power management events.
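
To make the list of causes a little more concrete: on PC-compatible
hardware the legacy status port at 0x61 is commonly used to tell NMI
sources apart, with bit 7 flagging a memory parity / system error and
bit 6 an I/O channel check. The sketch below only decodes such a byte.
The function name and the returned strings are made up for illustration,
and it assumes the classic PC layout, which not every chipset follows.

```python
from typing import List


def describe_nmi_reason(reason: int) -> List[str]:
    """Decode an NMI status byte (assumed classic PC port 0x61 layout)."""
    causes = []
    if reason & 0x80:
        # Bit 7: memory parity check or system error line asserted
        causes.append("memory parity / system error")
    if reason & 0x40:
        # Bit 6: I/O channel check, typically an expansion bus device
        causes.append("I/O channel check")
    if not causes:
        # Neither bit set: the source cannot be named from this byte alone
        causes.append("unknown source")
    return causes
```

A byte with neither bit set is the case where things like power
management events end up, so "unknown" here does not mean "harmless".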

> Due to the number of machines and their locations, running memtest86 on
> them isn't exactly feasible.

Then buy better RAM ;)

> Is there anything besides failing hardware that could be the cause of this
> error? Also, how serious is this error? Some of the machines reporting
> this error have had problems with programs crashing, while others seem to
> run fine.

Take a sample of the machines that have been crashing and run memtest86
on a couple of them. That should tell you whether it is the RAM. From the
sample you can then work out how to handle the rest. If memtest86 fails
on the test machines, one option that comes to mind is replacing the RAM
in a few more and taking the old RAM back to test.
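
The sampling step above could be scripted roughly as follows. This is
only a sketch: it assumes you have already collected kernel log lines
per host into a dictionary, and the helper names, the marker string and
the fixed seed are all hypothetical choices, not anything the kernel or
memtest86 provides.

```python
import random
from typing import Dict, Iterable, List

# Substring of the kernel message quoted earlier in this thread
NMI_MARKER = "NMI received"


def hosts_reporting_nmi(logs: Dict[str, Iterable[str]]) -> List[str]:
    """Return the sorted names of hosts whose log lines contain the marker."""
    return sorted(
        host
        for host, lines in logs.items()
        if any(NMI_MARKER in line for line in lines)
    )


def pick_sample(hosts: List[str], k: int, seed: int = 0) -> List[str]:
    """Pick up to k hosts at random; the seed makes the choice repeatable."""
    rng = random.Random(seed)
    return sorted(rng.sample(hosts, min(k, len(hosts))))
```

The hosts returned by pick_sample are the ones to pull for a memtest86
run; the rest wait until the sample results are in.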



