Re: random crashes, possible hardware flakiness

From: Warren Young (tangent@cyberport.com)
Date: Tue Jul 04 2000 - 15:55:12 EST


Sean Harding wrote:
>
> - bunzip on the new kernel failed once (claimed the file was corrupted) but
> worked the second time I tried it on the exact same file.
> - A few "internal compiler error" crashes when compiling the kernel.

The stack traces are meaningless to me, but these two errors indicate
memory corruption. It can either be bad memory modules, a bad CPU, or
something in between, like the motherboard chipset, which mediates the
transfer of data from memory to the CPU.

You say setiathome seemed to have something to do with it, and in fact
it may: if your motherboard has temperature monitoring on it, I'll bet
you see your CPU's temp rise 5-10 degrees C over idle when you run
high-load processes like that. If the CPU's at fault, raising its
temperature may push it over the edge from "functional" to
"error-inducing".

The only cheap thing you can do to test your system is to get a DOS
memory tester and run it with the "hurt me bad" setting. I use
#1TuffTest for that -- a $20-30 shareware thingy. Yes, it's as cheap as
it sounds, in a lot of senses. Its memory tester works fine, though.
:)

One other possibility is to take out some DIMMs. You say you have 192
MB, so you've probably got three 64 MB DIMMs. You can limp by on 1 at a
time for the next three weeks. Run the system hard, and see which if
any of them are present when you get crashes. If you don't get a crash
within a week, take out that DIMM and try the next one. If crashes
occur no matter the memory configuration, then it probably isn't the
DIMMs themselves.

-- 
= Warren -- ICBM Address: 36.8274040 N, 108.0204086 W, alt. 1714m

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Jul 07 2000 - 21:00:15 EST