Re: (OT) md5sum proving to be an EXCELLENT memory test

From: Stephen Satchell (list@fluent2.pyramid.net)
Date: Mon Apr 21 2003 - 08:46:30 EST


At 03:04 PM 4/21/03 +0200, you wrote:
>Stephen Satchell wrote:
> >
> > The "good memory test suite" I have didn't catch it. The copy method you
> > suggest didn't catch it. The BIOS memory check didn't catch it. Only the
> > linux compile method -- mentioned on this list -- did catch it. And so did
> > using md5sum on very long files.
>
>From my experience it's not bad RAM that shows these failures but
>an overheating CPU.
>
>Ciao, ET.

Well, I just finished an overnight test on the server in question. If it's
an overheating CPU, then I don't understand how replacing the RAM stick
fixes the problem so that it runs without failure for 12 hours. An
overheating RAM stick could cause the same failures, though.

In tracing through my collection of computers, I have found others with
this flaky RAM that show intermittent failures using md5sum that other
tests don't find (including kernel compiles). All the failing RAM comes
from the same manufacturer and batch. I think it's just barely marginal
memory, myself. Once I replace it all, I should stop seeing the
every-other-fortnight problems that have plagued me for the past year.
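
For anyone who wants to repeat the md5sum check without babysitting a shell loop, here is a minimal sketch in Python. It is not what was run above (that was plain md5sum on very long files); the function names, the scratch file, and the pass count are all illustrative assumptions. The idea is the same: hash one unchanged file many times and treat any disagreement between passes as a sign that data was corrupted somewhere between the disk and the CPU.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repeated_md5_check(path, passes=10):
    """Hash the same file `passes` times and return the distinct digests.

    More than one distinct digest for an unchanged file means at least
    one pass saw corrupted data.
    """
    return {md5_of_file(path) for _ in range(passes)}


def make_test_file(size_mb=64):
    """Write a scratch file of random bytes (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as handle:
        for _ in range(size_mb):
            handle.write(os.urandom(1 << 20))
    return path


if __name__ == "__main__":
    path = make_test_file(size_mb=64)
    try:
        digests = repeated_md5_check(path, passes=10)
        if len(digests) == 1:
            print("all passes agree:", digests.pop())
        else:
            print("MISMATCH, distinct digests seen:", sorted(digests))
    finally:
        os.remove(path)
```

A clean run proves little with intermittent failures like these, so the file size and pass count here are only starting points; a long overnight run on a much larger file is closer to the test described in this thread.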

It also means I need to check the junk pile to see if the failed systems
have RAM trouble at their heart.

(N.B.: This little exercise has opened my eyes some little bit. Easter
was a great time to bring down the servers one by one, blow the crap out of
them with compressed air, and check the RAM. And replace questionable
RAM. I'm also going to have a little talk with the computer store that
sold me a "white box" computer with PC66 SDRAM when PC100 SDRAM was called
for.)

Here's how bad some of this stuff was: I took an ASUS P5A-B board with an
AMD K6-II, slowed the bus down to 87 MHz from 100 MHz, and the damn memory
still fails at the slower bus speed. Every loose stick from that batch of RAM
shows the failure. (memtest86 caught the errors, by the way, so
diagnostics aren't completely worthless.)

Kingston, Viking, and Micron are getting more of my business.

--
X -> unknown; Spurt -> drip of water under pressure
Expert -> X-Spurt -> Unknown drip under pressure.



