Re: memory testing [was: Re: General protection fault in kswapd in 2.0.18]

Charlie Wilkinson (cwilkins@boink.clark.net)
Sat, 14 Sep 1996 20:54:45 -0400 (EDT)


I believe it was Edward Welbon who once said:
>
> On Sat, 14 Sep 1996, Ingo Molnar wrote:
>
> > On Fri, 13 Sep 1996, Linus Torvalds wrote:
> >
> > > Does anybody on the kernel list know of a good test program that is generally
> > > available that can be left running over-night or similar?
>
> > Pick up a hairdrier and point it at your memory modules, while the system
> > is running. If some of the modules are flaky, then errors will show up
> > quite fast.
>
> But be careful: heat is very bad for semiconductors, especially the densest
> ones. You can use Norton or AMI diagnostics to do memory tests. I have
> not been able to run these under dosemu; these programs are evidently not
> very operating-system friendly. I recommend that you use the selections
> that write various problems and if possible without ECC.

Using heat for troubleshooting is a very old and time-tested trick, but
Edward is quite correct in warning not to overdo it. A rule of thumb is
that if you cannot keep your finger on a semiconductor component for at
least five to ten seconds, then it is too hot. You might have to build
something out of cardboard or whatever to keep the heat away from the
components you aren't testing. It's also good to remind yourself that
heat rises, so you don't fry your power supply or something. (Assuming
a tower case, turn it on its side.)

Another handy trick is to spray freon on a suspect component and chill
it down. Sometimes this will thermal-shock a borderline component into
failure mode. Freon itself is virtually outlawed in the States, but I'm
sure there must be some substitute available, probably at twice the
price and almost as harmful to the ozone. (What, me cynical?) Check
with your local electronics supply house or audio-video repair shop.

The two methods, hot and cold, aren't interchangeable. One may work, the
other may not. It depends on what is failing and how. Of the two, I
think the hot method is more likely to get results, but it's harder to
control. Oftentimes I have inadvertently heated up so many components that
I couldn't be sure which had actually crapped out without getting out the
test gear. But if you encounter a suspect SIMM, just swap it to another
bank and re-test. If the errors move with the SIMM, then you have
nabbed your troublemaker.

Regarding memory testing: Two of the programs I've used for this, QMT
(Qualitas Memory Tester - comes with QEMM386) and Check-it, both require
absolute plain-vanilla DOS mode to do a full and reliable memory test,
i.e., boot into ordinary MS-DOS without HIMEM, EMM386, any other type of
memory manager, caching software, or anything else of the kind. That
being the case, it should be pretty obvious that booting into Linux and
dosemu wouldn't quite cut it. :-/ Both QMT and Check-it have
continuous modes for overnight testing. In fact, QMT has some test
modes that are so thorough that even one iteration can take 24 hours or
more.
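
For anyone curious what these testers do at their core, here is a rough
sketch in C of a write-then-verify pattern pass. It is my own illustration
and not how QMT or Check-it actually work; the names check_pattern and
pattern_test and the choice of four bit patterns are assumptions made up
for the example. Note the catch: run as an ordinary process under Linux,
this only exercises whatever pages the kernel hands the buffer (and the
CPU cache may absorb much of the traffic), so it cannot replace a
bare-DOS tester that walks all of physical memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Fill buf with one pattern, then read it back.
 * Returns the number of words that did not read back correctly. */
static size_t check_pattern(volatile uint32_t *buf, size_t words,
                            uint32_t pattern)
{
    size_t errors = 0;
    size_t i;

    assert(buf != NULL);

    for (i = 0; i < words; i++)      /* write pass */
        buf[i] = pattern;
    for (i = 0; i < words; i++)      /* verify pass */
        if (buf[i] != pattern)
            errors++;
    return errors;
}

/* Hypothetical driver: allocate `bytes` of heap and run every pattern
 * over it `passes` times. Returns the total mismatch count, or
 * (size_t)-1 if the buffer could not be allocated. */
size_t pattern_test(size_t bytes, unsigned passes)
{
    /* All zeros, all ones, and the two alternating-bit patterns
     * (illustrative choice only). */
    static const uint32_t patterns[] = {
        0x00000000u, 0xFFFFFFFFu, 0xAAAAAAAAu, 0x55555555u
    };
    size_t words = bytes / sizeof(uint32_t);
    size_t errors = 0;
    size_t k;
    unsigned p;
    uint32_t *buf = malloc(words * sizeof(uint32_t));

    if (buf == NULL)
        return (size_t)-1;

    for (p = 0; p < passes; p++)
        for (k = 0; k < sizeof patterns / sizeof patterns[0]; k++)
            errors += check_pattern(buf, words, patterns[k]);

    free(buf);
    return errors;
}
```

Calling pattern_test(16UL * 1024 * 1024, 100) in a loop from a small
main() would give a crude overnight soak of 16 MB; any non-zero return is
worth chasing with a real tester from plain DOS.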

I hope we've given you some useful info. Good luck tracking down those
pesky gremlins! :-)

-cw

-- 
=============================================================================
   Charlie Wilkinson      Maintainer - Radio For Peace International Web Site
cwilkins@boink.clark.net         http://www.clark.net/pub/cwilkins/rfpi
=============================================================================
QOTD:
In Blythe, California, a city ordinance declares that a person must own
at least two cows before he can wear cowboy boots in public.