Memory corruption, errors, and crashes...

Laszlo Vecsey (master@internexus.net)
Wed, 17 Apr 1996 21:12:40 -0400 (EDT)


Lately I've reported Adaptec 2940 SCSI and ext2 trouble, which I had
previously believed might be caused by bad SCSI bus termination or by
the kernel. I now think there might be something fishy with my RAM.

With 1.2.13, the kernel usually panics after the first signs of ext2
trouble (about 6 hours into uptime). After rebooting, only a minimal
amount of e2fsck'ing is needed to bring the drives back up to par.

With 1.3.90 however, the kernel doesn't go into a panic -- instead it will
just report errors, and more errors, until it eventually dies, resulting
in lots of data loss when the system comes online again. My /dev/sda1 was
completely trashed (everything was moved into /lost+found) and it took me
all day to recover. I would like to use my tape backup unit, but that's
even more risky than leaving the system be :>

Does anyone know offhand where I can get the patch or utility that will
restart the system at the first sign of trouble? Also, how would I have
the system automatically run e2fsck -y on startup? Sometimes it drops into
a shell if there is a problem, and the system won't come back up unless
I'm at the terminal. This should probably go into the Tips-HOWTO.

And finally, I could really use a utility to thoroughly evaluate the
reliability of my memory. Is there something I could run for a few days
that will help me determine which of my simms are bad?

Thanks.