Re: Wow! Is memory ever cheap!

From: John Alvord (jalvo@mbay.net)
Date: Wed May 09 2001 - 00:59:41 EST


On Tue, 8 May 2001 22:22:10 -0700, Larry McVoy <lm@bitmover.com>
wrote:
>
>Just to make sure you understand: I think ECC is a fine thing. If I'm
>running systems with no other integrity checks, I'll take ECC and like it.
>However, having ECC does not mean that I trust that my data is safe,
>that is most certainly not a true statement. The bus, the disks, the
>disk controller, the disk driver, the buffer cache, etc, can all corrupt
>the data. Oh, yeah, let's not forget NFS. I have seen each and every
>one of those things corrupt data.

This is an interesting observation of a truth that was well known in
the second generation computers of the 1950s and 1960s. I first worked
at John Hancock... they had a bunch of 7074 machines. All those
systems made use of programmed checksums in each tape block and in
each full file. The reason was that those machines did not have ECC...
they did have parity checking if I remember right. With IBM's third
generation computers (S/360s) and probably other manufacturers, ECC
became a standard feature. Parity checking was added through different
data paths such as channel memory, buffer memory, etc. There was so
much protection added that the programmed checksums became
superfluous.

There were still odd moments. I remember working on an Amdahl computer
problem where some internal data paths... where the contents of one
register moved to an internal storage area... and the path did not
have parity. There was a machine fault... the path was electrically
open, so the contents of the register always became zero. But since it
wasn't parity checked, there was no machine check. I remember another
problem on the IBM 3033. Cosmic rays (really) caused one bit errors in
channel memory. That was parity but not ECC so you got a weird channel
check. Back at the diagnosis ranch, the board looked good. It was only
when someone noticed that the rate of such problems was proportional
to the height above sea level that the light bulb went on.

The lesson is that when paths are not checked, hardware or software,
data being held or transformed can change. Old lesson but a good one
to know.

john alvord
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue May 15 2001 - 21:00:15 EST