Re: Huge uptimes & cosmic rays

Gabriel Paubert (paubert@iram.es)
Thu, 10 Jul 1997 15:32:29 +0200 (METDST)


On 10 Jul 1997, Daniel Quinlan wrote:

> Good reason to use ECC DRAM. I have seen computations go wrong because
> of one-off bit errors. In one case, a machine had ECC accidentally
> disabled. Multiple runs of an intensive computation caused one-off bit
> errors in different places, which disappeared when ECC was turned on.

And serious systems do have ECC. Has anybody written anything for Linux?

It's useful to report this type of error to monitor memory health.
I know that Intel chipsets (except 450) are crap, but I would like to
have ECC error reporting for some critical applications here. I plan to
write something, but it won't be for Intel.


> I love that story... anyway, with ECC, the probability of these random
> bit errors is significantly lower.

Not exactly: an ECC memory system stores extra bits for redundancy (72
memory bits for 64 useful bits), so the raw number of bit errors is
actually slightly higher. But the fact that they can be corrected more
than makes up for it.
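
To put rough numbers on this, here is a small Python sketch. It assumes a
hypothetical per-bit upset probability (P_BIT, an invented figure, not a
measurement), independent bit flips, and a SEC-DED code that corrects any
single flipped bit in a 72-bit word. The function name is illustrative only.

```python
from math import comb

def prob_at_least(k_min, bits, p):
    """Probability that at least k_min of `bits` independent bits flip,
    each with probability p (plain binomial sum)."""
    return sum(comb(bits, k) * p**k * (1 - p)**(bits - k)
               for k in range(k_min, bits + 1))

P_BIT = 1e-9  # ASSUMPTION: hypothetical per-bit upset probability per interval

# More stored bits means more raw upsets: 72 bits exposed instead of 64.
raw_ratio = 72 / 64

# Without ECC, any single flip among the 64 data bits corrupts the word.
plain_fail = prob_at_least(1, 64, P_BIT)

# With SEC-DED (assumed), single flips are corrected, so a word is only
# lost when two or more of its 72 stored bits flip.
ecc_fail = prob_at_least(2, 72, P_BIT)

print(f"raw upset ratio (ECC / plain): {raw_ratio}")
print(f"P(bad word), no ECC:  {plain_fail:.3e}")
print(f"P(bad word), SEC-DED: {ecc_fail:.3e}")
```

Under these assumptions the ECC word sees 12.5% more raw upsets, but the
chance of an uncorrected word drops from about 64 * P_BIT to about
2556 * P_BIT^2 (2556 being the number of bit pairs in 72 bits), which is
many orders of magnitude smaller for any realistic P_BIT.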

Gabriel.