Re: Linux & ECC memory

Steve VanDevender (stevev@efn.org)
Thu, 14 Nov 1996 16:46:28 -0800


Kenneth Albanowski writes:
> On Thu, 14 Nov 1996, Richard B. Johnson wrote:
>
> > ECC is handled in HARDWARE. It has to be. The idea is to try to fix bad
> > memory fetches rather than just raising the NMI, which would declare
> > that the system is broken and then halt. I have not looked at the Linux
> > NMI code, but with ECC RAM many errors would never trigger the NMI
> > because the bad fetch would be corrected.
>
> This is what I'm curious about. Does Linux's NMI code attempt to work
> around some memory problems, or does it just panic? Also, can the glue
> report successful error-correction, as well as failed error-correction?
> (Or is it not useful to know if your memory has/had an error that was
> correctable?)

To attempt correction in software, you would need access to the extra
check bits that ECC stores alongside each word. I don't know of any
systems that expose even the per-byte parity bit to software. ECC also
needs more than a single parity bit: a code that corrects single-bit and
detects double-bit errors needs 7 check bits for a 32-bit word, or 8 for
a 64-bit word.
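
To make the check-bit requirement concrete, here is a minimal sketch of a
single-error-correcting, double-error-detecting code (an extended Hamming
code) on a 4-bit word, plus a helper that counts the check bits such a code
needs for wider words. The function names (check_bits_needed, encode,
decode) are made up for illustration. Real ECC memory does this in the
memory controller, and I'm not claiming any particular chipset uses exactly
this bit layout.

```python
def check_bits_needed(data_bits: int) -> int:
    """Check bits for a Hamming SECDED code over `data_bits` data bits.

    Single-error correction needs r bits with 2**r >= data_bits + r + 1;
    double-error detection adds one overall parity bit on top.
    """
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1


def encode(nibble: int) -> int:
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Bit 0 is the overall parity bit. Bits 1..7 are the classic Hamming
    positions: parity at 1, 2, 4 and data at 3, 5, 6, 7.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) & 1             # make total parity even
    return sum(bit << i for i, bit in enumerate(code))


def decode(word: int):
    """Return (nibble, status); nibble is None if the error is uncorrectable."""
    code = [(word >> i) & 1 for i in range(8)]

    # The syndrome is the XOR of the positions of all set bits; it is 0 for
    # a valid codeword and equals the flipped position after a single error.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    parity_odd = sum(code) & 1

    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        # Odd number of flips: assume exactly one and flip it back.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped.
        return None, "uncorrectable"

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


if __name__ == "__main__":
    for width in (8, 32, 64):
        print(width, "data bits need", check_bits_needed(width), "check bits")
    word = encode(0b1011)
    print(decode(word))                  # clean word
    print(decode(word ^ 0b00100000))     # one bit flipped
    print(decode(word ^ 0b00100100))     # two bits flipped
```

The point of the sketch is that decode() needs every stored bit, check bits
included. Software that only ever sees the data bits has nothing to compute
a syndrome from, which is why the correction has to happen below the level
the CPU can see.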

> --
> Kenneth Albanowski (kjahds@kjahds.com, CIS: 70705,126)