Re: Linux & ECC memory

Rob Hagopian (Rob.Hagopian@vuser.vu.union.edu)
Thu, 14 Nov 1996 21:49:15 -0500 (EST)


>> Albert Calahan just sent me some mail saying the hardware doesn't report
>> the failed memory location when the NMI is triggered, so that would answer
>> my question -- Linux can't attempt to ameliorate an error, as it doesn't
>> know where it happened.

Wouldn't Linux know which process was active (and presumably triggered the
NMI), though? I would think the kernel could at least kill that process and
unmap the physical pages it was using at the time.

This would (a) keep the system running and (b) give at least some indication
of where the error is (although memtest86 would be more useful for that).

> I believe the Machine Check Architecture implemented in the P6 and P5 CPUs is
> what needs to be looked into. If the Machine Check Exception is enabled,
> information is placed into special registers detailing memory errors that have
> occurred.

Of course, this would be even better :-)
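As a rough sketch of why the MCA registers would answer the "where did it happen" question: on P6-family CPUs each machine-check bank has an MCi_STATUS register, and when its ADDRV bit is set the bank's MCi_ADDR register holds the failing address. The decoder below assumes the P6-style bank layout from Intel's manuals (the P5's machine-check registers are laid out differently); the names decode_mci_status and struct mc_report are hypothetical, and actually reading the register requires a privileged RDMSR, which is not shown.

```c
#include <stdint.h>

/* Flag bits in a P6-style MCi_STATUS machine-check bank register
   (assumed layout, per Intel's architecture manuals). */
#define MCI_STATUS_VAL   (1ULL << 63) /* bank holds a valid error */
#define MCI_STATUS_OVER  (1ULL << 62) /* an earlier error was overwritten */
#define MCI_STATUS_UC    (1ULL << 61) /* error was not corrected */
#define MCI_STATUS_EN    (1ULL << 60) /* reporting was enabled for it */
#define MCI_STATUS_MISCV (1ULL << 59) /* MCi_MISC holds extra detail */
#define MCI_STATUS_ADDRV (1ULL << 58) /* MCi_ADDR holds the failing address */
#define MCI_STATUS_PCC   (1ULL << 57) /* processor context may be corrupt */

/* Hypothetical summary of one bank, as a kernel handler might want it. */
struct mc_report {
    int valid;
    int overflow;
    int uncorrected;
    int has_address;     /* if set, MCi_ADDR is worth reading */
    int context_corrupt; /* if set, killing one process may not be enough */
    uint16_t error_code; /* low 16 bits: architectural MCA error code */
};

/* Split a raw MCi_STATUS value into the fields above. */
struct mc_report decode_mci_status(uint64_t status)
{
    struct mc_report r;
    r.valid = (status & MCI_STATUS_VAL) != 0;
    r.overflow = (status & MCI_STATUS_OVER) != 0;
    r.uncorrected = (status & MCI_STATUS_UC) != 0;
    r.has_address = (status & MCI_STATUS_ADDRV) != 0;
    r.context_corrupt = (status & MCI_STATUS_PCC) != 0;
    r.error_code = (uint16_t)(status & 0xFFFFu);
    return r;
}
```

For example, a status of 0xB200000000000000 decodes as a valid, uncorrected error with no logged address and possibly corrupt processor context, which is the case where only something like the kill-the-current-process approach above would be left.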
-Rob Hagopian