Re: NMI errors in 2.0.30??

Stephen Costaras (stevecs@chaven.com)
Sat, 26 Apr 1997 20:55:23 -0500 (CDT)


[lots of very good stuff snipped]

> The temperature inside your box really should not be greater than about
> 45 degC. You should check this out.

Dick-

Thanks for the long response. Knowing that the kernel doesn't play a part
in memory errors is useful. I have checked the memory by doing copies, tars,
large sorts, etc. Your suggestion on temperature is also something that
I ran into a long while ago. Currently, all machines here are running at
70-75 degrees F ambient (about 21-24 degC), so memory problems due to
temperature should be non-existent.

I don't fully understand why, with the exact same setup, NMIs show up only
with the 2.0.30 kernel and ECC enabled, but not under the 2.0.29 kernel with ECC.
(I have tried three sets of chips so far; they can't all be bad, as they are, and
have been, functional in my other machines for well over six months.)

I saw mention of a 'speed' issue with memory and BIOS settings. The basic premise
was that the BIOS was set up to push the memory to the limit, and the new kernel
was somewhat faster in how it accessed the hardware, causing problems. I figured
I could try to 'disprove' this theory by leaving all BIOS settings the same when
booting 2.0.30 except one: changing from ECC to Parity checking. With Parity
I see no errors, but with ECC turned on I do see errors (but only under 2.0.30,
not under previous kernels).

Stephen Costaras
stevecs@chaven.com