Re: NMI errors in 2.0.30??

Willem Riede (wriede@monmouth.com)
Sat, 26 Apr 1997 21:25:25 -0400


On Sat, 26 Apr 1997, Stephen Costaras <stevecs@chaven.com> wrote:
>
> Rebooted the system and turned Parity mode (non-ECC) on in the BIOS; still
> under v2.0.29, ran the above tests with no problems.
>
> Booted (ECC) w/ 2.0.30 and went through the same procedure; received an NMI
> on boot while fscking the volume (I have all my disks set to auto-fsck when
> mount-count is 1). Dropped down to maintenance mode, remounted root as r/w,
> and fscked the RAID_0 volume: received several ext2 errors, and the fsck
> process died w/ SIG11.
> Rebooted the system again in the same configuration, after removing all
> disks from fstab except root. Ran the same routine as above; the system
> died w/ SIG11 errors (NMI) in 2 out of 5 iterations.
>
> Rebooted the system (2.0.30) w/ Parity mode enabled in the BIOS. No problems
> in five iterations.
>
> Now, I'm a complete layman when it comes to kernel/hardware interactions,
> but this 'looks' like the kernel can't handle the ECC mode in the Tyan BIOS.
> Can anyone shed some more light here? Any suggestions for a more
> quantitative test?
>
This could still be a marginal RAM speed issue, in the following way: without
the ECC check in the path, the data from memory is available just in time for
the CPU to read it, with less than a nanosecond to spare. The ECC logic needs
that same data and then takes a couple of nanoseconds to compute its check.
By the time that result is ready it is too late, and the NMI occurs.

Regrettably, I don't know how to verify this theory without professional
measurement equipment. But if this is the case, giving the memory a bit more
time by relaxing the memory timing parameters in the BIOS should fix it.

Good luck. Willem Riede.