Warning! "clock" program disables NMI interrupts.

Larry J. Blunk (ljb@merit.edu)
Wed, 30 Apr 1997 13:26:17 -0400


There has been some discussion recently about memory
parity/ECC and NMI interrupts. I recently purchased
a couple of Pentium Pro systems with 16x36 64MB 60ns
memories. One of the systems was experiencing problems
which appeared to be memory related (the classic sig 11
gcc failures, etc.). Swapping the memory between the
2 systems verified that it was in fact a memory problem.

However, I was curious as to why I was not seeing
any NMI interrupts, despite having ECC checking enabled
in the BIOS. After doing some reading of various Intel
chipset literature, I discovered that NMI interrupts
can be disabled by writing to I/O port 0x70 with the
high bit set (0x80).
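
As a rough illustration of that bit layout (a sketch only; the macro
and function names are made up for this example and do not come from
any Intel document or from the clock source), the byte written to port
0x70 combines a 7-bit RTC register index with the NMI-disable bit:

```c
#include <stdint.h>

#define CMOS_INDEX_PORT 0x70  /* RTC register select; bit 7 gates NMI */
#define CMOS_DATA_PORT  0x71  /* RTC register read/write */
#define NMI_DISABLE_BIT 0x80  /* high bit of the byte written to 0x70 */

/* Index byte that selects RTC register `reg` and disables NMI. */
uint8_t cmos_select_nmi_off(uint8_t reg)
{
    return (uint8_t)((reg & 0x7F) | NMI_DISABLE_BIT);
}

/* Index byte that selects RTC register `reg` and leaves NMI enabled. */
uint8_t cmos_select_nmi_on(uint8_t reg)
{
    return (uint8_t)(reg & 0x7F);
}
```

Only the value of the byte is computed here; actually writing it to the
port requires I/O privileges.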

Unfortunately, this I/O port is shared with the RTC
(real-time clock) for selecting registers to R/W (at I/O
port 0x71). After snooping around for quite a while,
I discovered that the "clock" program was always writing
to I/O port 0x70 with the high bit set (thus disabling
NMI interrupts). The clock program is typically run
at boot time to set the system time based on the RTC
clock time. I verified that this was in fact the problem
by patching the clock program to not set the high bit
when writing I/O port 0x70. After doing this, I began
receiving NMI interrupt notifications when experiencing
memory problems.
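
The difference between the original and the patched behaviour can be
sketched roughly as follows. This is not the actual clock source: the
function names are hypothetical, and the port accesses go through
callbacks with a small simulated port, so the sketch runs without root
or real hardware. On x86 Linux the callbacks would typically wrap
outb() and inb() from <sys/io.h>.

```c
#include <stdint.h>

#define PORT_RTC_INDEX 0x70
#define PORT_RTC_DATA  0x71

/* Port I/O callbacks (value first, then port, as with outb()). */
typedef void (*port_out_fn)(uint8_t value, uint16_t port);
typedef uint8_t (*port_in_fn)(uint16_t port);

/* Behaviour described in the post: index written with bit 7 set,
   which disables NMI as a side effect. */
uint8_t rtc_read_unpatched(uint8_t reg, port_out_fn out, port_in_fn in)
{
    out((uint8_t)(reg | 0x80), PORT_RTC_INDEX);
    return in(PORT_RTC_DATA);
}

/* Patched behaviour: same register select, bit 7 left clear. */
uint8_t rtc_read_patched(uint8_t reg, port_out_fn out, port_in_fn in)
{
    out((uint8_t)(reg & 0x7F), PORT_RTC_INDEX);
    return in(PORT_RTC_DATA);
}

/* Simulated port: remembers the last byte written to 0x70. */
static uint8_t sim_last_index;

void sim_out(uint8_t value, uint16_t port)
{
    if (port == PORT_RTC_INDEX)
        sim_last_index = value;
}

uint8_t sim_in(uint16_t port)
{
    (void)port;
    return 0;  /* register contents are irrelevant to this sketch */
}

/* NMI is enabled whenever the last index write had bit 7 clear. */
int sim_nmi_enabled(void)
{
    return (sim_last_index & 0x80) == 0;
}
```

Calling rtc_read_unpatched() with sim_out and sim_in leaves
sim_nmi_enabled() false; a later rtc_read_patched() call makes it true
again.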

I would add that if you are running ntpd
(network time daemon) you probably would not experience
this problem, as the kernel will write the RTC every 11
minutes to keep it in sync with the system time. The
kernel code, unlike the clock program, is careful to
write I/O port 0x70 with the high bit cleared, so each
such update re-enables NMI interrupts.
Also, the clock program in version 2.6 of the util-linux
package will go through the /dev/rtc device if available
rather than manipulating the RTC directly. /dev/rtc
leaves the high bit clear, so it will not disable
NMI interrupts. Unfortunately, both the RedHat 4.1 and
Slackware 3.1 (Slackware 96) distributions shipped
version 2.5 of the package.
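
For comparison, reading the time through the kernel driver rather than
the I/O ports might look roughly like this. It is a minimal sketch, not
the code from the 2.6 clock program: the helper name is hypothetical,
and it assumes the RTC_RD_TIME ioctl and struct rtc_time from
<linux/rtc.h>.

```c
#include <fcntl.h>
#include <linux/rtc.h>   /* struct rtc_time, RTC_RD_TIME */
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: read the RTC through the kernel driver, so
   user space never touches port 0x70 itself.
   Returns 0 on success, -1 if the device cannot be opened or read. */
int read_rtc_via_dev(const char *path, struct rtc_time *out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int rc = ioctl(fd, RTC_RD_TIME, out);
    close(fd);
    return rc < 0 ? -1 : 0;
}
```

A caller would pass "/dev/rtc" as the path and fall back to direct port
access only if the call returns -1.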

FYI, the SIMMs having problems turned out to be of
the "buffered" variety. Despite being advertised as
60ns memory, I had to back off the BIOS timing to
70ns to get them to work reliably. The second set of SIMMs
was unbuffered and experienced no problems when running
with the 60ns timing set in the BIOS. My advice would
be to avoid the buffered SIMMs if possible (or otherwise
be prepared to back off your memory timing in the BIOS).

------
Larry J. Blunk
Merit Network, Inc. Ann Arbor, Michigan