Re: Catching NForce2 lockup with NMI watchdog

From: Maciej W. Rozycki
Date: Wed Dec 17 2003 - 09:03:59 EST


On Tue, 16 Dec 2003, George Anzinger wrote:

> How confusing :( Could you give me some idea how this works? I have tried
> disable_irq(0) and, as best as I can tell, it does not do the trick. The
> confusion I have is understanding where in the chain of hardware each of these
> things is taking place.

Well, strange -- it should mask the timer interrupt. But I've never
tried that and have proposed it based on a source study only -- perhaps it
needs to be investigated further.

> For example, it would be "nice" if I could just turn off the PIT interrupt line
> so that both the NMI (PIT generated) and the PIT interrupt would be put on hold.

The counter gate of the 8254 chip is designed to do just that -- it's a
pity it's hardwired, but I can understand that another SSI TTL latch of
dubious utility was just too costly for the original PC in 1981.

> Your answer seems to indicate that disable_irq() is working down stream from
> where the NMI signal is connected to the PIT interrupt line, so we need to turn
> off the NMI as well. A picture would be nice here :)

I'll try my best:

+------+ OUT0                                INTIN2 +--------+
| 8254 +--+-----------------------------------------+        |
+------+  |                                         |  I/O   |
          | IR0 +------+ INT +------+        INTIN0 |  APIC  |
          +-----+ 8259 +-----+ glue +-+-------------+        |
                +------+     +------+ |             +---++---+
                                      |                 ||
                                      |                 ||
                                      |                 ||
                          +-----------+---------+-...   ||
                          |                     |       ||
       +--------+         |  +--------+         |       ||
       | CPU #0 |         |  | CPU #1 |         |       ||
       +--------+         |  +--------+         |       ||
       |        |  LINT0  |  |        |  LINT0  | ...   ||
       | local  +---------+  | local  +---------+       ||
       |  APIC  |            |  APIC  |                 ||
       |        |            |        |                 ||
       +---++---+            +---++---+                 ||
           ||                    ||                     ||
           ||   inter-APIC bus   ||                     ||
           ++====================++===============...===++

The system is a traditional i82489DX/Pentium/P6-style virtual-wire setup
with a serial inter-APIC bus and a full MP-spec feature set. More limited
systems may lack the OUT0->INTIN2 line and/or one or more of the
INT->INTIN0 and INT->LINT0 connections -- only one of them needs to exist.
If any INT->something connections are missing, then either the INT->LINT0
one for the bootstrap processor (BSP) or the INT->INTIN0 one has to exist;
the others are optional.

For the system above the path for the 8254 timer interrupt is via INTIN2
and the inter-APIC bus as a LoPri APIC interrupt. The path for the NMI
watchdog is via the 8259 reconfigured to pass IR0 transparently to INT and
then the LINT0 inputs of all processors, reconfigured for an NMI APIC
interrupt. Some glue at the INT output may prevent the NMI watchdog from
working -- the LINT0 inputs may not toggle back and forth.
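
To make "reconfigured for an NMI APIC interrupt" a bit more concrete,
here is a minimal sketch of the register values involved. It assumes the
usual layout of local APIC LVT entries and the low word of I/O APIC
redirection entries (delivery mode in bits 10:8, mask in bit 16). The
macro and function names are made up for this note and are not kernel
identifiers; destination mode, polarity and trigger bits are left at zero
for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Delivery mode lives in bits 10:8; bit 16 masks the entry. */
#define DM_LOWPRI    (0x1u << 8)  /* lowest priority, I/O APIC entries only */
#define DM_NMI       (0x4u << 8)  /* vector field is ignored for NMI */
#define DM_EXTINT    (0x7u << 8)  /* "ExtINTA" in the text above */
#define ENTRY_MASKED (1u << 16)

/* LINT0 value for the NMI watchdog path (hypothetical helper). */
uint32_t lint0_as_nmi(void)
{
	return DM_NMI;
}

/* LINT0 value for the last-resort ExtINTA timer path (hypothetical helper). */
uint32_t lint0_as_extint(void)
{
	return DM_EXTINT;
}

/* Low word of an I/O APIC entry delivering `vector` as LoPri. */
uint32_t ioapic_lowpri(uint8_t vector)
{
	assert(vector >= 0x10);  /* vectors 0..15 are not valid here */
	return DM_LOWPRI | vector;
}

/* Set or clear the mask bit without touching the other fields. */
uint32_t entry_set_masked(uint32_t entry, int masked)
{
	return masked ? (entry | ENTRY_MASKED) : (entry & ~ENTRY_MASKED);
}
```

With these, lint0_as_nmi() yields 0x400 and ioapic_lowpri(0x31) yields
0x131; masking either entry only flips bit 16.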

If the OUT0->INTIN2 line is missing, the path for the 8254 timer
interrupt is via the 8259 reconfigured to pass IR0 transparently to INT,
then INTIN0 and the inter-APIC bus as a LoPri APIC interrupt. The path
for the NMI watchdog is also via the 8259 and then the LINT0 inputs of all
processors, reconfigured for an NMI APIC interrupt. Again, some glue at
the INT output may prevent this setup from working, but if it does work,
then both the timer interrupt and the NMI watchdog do -- I've not heard of
a system having different glue logic for INTIN0 and LINT0.

If the above variant does not work, as a last resort, the path for the
8254 timer interrupt is via the 8259 reconfigured back into its usual mode
and then LINT0 of the BSP reconfigured for an ExtINTA APIC interrupt.
Additionally, since at this point the glue logic has probably already
locked up due to the messing done above, a few artificial sets of double
INTA cycles are sent to the system bus using the RTC chip and INTIN8
reconfigured temporarily to send ExtINTA APIC interrupts via the
inter-APIC bus.
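
The fallback order of the three variants can be summarised as a small
decision function. This is only a sketch: the struct, the enum and both
functions are hypothetical, and the booleans stand in for what would in
practice be found out by trying a configuration and checking whether
timer ticks arrive. Treating the watchdog as unavailable in the last
variant is an assumption of this sketch (LINT0 of the BSP is busy
carrying ExtINTA there), not something stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of what is wired up and what works. */
struct wiring {
	bool out0_to_intin2;   /* 8254 OUT0 -> I/O APIC INTIN2 */
	bool int_to_intin0;    /* 8259 INT  -> I/O APIC INTIN0 */
	bool int_to_lint0;     /* 8259 INT  -> LINT0 of the processors */
	bool glue_passes_int;  /* glue lets INT toggle in transparent mode */
};

enum timer_path {
	TIMER_VIA_INTIN2,       /* first variant: direct line, LoPri */
	TIMER_VIA_8259_INTIN0,  /* second variant: transparent 8259, LoPri */
	TIMER_VIA_BSP_EXTINTA,  /* last resort: normal 8259, ExtINTA on BSP */
	TIMER_NO_PATH
};

/* Walk the variants in the order they are described above. */
enum timer_path pick_timer_path(const struct wiring *w)
{
	assert(w != NULL);
	if (w->out0_to_intin2)
		return TIMER_VIA_INTIN2;
	if (w->int_to_intin0 && w->glue_passes_int)
		return TIMER_VIA_8259_INTIN0;
	if (w->int_to_lint0)
		return TIMER_VIA_BSP_EXTINTA;
	return TIMER_NO_PATH;
}

/* The watchdog needs INT toggling on LINT0 of every processor. */
bool nmi_watchdog_plausible(enum timer_path path, const struct wiring *w)
{
	assert(w != NULL);
	switch (path) {
	case TIMER_VIA_INTIN2:
	case TIMER_VIA_8259_INTIN0:
		return w->int_to_lint0 && w->glue_passes_int;
	default:
		return false;  /* assumption: no watchdog in the last variant */
	}
}
```

For the fully wired system in the picture this picks TIMER_VIA_INTIN2
with the watchdog plausible; with the direct line missing and glue that
blocks INT in transparent mode, it falls through to
TIMER_VIA_BSP_EXTINTA.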

I do hope a thorough read of the description will make the available
variants clear. The I/O APIC input numbers may differ but so far they are
almost always as noted above.

Maciej

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: macro@xxxxxxxxxxxxx, PGP key available +