Re: NMI problems with Dell SMP Xeons

From: Keith Owens
Date: Mon May 22 2006 - 21:27:39 EST


Andi Kleen (on Tue, 23 May 2006 01:56:39 +0200) wrote:
>
>> (1) IPI 2, not marked as NMI. This does _not_ call into the do_nmi()
>> routine.
>>
>> People have been telling me (hi, Andi:) that sending interrupt 2 as
>> an IPI automatically sends it as an NMI.
>
>I can't remember ever saying that. I said that sending anything with the
>NMI bit set will end up at the NMI vector, not the original vector
>you specified. Or at least that is what the Intel manual specifies.
>That is why it is useless to hook the original vector like you did
>and add special cases just to get an NMI sent with a different vector.

I have never disagreed that all NMIs will end up on the NMI vector (2).
But there still has to be code in arch/*/kernel to detect that the IPI
being sent is to be marked as NMI, IOW you still need the code that
sets APIC_DM_NMI. Whether that is done by using a special vector
number (i386 does) or by defining a separate routine for sending NMI
(x86_64 does) is a matter for debate.
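To make the distinction concrete, here is a small user-space sketch of the two styles of choosing the ICR delivery mode. It is modelled loosely on the i386/x86_64 split described above, not copied from either tree. The constants are assumed to match the Linux apicdef.h values (delivery mode in ICR bits 8-10, NMI = 100b), and the function names are hypothetical. Per the Intel manual, the vector field is ignored for NMI delivery, so the only thing that makes an IPI an NMI is setting APIC_DM_NMI.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match Linux apicdef.h values for the ICR low word. */
#define APIC_DM_MASK   0x00700u  /* ICR bits 8-10: delivery mode */
#define APIC_DM_FIXED  0x00000u  /* 000b: fixed, vector field is used */
#define APIC_DM_NMI    0x00400u  /* 100b: NMI, vector field is ignored */
#define NMI_VECTOR     0x02u

static_assert((APIC_DM_NMI & APIC_DM_MASK) == APIC_DM_NMI,
	      "NMI delivery mode must lie in ICR bits 8-10");

/* Style 1 (i386-like): a special vector number means "send as NMI". */
static uint32_t icr_from_vector(uint8_t vector)
{
	if (vector == NMI_VECTOR)
		return APIC_DM_NMI;      /* the step that marks it as NMI */
	return APIC_DM_FIXED | vector;
}

/* Style 2 (x86_64-like): plain IPIs never turn into NMIs by themselves... */
static uint32_t icr_for_ipi(uint8_t vector)
{
	return APIC_DM_FIXED | vector;
}

/* ...so callers that want an NMI must use a separate routine. */
static uint32_t icr_for_nmi(void)
{
	return APIC_DM_NMI;
}

/* True if the ICR value requests NMI delivery. */
static int icr_is_nmi(uint32_t icr)
{
	return (icr & APIC_DM_MASK) == APIC_DM_NMI;
}
```

Note that with the x86_64-style split, passing NMI_VECTOR to the plain IPI path yields a fixed-mode ICR carrying vector 2, which is not an NMI at all.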

Unfortunately, the way that you changed the x86_64 kdb code, it now does
neither. Your hack to kdb sends an IPI using NMI_VECTOR (2), which

(a) is not actually sent as an NMI, and
(b) on most of the hardware I have tested, does not even get through
to the other cpus; instead it generates APIC errors.
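A plausible explanation for the APIC errors in (b): the Intel manual treats vectors 0-15 as illegal for fixed-mode interrupts, and the local APIC reports this in its Error Status Register (bit 5, "Send Illegal Vector"). The sketch below is a deliberately simplified model of that sender-side check. Only the fixed delivery mode is modelled, the function name is hypothetical, and real APICs differ in detail, which may be why results vary between machines.

```c
#include <assert.h>
#include <stdint.h>

#define APIC_DM_MASK             0x00700u  /* ICR bits 8-10: delivery mode */
#define APIC_DM_FIXED            0x00000u
#define APIC_DM_NMI              0x00400u
#define APIC_VECTOR_MASK         0x000FFu
#define ESR_SEND_ILLEGAL_VECTOR  (1u << 5) /* ESR bit 5 per the Intel manual */

static_assert(ESR_SEND_ILLEGAL_VECTOR == 0x20u, "ESR bit 5");

/*
 * Simplified model: returns the ESR bits a sender would latch for this
 * ICR value. Fixed mode with a vector below 16 is flagged as illegal;
 * NMI mode ignores the vector field, so vector 2 is harmless there.
 */
static uint32_t simulated_send_esr(uint32_t icr)
{
	uint32_t mode = icr & APIC_DM_MASK;
	uint32_t vector = icr & APIC_VECTOR_MASK;

	if (mode == APIC_DM_FIXED && vector < 16)
		return ESR_SEND_ILLEGAL_VECTOR;
	return 0;
}
```

In this model, NMI_VECTOR sent as a plain fixed IPI is rejected, while the same number with APIC_DM_NMI set goes through.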

FWIW, kdb on ia64 first sends a normal maskable IPI using its own
KDB_VECTOR and waits for the other cpus to rendezvous. Only if some
cpus have not rendezvoused does ia64 kdb resort to using a non-maskable
interrupt. I have found that this gives much better backtracing and is
more reliable. It is a sad fact of life that NMIs can be delivered in
the middle of code that sets up the kernel stack; when that happens it
is impossible to backtrace. I am changing kdb on i386 to use the same
two-step process: try a normal interrupt first, wait a bit, then resort
to NMI.
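The two-step rendezvous can be sketched as a self-contained simulation. This is not kdb code. struct sim_cpu, kdb_rendezvous() and the polling model are hypothetical, and a "poll" stands in for whatever short delay a real implementation would use. A cpu only takes the maskable IPI once its interrupts are enabled; anything still missing after the wait gets the NMI.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-cpu state for the simulation (the other cpus only). */
struct sim_cpu {
	int irqs_off_polls;  /* polls until irqs are re-enabled; <0 = never */
	bool ipi_pending;    /* maskable IPI latched but not yet taken */
	bool in_kdb;         /* cpu has rendezvoused */
	bool via_nmi;        /* cpu only arrived because of the NMI fallback */
};

/* A maskable IPI is only taken while interrupts are enabled. */
static void take_pending_ipi(struct sim_cpu *c)
{
	if (c->ipi_pending && c->irqs_off_polls == 0) {
		c->ipi_pending = false;
		c->in_kdb = true;
	}
}

/* One polling interval: some cpus re-enable interrupts. */
static void tick(struct sim_cpu *cpus, int n)
{
	for (int i = 0; i < n; i++) {
		if (cpus[i].irqs_off_polls > 0)
			cpus[i].irqs_off_polls--;
		take_pending_ipi(&cpus[i]);
	}
}

static int missing(const struct sim_cpu *cpus, int n)
{
	int count = 0;

	for (int i = 0; i < n; i++)
		if (!cpus[i].in_kdb)
			count++;
	return count;
}

/* Returns how many cpus had to be fetched with an NMI. */
static int kdb_rendezvous(struct sim_cpu *cpus, int n, int max_polls)
{
	assert(n > 0 && max_polls >= 0);

	/* Step 1: send the normal maskable IPI to every other cpu. */
	for (int i = 0; i < n; i++) {
		cpus[i].ipi_pending = true;
		take_pending_ipi(&cpus[i]);
	}

	/* Wait a bit for the cpus to rendezvous. */
	for (int poll = 0; poll < max_polls && missing(cpus, n) > 0; poll++)
		tick(cpus, n);

	/* Step 2: only the stragglers get the non-maskable interrupt. */
	int nmi_count = 0;
	for (int i = 0; i < n; i++) {
		if (!cpus[i].in_kdb) {
			cpus[i].ipi_pending = false;  /* simulation only */
			cpus[i].in_kdb = true;
			cpus[i].via_nmi = true;
			nmi_count++;
		}
	}
	return nmi_count;
}
```

The design point is that in the common case every cpu answers the maskable IPI, so none of them is stopped in the middle of stack setup and backtraces stay usable. Only a cpu spinning with interrupts disabled falls through to the NMI path.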
