Re: [PATCH 5/6] x86, NMI: Allow NMI reason io port (0x61) to be processed on any CPU

From: Cyrill Gorcunov
Date: Sat Feb 26 2011 - 07:34:27 EST


On 02/26/2011 02:19 PM, huang ying wrote:
Hi,

On Sat, Feb 26, 2011 at 4:02 PM, Cyrill Gorcunov<gorcunov@xxxxxxxxx> wrote:
On 02/23/2011 05:39 AM, Maciej W. Rozycki wrote:
...

[Catching up with old e-mail...]

In line with the comment above that you're removing -- have you (or
anyone else) adjusted code elsewhere so that external NMIs are actually
delivered to processors other than the BSP? I can't see such code in this
series nor an explanation as to why it wouldn't be needed.

For the record -- the piece of code above reflects our setup where the
LINT1 input is enabled and configured for the NMI delivery mode on the BSP
only and all the other processors have this line disabled in their local
APIC units. If system NMIs are to be handled after the removal of the
BSP, then another processor has to be selected and configured for NMI
reception. Alternatively, all local units could have their LINT1 input
enabled and arbitrate handling, although it would be quite disruptive as
all the processors would take the interrupt if it happened. OTOH it would
be more fault-tolerant in the case of a CPU failure. On a typical x86 box
the system NMI cannot be routed to an I/O APIC input.

Maciej
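
[A minimal sketch of the LINT1 setup described above, not the actual kernel
code. The two constants follow the local APIC LVT register layout (delivery
mode NMI in bits 10:8, mask in bit 16). The helper names lint1_lvt_value()
and pick_nmi_cpu() are hypothetical, and "first online CPU" is only one
possible policy for choosing a replacement once the BSP is gone.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Local APIC LVT entry fields (register layout, not tied to any kernel header). */
#define LVT_DM_NMI  0x00400u    /* delivery mode, bits 10:8 = 100b: NMI */
#define LVT_MASKED  (1u << 16)  /* mask bit: the input is disabled */

/*
 * Hypothetical helper: the value to program into a CPU's LINT1 LVT entry.
 * The CPU chosen to receive the system NMI gets the line unmasked; every
 * other CPU keeps NMI delivery mode but with the mask bit set.
 */
static uint32_t lint1_lvt_value(bool receives_system_nmi)
{
    uint32_t value = LVT_DM_NMI;

    if (!receives_system_nmi)
        value |= LVT_MASKED;
    return value;
}

/*
 * Hypothetical policy: pick the lowest-numbered online CPU as the system
 * NMI receiver. Returns -1 if no CPU is online.
 */
static int pick_nmi_cpu(const bool *online, int ncpus)
{
    for (int cpu = 0; cpu < ncpus; cpu++)
        if (online[cpu])
            return cpu;
    return -1;
}
```

[With CPU 0 offline, pick_nmi_cpu() would select the next online CPU, and
that CPU's LINT1 entry would then need to be rewritten with the unmasked
value. The reprogramming step itself is the part this series lacks.]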

Hi Maciej, good catch! The code doesn't reconfigure the LVT. As Don just
pointed out, Intel might be working on something, dunno. We should probably
drop this patch for now (at least until LVT reconfig is implemented).


Hi Huang,

Why? Without LVT reconfig, can a system with this patch not work
properly?

I guess we have a few nits here -- first, an important comment was
removed, so the code no longer reflects what really happens at the hw
level. At least we should put it back so as not to confuse people who
read this code, something like

/*
 * FIXME: Only BSP can see external NMI for now and hot-unplug
 * for BSP is not yet implemented
 */
WARN_ON_ONCE(smp_processor_id());

The reason for WARN_ON_ONCE here is this -- imagine a situation where a
perf NMI happens on one cpu while an external NMI arrives on the BSP, and
for some reason (say the code at an upper level is screwed/bogus or anything
else) the nmi-notifier didn't handle it properly. As a result we might get a
report like "SERR for reason xx on CPU 1" although this cpu didn't see the
signal at all. And then, due to lock ordering, the BSP will see an unknown
nmi while in reality that nmi belongs to it, and it was CPU 1 that observed
an erroneous NMI from the upper level. Note this is a theoretical scenario,
I never saw anything like this ;)
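
To make the concern a bit more concrete, here is a rough userspace sketch,
not kernel code. It assumes the usual meaning of the port 0x61 status bits
(bit 7 for SERR, bit 6 for IOCHK). The names classify_reason() and
may_claim_external_nmi() are made up for illustration; the point is only that
a CPU which is not wired for external NMI should not attribute the reason
bits to itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* NMI status bits in I/O port 0x61 (names here are illustrative). */
#define NMI_REASON_SERR   0x80u  /* bit 7: system error (PCI SERR#) */
#define NMI_REASON_IOCHK  0x40u  /* bit 6: I/O channel check */
#define NMI_REASON_MASK   (NMI_REASON_SERR | NMI_REASON_IOCHK)

enum nmi_source { NMI_SRC_NONE, NMI_SRC_SERR, NMI_SRC_IOCHK };

/* Decode an already-read reason byte; SERR is checked first. */
static enum nmi_source classify_reason(uint8_t reason)
{
    if (reason & NMI_REASON_SERR)
        return NMI_SRC_SERR;
    if (reason & NMI_REASON_IOCHK)
        return NMI_SRC_IOCHK;
    return NMI_SRC_NONE;
}

/*
 * Hypothetical guard: only the CPU whose LINT1 is unmasked (nmi_cpu, the
 * BSP today) may report the reason as its own. Any other CPU that reads a
 * non-zero reason is looking at an NMI that was delivered elsewhere.
 */
static bool may_claim_external_nmi(int this_cpu, int nmi_cpu, uint8_t reason)
{
    return this_cpu == nmi_cpu && (reason & NMI_REASON_MASK) != 0;
}
```

In the scenario above, CPU 1 would read reason 0x80 while handling its perf
NMI, and may_claim_external_nmi(1, 0, 0x80) returns false, which is where a
WARN_ON_ONCE-style check would fire.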

And since LVT reconfig might not be as simple as we imagine, I think
having an additional lock in the nmi handling code is not good at all.

This is just one of the steps to make CPU 0 hot-removable.
Must we enable CPU 0 hot-removing in one step?

Of course not, but as I said, having an additional lock here for free
is not that good until we have a serious reason for it.

Though, I would be glad if I'm wrong in my conclusions ;)

--
Cyrill