Re: [PATCH 5/6] x86, NMI: Allow NMI reason io port (0x61) to be processed on any CPU

From: Maciej W. Rozycki
Date: Tue Feb 22 2011 - 21:40:06 EST


On Thu, 6 Jan 2011, Don Zickus wrote:

> In original NMI handler, NMI reason io port (0x61) is only processed
> on BSP. This makes it impossible to hot-remove BSP. To solve the
> issue, a raw spinlock is used to allow the port to be processed on any
> CPU.
>
> Originally-by: Huang Ying <ying.huang@xxxxxxxxx>
> Signed-off-by: Don Zickus <dzickus@xxxxxxxxxx>
> ---
> arch/x86/kernel/traps.c | 16 ++++++++++------
> 1 files changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 23f6ac0..613b3d2 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -402,13 +406,12 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
>  	if (notify_die(DIE_NMI, "nmi", regs, 0, 2, SIGINT) == NOTIFY_STOP)
>  		return;
>
> -	cpu = smp_processor_id();
> -
> -	/* Only the BSP gets external NMIs from the system. */
> -	if (!cpu)
> -		reason = get_nmi_reason();
> +	/* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
> +	raw_spin_lock(&nmi_reason_lock);
> +	reason = get_nmi_reason();
>
>  	if (!(reason & NMI_REASON_MASK)) {
> +		raw_spin_unlock(&nmi_reason_lock);
>  		unknown_nmi_error(reason, regs);
>
>  		return;

[Catching up with old e-mail...]

In line with the comment above that you're removing -- have you (or
anyone else) adjusted code elsewhere so that external NMIs are actually
delivered to processors other than the BSP? I can't see such code in this
series nor an explanation as to why it wouldn't be needed.

For the record -- the piece of code above reflects our setup where the
LINT1 input is enabled and configured for the NMI delivery mode on the BSP
only and all the other processors have this line disabled in their local
APIC units. If system NMIs are to be handled after the removal of the
BSP, then another processor has to be selected and configured for NMI
reception. Alternatively, all local units could have their LINT1 input
enabled and arbitrate handling, although it would be quite disruptive as
all the processors would take the interrupt if it happened. OTOH it would
be more fault-tolerant in the case of a CPU failure. On a typical x86 box
the system NMI cannot be routed to an I/O APIC input.
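
To illustrate the first option (selecting another processor for NMI
reception when the BSP goes away), here is a minimal user-space sketch in
C. It is only a model, not kernel code: the per-CPU array, struct
cpu_model and the functions model_boot(), model_cpu_offline(),
lint1_takes_nmi() and count_nmi_recipients() are all hypothetical names
made up for this example, and the "lowest-numbered online CPU" policy is
an assumption. Only the LVT bit positions (delivery mode in bits 10:8,
mask in bit 16) follow the architectural local APIC layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural LVT entry fields (the LINT1 entry sits at APIC offset 0x360). */
#define LVT_DM_MASK (0x7u << 8)   /* delivery mode field, bits 10:8 */
#define LVT_DM_NMI  (0x4u << 8)   /* delivery mode 100b: NMI */
#define LVT_MASKED  (1u << 16)    /* mask bit: input disabled */

#define MAX_CPUS 8

/* Hypothetical model: one simulated LVT LINT1 register per CPU. */
struct cpu_model {
	bool online;
	uint32_t lvt_lint1;
};

struct cpu_model cpus[MAX_CPUS];

/* True if this CPU would take a system NMI signalled on LINT1. */
bool lint1_takes_nmi(int cpu)
{
	uint32_t v = cpus[cpu].lvt_lint1;

	return cpus[cpu].online && !(v & LVT_MASKED) &&
	       (v & LVT_DM_MASK) == LVT_DM_NMI;
}

/* Number of CPUs currently set up to receive the system NMI. */
int count_nmi_recipients(void)
{
	int n = 0;

	for (int i = 0; i < MAX_CPUS; i++)
		n += lint1_takes_nmi(i);
	return n;
}

/* Boot-time state as described above: LINT1 is live on CPU 0 (the BSP) only. */
void model_boot(int ncpus)
{
	assert(ncpus > 0 && ncpus <= MAX_CPUS);
	for (int i = 0; i < MAX_CPUS; i++) {
		cpus[i].online = i < ncpus;
		cpus[i].lvt_lint1 = LVT_DM_NMI | (i == 0 ? 0 : LVT_MASKED);
	}
}

/*
 * Take a CPU offline.  If it was the NMI recipient, unmask LINT1 on the
 * lowest-numbered CPU still online (assumed policy).  Returns the CPU
 * that receives system NMIs afterwards, or -1 if there is none.
 */
int model_cpu_offline(int cpu)
{
	bool was_recipient = lint1_takes_nmi(cpu);

	cpus[cpu].online = false;
	cpus[cpu].lvt_lint1 |= LVT_MASKED;

	for (int i = 0; i < MAX_CPUS; i++) {
		if (was_recipient && cpus[i].online) {
			cpus[i].lvt_lint1 = LVT_DM_NMI;	/* unmasked, NMI mode */
			return i;
		}
		if (!was_recipient && lint1_takes_nmi(i))
			return i;
	}
	return -1;
}
```

In this model exactly one processor has LINT1 unmasked at any time, so
removing the BSP hands NMI reception to CPU 1 while removing any other
processor leaves the recipient unchanged. The alternative of unmasking
LINT1 everywhere would correspond to clearing LVT_MASKED on all online
entries in model_boot().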

Maciej