[PATCH 2/3] x86, reboot: Use NMI to assist in shutting down if IRQ fails

From: Don Zickus
Date: Fri May 11 2012 - 14:41:32 EST

x86, reboot: revert stop_other_cpus to using IRQ with NMI fallback

For v3.3, I added code to use the NMI to stop other cpus in the panic
case. The idea was to make sure all cpus on the system were definitely
halted to help serialize the panic path to execute the rest of the
code on a single cpu.

The main problem it was trying to solve was how to stop a cpu that
was spinning with its irqs disabled. A IPI irq would be stuck and
couldn't get in there, but an NMI could.

Things were great until we had another conversation about some pstore
changes. Because some of the backend pstore still uses spinlocks to
protect the device access, things could get ugly if a panic happened
and we were stuck spinning on a lock.

Now with the NMI shutting down cpus, we could assume no other cpus were
running and just bust the spin lock and proceed.

The counter argument was, well if you do that the backend could be in
a screwed up state and you might not be able to save anything as a result.
If we could have just given the cpu a little more time to finish things,
we could have grabbed the spin lock cleanly and everything would have been

Well, how do give a cpu a 'little more time' in the panic case? For the
most part you can't without spinning on the lock and even in that case,
how long do you spin for?

So instead of making it ugly in the pstore code, just mimic the idea that
stop_machine had, which is block on an IRQ IPI until the remote cpu has
re-enabled interrupts and left the critical region. Which is what happens
now using REBOOT_IRQ.

Then leave the NMI case for those cpus that are truly stuck after a short
time. This leaves the current behaviour alone and just handle a corner
case. Most systems should never have to enter the NMI code and if they
do, print out a message in case the NMI itself causes another issue.

Signed-off-by: Don Zickus <dzickus@xxxxxxxxxx>
arch/x86/kernel/smp.c | 61 +++++++++++++++++++++++++++++++++++++++++++++----
1 files changed, 56 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 6d20f52..228e740 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -29,6 +29,7 @@
#include <asm/mmu_context.h>
#include <asm/proto.h>
#include <asm/apic.h>
+#include <asm/nmi.h>
* Some notes on x86 processor bugs affecting SMP operation:
@@ -108,6 +109,8 @@
* about nothing of note with C stepping upwards.

+static atomic_t stopping_cpu = ATOMIC_INIT(-1);
* this function sends a 'reschedule' IPI to another CPU.
* it goes straight through and wastes no time serializing
@@ -148,6 +151,17 @@ void native_send_call_func_ipi(const struct cpumask *mask)

+static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs)
+ /* We are registered on stopping cpu too, avoid spurious NMI */
+ if (raw_smp_processor_id() == atomic_read(&stopping_cpu))
+ return NMI_HANDLED;
+ stop_this_cpu(NULL);
+ return NMI_HANDLED;
* this function calls the 'stop' function on all other CPUs in the system.
@@ -171,13 +185,25 @@ static void native_stop_other_cpus(int wait)
* Use an own vector here because smp_call_function
* does lots of things not suitable in a panic situation.
- * On most systems we could also use an NMI here,
- * but there are a few systems around where NMI
- * is problematic so stay with an non NMI for now
- * (this implies we cannot stop CPUs spinning with irq off
- * currently)
+ */
+ /*
+ * We start by using the REBOOT_VECTOR irq.
+ * The irq is treated as a sync point to allow critical
+ * regions of code on other cpus to release their spin locks
+ * and re-enable irqs. Jumping straight to an NMI might
+ * accidentally cause deadlocks with further shutdown/panic
+ * code. By syncing, we give the cpus up to one second to
+ * finish their work before we force them off with the NMI.
if (num_online_cpus() > 1) {
+ /* did someone beat us here? */
+ if (atomic_cmpxchg(&stopping_cpu, -1, safe_smp_processor_id()) != -1)
+ return;
+ /* sync above data before sending IRQ */
+ wmb();

@@ -188,7 +214,32 @@ static void native_stop_other_cpus(int wait)
while (num_online_cpus() > 1 && (wait || timeout--))
+ /* if the REBOOT_VECTOR didn't work, try with the NMI */
+ if ((num_online_cpus() > 1)) {
+ if (register_nmi_handler(NMI_LOCAL, smp_stop_nmi_callback,
+ NMI_FLAG_FIRST, "smp_stop"))
+ /* Note: we ignore failures here */
+ /* Hope the REBOOT_IRQ is good enough */
+ goto finish;
+ /* sync above data before sending IRQ */
+ wmb();
+ pr_emerg("Shutting down cpus with NMI\n");
+ apic->send_IPI_allbutself(NMI_VECTOR);
+ /*
+ * Don't wait longer than a 10 ms if the caller
+ * didn't ask us to wait.
+ */
+ timeout = USEC_PER_MSEC * 10;
+ while (num_online_cpus() > 1 && (wait || timeout--))
+ udelay(1);
+ }


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/