[v2 PATCH 1/3] x86, reboot: Use NMI instead of REBOOT_VECTOR to stop cpus

From: Don Zickus
Date: Thu Oct 13 2011 - 15:20:17 EST


A recent discussion looked at the locking in the pstore fs and how it
relates to the kmsg infrastructure. We noticed that userspace can read
from or write to the pstore fs (grabbing the locks in the process) and
thereby block the panic path from reading or writing the same fs.

The reason was the cpu with the lock could be doing work while the crashing
cpu is panic'ing. Busting those spinlocks might cause those cpus to step
on each other's data. Fine, fair enough.

It was suggested it would be nice to serialize the panic path (ie stop
the other cpus) and have only one cpu running. This would allow us to
bust the spinlocks and not worry about another cpu stepping on the data.

Of course, smp_send_stop() does this in the panic case. kmsg_dump() would
have to be moved to be called after it. Easy enough.

The only problem is that on x86 the smp_send_stop() function sends the
REBOOT_VECTOR IPI. Any cpu with irqs disabled (which pstore and its backend
ERST would do) blocks this IPI and thus does not stop. This makes it
difficult to reliably log data to the pstore fs.

The patch below switches from the REBOOT_VECTOR to NMI (and mimics what
kdump does). Switching to NMI allows us to deliver the IPI when irqs are
disabled, increasing the reliability of this function.

However, Andi carefully noted that on some machines this approach does not
work because of broken BIOSes or whatever.

To help accommodate this, the next couple of patches will run a selftest and
provide a knob to disable it.

V2:
uses atomic ops to serialize the cpu that shuts everyone down
V3:
comment cleanup

Signed-off-by: Don Zickus <dzickus@xxxxxxxxxx>
---

[note] this patch sits on top of another NMI infrastructure change I have
submitted, so the nmi registration might not apply cleanly without that patch.
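
[note] for reviewers, the V2 serialization idea is that exactly one cpu
claims the "stopping cpu" role by swapping its id into a variable that
starts at -1, and every later caller backs off. Below is a minimal
userspace sketch of that idea using C11 atomics. It is an illustration
only: claim_stopping_cpu() and is_stopping_cpu() are hypothetical names
that do not exist in the kernel, and the patch itself uses
atomic_cmpxchg() and atomic_read() on an atomic_t instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* -1 means no cpu has claimed the stop role yet (mirrors ATOMIC_INIT(-1)). */
static atomic_int stopping_cpu = -1;

/*
 * Hypothetical helper: try to become the one cpu that stops the others.
 * The compare-and-exchange succeeds only while the value is still -1,
 * so only the first caller gets true.
 */
static bool claim_stopping_cpu(int cpu)
{
	int expected = -1;

	assert(cpu >= 0);	/* -1 is reserved as the "unclaimed" sentinel */
	return atomic_compare_exchange_strong(&stopping_cpu, &expected, cpu);
}

/*
 * Hypothetical helper: the check the NMI callback needs so that the
 * stopping cpu does not stop itself.
 */
static bool is_stopping_cpu(int cpu)
{
	return atomic_load(&stopping_cpu) == cpu;
}
```

With this sketch, if cpu 2 calls claim_stopping_cpu(2) first, a later
claim_stopping_cpu(5) fails and leaves the stored id untouched, which is
the "did someone beat us here?" case in the patch.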
---
arch/x86/kernel/smp.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 013e7eb..991d184 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -28,6 +28,7 @@
#include <asm/mmu_context.h>
#include <asm/proto.h>
#include <asm/apic.h>
+#include <asm/nmi.h>
/*
* Some notes on x86 processor bugs affecting SMP operation:
*
@@ -147,6 +148,60 @@ void native_send_call_func_ipi(const struct cpumask *mask)
free_cpumask_var(allbutself);
}

+static atomic_t stopping_cpu = ATOMIC_INIT(-1);
+
+static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs)
+{
+ /* We are registered on stopping cpu too, avoid spurious NMI */
+ if (raw_smp_processor_id() == atomic_read(&stopping_cpu))
+ return NMI_HANDLED;
+
+ stop_this_cpu(NULL);
+
+ return NMI_HANDLED;
+}
+
+static void native_nmi_stop_other_cpus(int wait)
+{
+ unsigned long flags;
+ unsigned long timeout;
+
+ if (reboot_force)
+ return;
+
+ /*
+ * Use an own vector here because smp_call_function
+ * does lots of things not suitable in a panic situation.
+ */
+ if (num_online_cpus() > 1) {
+ /* did someone beat us here? */
+ if (atomic_cmpxchg(&stopping_cpu, -1, safe_smp_processor_id()) != -1)
+ return;
+
+ if (register_nmi_handler(NMI_LOCAL, smp_stop_nmi_callback,
+ NMI_FLAG_FIRST, "smp_stop"))
+ /* Note: we ignore failures here */
+ return;
+
+ /* sync above data before sending NMI */
+ wmb();
+
+ apic->send_IPI_allbutself(NMI_VECTOR);
+
+ /*
+ * Don't wait longer than a second if the caller
+ * didn't ask us to wait.
+ */
+ timeout = USEC_PER_SEC;
+ while (num_online_cpus() > 1 && (wait || timeout--))
+ udelay(1);
+ }
+
+ local_irq_save(flags);
+ disable_local_APIC();
+ local_irq_restore(flags);
+}
+
/*
* this function calls the 'stop' function on all other CPUs in the system.
*/
@@ -159,7 +214,7 @@ asmlinkage void smp_reboot_interrupt(void)
irq_exit();
}

-static void native_stop_other_cpus(int wait)
+static void native_irq_stop_other_cpus(int wait)
{
unsigned long flags;
unsigned long timeout;
@@ -229,7 +284,7 @@ struct smp_ops smp_ops = {
.smp_prepare_cpus = native_smp_prepare_cpus,
.smp_cpus_done = native_smp_cpus_done,

- .stop_other_cpus = native_stop_other_cpus,
+ .stop_other_cpus = native_nmi_stop_other_cpus,
.smp_send_reschedule = native_smp_send_reschedule,

.cpu_up = native_cpu_up,
--
1.7.6.4
