[PATCH 2/3] alpha: Fix SMP IPI loss when target CPU is in interrupt handler
From: Matt Turner
Date: Sat May 30 2026 - 16:26:29 EST
On EV7/IO7, the wripir PALcall delivers IPIs as edge-triggered hardware
signals through the IO7 I/O controller. If the target CPU is already
executing at IPL=7 inside do_entInt handling another interrupt, the IPI
edge is lost: the hardware never re-delivers it when the CPU drops back
to IPL=0.
The software IPI bit in ipi_data[cpu].bits is set before wripir is
called, so it remains set after the interrupt handler returns. But
because no hardware edge fires, handle_ipi() is never invoked again,
and the sending CPU spins forever in csd_lock_wait.
This race is the root cause of a 15-year-old SMP deadlock on EV7/Marvel
systems. It is reliably triggered by workloads that generate many
synchronous IPIs (TLB flushes via on_each_cpu(wait=1)) while the
target CPU receives concurrent I/O or RTC interrupts.
Fix: add alpha_poll_ipi_inirq(), called from do_entInt within each
interrupt handler's irq_enter/irq_exit bracket. It checks
ipi_data[smp_processor_id()].bits and drains any pending IPIs that
arrived while we were at IPL=7, before irq_exit() opens the softirq
window where a TLB-flush softirq could itself deadlock on
alpha_smp_ipi_lock. The fast path is a single READ_ONCE, so the cost is
negligible when no IPI was missed.
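
The lost edge and the poll-based rescue can be modeled in user space.
The following is only a sketch under stated assumptions: every model_*
name is hypothetical and exists nowhere in the kernel, the "hardware" is
reduced to an IPL check at send time, and a plain read stands in for
READ_ONCE/xchg.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; not kernel code. */
enum { MODEL_IPI_CALL_FUNC = 1 };

struct model_cpu {
	int ipl;		/* 0 = interrupts open, 7 = all masked */
	unsigned long bits;	/* stands in for ipi_data[cpu].bits */
	int handled;		/* times the IPI handler found work */
};

/* Stands in for handle_ipi(): take and clear all pending bits. */
static void model_handle_ipi(struct model_cpu *c)
{
	unsigned long pending = c->bits;

	c->bits = 0;
	if (pending)
		c->handled++;
}

/*
 * Sender side: the software bit is set first, then the edge is raised.
 * Assumption of this model: an edge raised while the target is at
 * IPL 7 is dropped and never re-delivered.
 */
static void model_send_ipi(struct model_cpu *target, int type)
{
	target->bits |= 1UL << type;
	if (target->ipl < 7)
		model_handle_ipi(target);
	/* else: edge lost, the software bit stays set */
}

/* Stands in for alpha_poll_ipi_inirq(): cheap check, then drain. */
static void model_poll_ipi(struct model_cpu *c)
{
	if (!c->bits)
		return;
	model_handle_ipi(c);
}

/*
 * One interrupt on the target during which an IPI arrives.
 * Returns true if the IPI was handled by the time the handler returns.
 */
static bool model_interrupt_with_ipi(bool poll_before_exit)
{
	struct model_cpu c = { .ipl = 0, .bits = 0, .handled = 0 };

	c.ipl = 7;				/* enter interrupt handler */
	model_send_ipi(&c, MODEL_IPI_CALL_FUNC);	/* edge is dropped */
	if (poll_before_exit)
		model_poll_ipi(&c);		/* the rescue */
	c.ipl = 0;				/* return from interrupt */

	return c.handled == 1 && c.bits == 0;
}
```

In this model, model_interrupt_with_ipi(false) leaves the bit set with no
handler run, which corresponds to the sender spinning in csd_lock_wait;
model_interrupt_with_ipi(true) drains the bit before the handler returns.
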
For the RTC interrupt (case 1 in do_entInt), handle_irq() already calls
its own irq_enter()/irq_exit() internally. The outer irq_enter/irq_exit
pair added here is intentional: it keeps irq_count > 0 while handle_irq()
runs, so handle_irq()'s inner irq_exit() sees a non-zero count and skips
the softirq window. The softirq window is deferred until the outer
irq_exit(), which runs after alpha_poll_ipi_inirq() has already drained
any pending IPIs. Without this outer bracket, irq_exit() inside
handle_irq() could open the softirq window before any missed IPIs are
rescued, risking a deadlock on alpha_smp_ipi_lock.
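
The effect of the outer bracket on the RTC path can be illustrated with
a nesting counter. This is a deliberately simplified sketch: the nest_*
names are hypothetical, the counter stands in for the hardirq nesting
count, and the model assumes the softirq window opens exactly when that
count drops to zero, which ignores the other conditions the real
irq_exit() checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; not kernel code. */
struct nest_state {
	int hardirq_depth;	/* stands in for the hardirq nesting count */
	unsigned long ipi_bits;	/* pending software IPI bits */
	int softirq_runs;	/* how often the softirq window opened */
	bool softirq_saw_pending_ipi;
};

static void nest_irq_enter(struct nest_state *s)
{
	s->hardirq_depth++;
}

/* Assumption: the softirq window opens only at depth zero. */
static void nest_irq_exit(struct nest_state *s)
{
	if (--s->hardirq_depth == 0) {
		s->softirq_runs++;
		if (s->ipi_bits)
			s->softirq_saw_pending_ipi = true;
	}
}

/* Stands in for handle_irq(), which has its own enter/exit pair. */
static void nest_handle_irq(struct nest_state *s)
{
	nest_irq_enter(s);
	s->ipi_bits |= 1UL;	/* an IPI edge is lost while we run */
	nest_irq_exit(s);
}

/* Stands in for alpha_poll_ipi_inirq(): drain pending bits. */
static void nest_poll_ipi(struct nest_state *s)
{
	s->ipi_bits = 0;
}

/*
 * Models "case 1" of do_entInt. Returns true if a softirq window
 * opened while a missed IPI was still pending.
 */
static bool nest_rtc_path(bool outer_bracket)
{
	struct nest_state s = { 0 };

	if (outer_bracket)
		nest_irq_enter(&s);
	nest_handle_irq(&s);
	nest_poll_ipi(&s);
	if (outer_bracket)
		nest_irq_exit(&s);

	return s.softirq_saw_pending_ipi;
}
```

Without the outer bracket, the inner exit reaches depth zero and the
window opens before the poll runs, so nest_rtc_path(false) reports the
hazard. With it, the inner exit only drops the depth to one and the
window opens after the poll, so nest_rtc_path(true) does not.
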
Approximately 98% of rescued IPIs are IPI_CALL_FUNC (the TLB-flush
type), confirming that IO7 genuinely drops the hardware edge rather than
holding it pending until IPL falls.
A lost IPI_CALL_FUNC only deadlocks when the sender is blocking (wait=1).
wait=0 callers do not hang, but silently skip the function on the remote
CPU, which may be a correctness issue in its own right.
This fix is complementary to the alpha_smp_ipi_lock serialization
(previous commit). Both are required:
- Serialization prevents two CPUs that issue wait=1 IPIs at the same
time from deadlocking each other in csd_lock_wait.
- This fix prevents a single wait=1 caller from deadlocking due to an
IPI edge lost to an IPL=7 window on the remote CPU.
Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Matt Turner <mattst88@xxxxxxxxx>
---
arch/alpha/kernel/irq_alpha.c | 29 ++++++++++++++++++++++++++++-
arch/alpha/kernel/proto.h | 1 +
arch/alpha/kernel/smp.c | 35 +++++++++++++++++++++++++++++++++++
3 files changed, 64 insertions(+), 1 deletion(-)
diff --git ./arch/alpha/kernel/irq_alpha.c ./arch/alpha/kernel/irq_alpha.c
index ac941172ae66..0e4234ef7ea0 100644
--- ./arch/alpha/kernel/irq_alpha.c
+++ ./arch/alpha/kernel/irq_alpha.c
@@ -69,22 +69,49 @@ do_entInt(unsigned long type, unsigned long vector,
break;
#endif
case 1:
- /* handle_irq() already does irq_enter()/irq_exit() */
+ /*
+ * Wrap handle_irq() in our own irq_enter/irq_exit so that the
+ * inner irq_exit() inside handle_irq() does not run softirqs
+ * (irq_count remains > 0). We poll for lost IPIs before the
+ * outer irq_exit(), which is where softirqs may run. This
+ * prevents a TLB flush softirq from deadlocking on
+ * alpha_smp_ipi_lock while the sending CPU waits for our ACK.
+ */
+ irq_enter();
handle_irq(RTC_IRQ);
+#ifdef CONFIG_SMP
+ alpha_poll_ipi_inirq(regs);
+#endif
+ irq_exit();
break;
case 2:
irq_enter();
alpha_mv.machine_check(vector, la_ptr);
+#ifdef CONFIG_SMP
+ alpha_poll_ipi_inirq(regs);
+#endif
irq_exit();
break;
case 3:
irq_enter();
alpha_mv.device_interrupt(vector);
+#ifdef CONFIG_SMP
+ /*
+ * Drain any IPIs whose edge was lost while we were at IPL=7.
+ * Must be called before irq_exit() to prevent softirqs (e.g.
+ * a TLB flush) from deadlocking on alpha_smp_ipi_lock while
+ * the sending CPU spins in csd_lock_wait.
+ */
+ alpha_poll_ipi_inirq(regs);
+#endif
irq_exit();
break;
case 4:
irq_enter();
perf_irq(la_ptr, regs);
+#ifdef CONFIG_SMP
+ alpha_poll_ipi_inirq(regs);
+#endif
irq_exit();
break;
default:
diff --git ./arch/alpha/kernel/proto.h ./arch/alpha/kernel/proto.h
index f138bd494628..04879e0b2932 100644
--- ./arch/alpha/kernel/proto.h
+++ ./arch/alpha/kernel/proto.h
@@ -120,6 +120,7 @@ extern void unregister_srm_console(void);
/* smp.c */
extern void setup_smp(void);
extern void handle_ipi(struct pt_regs *);
+extern void alpha_poll_ipi_inirq(struct pt_regs *);
extern void __init smp_callin(void);
/* bios32.c */
diff --git ./arch/alpha/kernel/smp.c ./arch/alpha/kernel/smp.c
index d900da49b0d8..099e1ac6a0d6 100644
--- ./arch/alpha/kernel/smp.c
+++ ./arch/alpha/kernel/smp.c
@@ -557,6 +557,41 @@ handle_ipi(struct pt_regs *regs)
recv_secondary_console_msg();
}
+/*
+ * On EV7/IO7, IPI signals are edge-triggered. If an IPI arrives while this
+ * CPU is executing at IPL=7 (inside another interrupt handler), the hardware
+ * edge is lost. The software bit in ipi_data[] remains set but handle_ipi()
+ * is never re-invoked, causing the sending CPU to spin forever in csd_lock_wait.
+ *
+ * Call this from within hardirq context (between irq_enter and irq_exit) to
+ * drain any IPIs that arrived while we were running at IPL=7, before irq_exit()
+ * opens the softirq window where a TLB flush could deadlock on alpha_smp_ipi_lock.
+ */
+void alpha_poll_ipi_inirq(struct pt_regs *regs)
+{
+ int cpu = smp_processor_id();
+ unsigned long bits = READ_ONCE(ipi_data[cpu].bits);
+
+ if (!bits)
+ return;
+
+ /*
+ * Peek at type bits before handle_ipi() clears them via xchg().
+ * Bits arriving after this READ_ONCE are drained but not counted;
+ * the counters are approximate but sufficient for diagnosis.
+ * Note: handle_ipi() also increments ipi_count, so the "IPI:" row
+ * in /proc/interrupts includes both normal and rescued deliveries.
+ */
+ if (bits & (1UL << IPI_RESCHEDULE))
+ cpu_data[cpu].rescued_reschedule_count++;
+ if (bits & (1UL << IPI_CALL_FUNC))
+ cpu_data[cpu].rescued_call_func_count++;
+ if (bits & (1UL << IPI_CPU_STOP))
+ cpu_data[cpu].rescued_cpu_stop_count++;
+
+ handle_ipi(regs);
+}
+
void
arch_smp_send_reschedule(int cpu)
{
--
2.53.0