[PATCH v2 2/2] irq: detect long-running IRQ handlers
From: Mark Rutland
Date: Tue Jun 15 2021 - 06:25:43 EST
If a hard IRQ handler takes a long time to handle an IRQ, it may cause a
soft lockup or RCU stall. As the stall is only detected once the handler
has returned, it can be difficult to attribute the delay to the specific
IRQ handler.

It's possible to trace IRQ handlers to diagnose this, but that's not a
great fit for automated testing environments (e.g. fuzzers), where
something like the existing lockup/stall detectors works well.

This patch adds a new stall detector for IRQ handlers, which reports
when a handler took longer than a given timeout value (defaulting to 1
second). This won't detect hung IRQ handlers (detecting those requires an
NMI, and they should already be caught by hung task detection on systems
with NMIs), but it helps on platforms without NMIs or where a periodic
watchdog is undesirable.
Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx>
Acked-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
Cc: Marc Zyngier <maz@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
---
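For readers who want to experiment outside the kernel, the check added
below can be approximated in user space. This is only a rough sketch, not
the kernel code: clock_gettime(CLOCK_MONOTONIC) stands in for
local_clock(), fprintf() stands in for WARN(), and the names
HANDLER_TIMEOUT_NS, handler_fn, now_ns(), timed_call(), fast_handler()
and slow_handler() are hypothetical and do not appear in the patch.

```c
/* User-space sketch only; none of these names exist in the kernel. */
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for CONFIG_IRQ_HANDLER_TIMEOUT_NS (1 second). */
#define HANDLER_TIMEOUT_NS UINT64_C(1000000000)

/* Hypothetical handler type; the kernel's irq_handler_t differs. */
typedef int (*handler_fn)(unsigned int irq, void *dev_id);

/* Monotonic timestamp in nanoseconds, standing in for local_clock(). */
static uint64_t now_ns(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0); /* assumption: CLOCK_MONOTONIC is available */
	(void)rc;
	return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/*
 * Run the handler, then compare its duration against the timeout.
 * The check only happens after the handler returns, so a handler that
 * never returns is never reported. Returns 1 if a warning was printed,
 * 0 otherwise; the handler's own result is stored in *res.
 */
static int timed_call(unsigned int irq, handler_fn fn, void *dev_id,
		      uint64_t timeout_ns, int *res)
{
	uint64_t start, duration;

	start = now_ns();
	*res = fn(irq, dev_id);
	duration = now_ns() - start;

	if (duration > timeout_ns) {
		fprintf(stderr, "IRQ %u handler took %llu ns\n",
			irq, (unsigned long long)duration);
		return 1;
	}
	return 0;
}

/* Example handler that returns immediately. */
static int fast_handler(unsigned int irq, void *dev_id)
{
	(void)irq;
	(void)dev_id;
	return 1;
}

/* Example handler that sleeps for about 20 ms before returning. */
static int slow_handler(unsigned int irq, void *dev_id)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 20000000 };

	(void)irq;
	(void)dev_id;
	nanosleep(&delay, NULL);
	return 1;
}
```

Calling timed_call(2, slow_handler, NULL, 1000000, &res) with a 1 ms
timeout should print a warning, while fast_handler with the default
HANDLER_TIMEOUT_NS should not. As in the patch, the measurement wraps the
call rather than relying on a periodic watchdog, which is why it cannot
catch a handler that never returns.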
kernel/irq/internals.h | 35 ++++++++++++++++++++++++++++++++---
lib/Kconfig.debug | 15 +++++++++++++++
2 files changed, 47 insertions(+), 3 deletions(-)
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 70a4694cc891..191b6a9d30e2 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -6,6 +6,7 @@
  * kernel/irq/. Do not even think about using any information outside
  * of this file for your non core code.
  */
+#include <linux/bug.h>
 #include <linux/irqdesc.h>
 #include <linux/kernel_stat.h>
 #include <linux/pm_runtime.h>
@@ -122,17 +123,45 @@ static inline irqreturn_t __handle_irqaction(unsigned int irq,
 	return res;
 }
 
+#ifdef CONFIG_DETECT_SLOW_IRQ_HANDLER
+static inline irqreturn_t __handle_check_irqaction(unsigned int irq,
+						   struct irqaction *action,
+						   void *dev_id)
+{
+	u64 timeout = CONFIG_IRQ_HANDLER_TIMEOUT_NS;
+	u64 start, end, duration;
+	irqreturn_t res;
+
+	start = local_clock();
+	res = __handle_irqaction(irq, action, dev_id);
+	end = local_clock();
+
+	duration = end - start;
+	WARN(duration > timeout, "IRQ %u handler %ps took %llu ns\n",
+	     irq, action->handler, duration);
+
+	return res;
+}
+#else
+static inline irqreturn_t __handle_check_irqaction(unsigned int irq,
+						   struct irqaction *action,
+						   void *dev_id)
+{
+	return __handle_irqaction(irq, action, dev_id);
+}
+#endif
+
 static inline irqreturn_t handle_irqaction(unsigned int irq,
 					   struct irqaction *action)
 {
-	return __handle_irqaction(irq, action, action->dev_id);
+	return __handle_check_irqaction(irq, action, action->dev_id);
 }
 
 static inline irqreturn_t handle_irqaction_percpu_devid(unsigned int irq,
 							struct irqaction *action)
 {
-	return __handle_irqaction(irq, action,
-				  raw_cpu_ptr(action->percpu_dev_id));
+	return __handle_check_irqaction(irq, action,
+					raw_cpu_ptr(action->percpu_dev_id));
 }
 
 /* Resending of interrupts :*/
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 678c13967580..2c6a501fa9b9 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1144,6 +1144,21 @@ config WQ_WATCHDOG
 	  state. This can be configured through kernel parameter
 	  "workqueue.watchdog_thresh" and its sysfs counterpart.
 
+config DETECT_SLOW_IRQ_HANDLER
+	bool "Detect long-running IRQ handlers"
+	help
+	  Say Y here to enable detection of long-running IRQ handlers. When a
+	  (hard) IRQ handler takes longer than a given timeout value (1s by
+	  default), a warning is printed with the name of the handler.
+
+	  This can help to identify specific IRQ handlers which are
+	  contributing to stalls.
+
+config IRQ_HANDLER_TIMEOUT_NS
+	int "Timeout for long-running IRQ handlers (in nanoseconds)"
+	depends on DETECT_SLOW_IRQ_HANDLER
+	default 1000000000
+
 config TEST_LOCKUP
 	tristate "Test module to generate lockups"
 	depends on m
--
2.11.0