[patch 01/11] hrtimer: Add a mechanism to catch runaway timers
From: Thomas Gleixner
Date: Thu Sep 23 2021 - 12:04:26 EST
A recent report from syzbot unearthed a problem with self rearming timers
which fire late and try to catch up to now with a short period. That causes
the timer to be rearmed in the past until it eventually catches up with
now. If that rearming happens from the timer callback the hard or soft
interrupt expiry loop can run for a long time with either interrupts or
bottom halves disabled which causes RCU stalls and other lockups.
There is no safety net to stop or at least identify such runaway timers.
Detection is trivial. Cache the pointer to the last expired timer. The next
invocation from the same loop compares the pointer with the next expiring
hrtimer pointer and if they match 10 times in a row (in the same hard or
soft interrupt expiry instance) then it's reasonable to declare it as a
runaway.
In that case emit a warning and skip the callback invocation which stops
the misbehaving timer right there.
It's obviously incomplete, but it's definitely better than nothing and would
have caught the reported issue in mac80211_hwsim.
Suggested-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
---
kernel/time/hrtimer.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1714,6 +1714,13 @@ static void __run_hrtimer(struct hrtimer
base->running = NULL;
}
+static void hrtimer_del_runaway(struct hrtimer_clock_base *base,
+ struct hrtimer *timer)
+{
+ __remove_hrtimer(timer, base, HRTIMER_STATE_INACTIVE, 0);
+ pr_warn("Runaway hrtimer %p %ps stopped\n", timer, timer->function);
+}
+
static void __hrtimer_run_queues(struct hrtimer_cpu_base *cpu_base, ktime_t now,
unsigned long flags, unsigned int active_mask)
{
@@ -1722,6 +1729,8 @@ static void __hrtimer_run_queues(struct
for_each_active_base(base, cpu_base, active) {
struct timerqueue_node *node;
+ struct hrtimer *last = NULL;
+ unsigned int cnt = 0;
ktime_t basenow;
basenow = ktime_add(now, base->offset);
@@ -1732,6 +1741,22 @@ static void __hrtimer_run_queues(struct
timer = container_of(node, struct hrtimer, node);
/*
+ * Catch timers which rearm themself with a expiry
+ * time in the past over and over which makes this
+ * loop run forever.
+ */
+ if (IS_ENABLED(CONFIG_DEBUG_OBJECTS_TIMERS)) {
+ if (unlikely(last == timer)) {
+ if (++cnt == 10) {
+ hrtimer_del_runaway(base, timer);
+ continue;
+ }
+ }
+ last = timer;
+ cnt = 0;
+ }
+
+ /*
* The immediate goal for using the softexpires is
* minimizing wakeups, not running timers at the
* earliest interrupt after their soft expiration.