Re: [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check
From: Petr Mladek
Date: Thu Mar 05 2026 - 06:29:02 EST
On Wed 2026-03-04 16:58:35, Doug Anderson wrote:
> Hi,
>
> On Wed, Mar 4, 2026 at 6:44 AM Petr Mladek <pmladek@xxxxxxxx> wrote:
> >
> > On Thu 2026-02-12 14:12:10, Mayank Rungta via B4 Relay wrote:
> > > From: Mayank Rungta <mrungta@xxxxxxxxxx>
> > >
> > > Currently, arch_touch_nmi_watchdog() causes an early return that
> > > skips updating hrtimer_interrupts_saved. This leads to stale
> > > comparisons and delayed lockup detection.
> > >
> > > Update the saved interrupt count before checking the touched flag
> > > to ensure detection timeliness.
> >
> > IMHO, it is not that easy, see below.
> >
> > So I am curious. Have you found this when debugging a particular
> > problem or just by reading the code, please?
>
> As I understand it, Mayank found this because the watchdog was
> reacting significantly more slowly than he expected. In his case, he
> tracked it down to the fact that the 8250 console driver has several
> calls to touch_nmi_watchdog(), including on every call to
> console_write(). This caused the watchdog to take _much_ longer to fire.
>
> On devices that are fairly chatty with their output to the serial
> console, console_write() is called almost constantly. That means that
> the watchdog is being touched constantly. If I remember correctly,
> Mayank tracked it down as:
>
> * watchdog_hardlockup_check() called and saves counter (1000)
> * timer runs and increments the counter (1000 -> 1001)
> * touch_nmi_watchdog() is called
> * CPU locks up
> * 10 seconds pass
> * watchdog_hardlockup_check() called and saves counter (1001)
> * 10 seconds pass
> * watchdog_hardlockup_check() called and notices touch
Great visualization!
Nit: it actually seems to be the other way around:
* 10 seconds pass
* watchdog_hardlockup_check() called and notices touch and skips updating counters
* 10 seconds pass
* watchdog_hardlockup_check() called and saves counter (1001)
> * 10 seconds pass
> * watchdog_hardlockup_check() called and finally detects lockup
>
> ...so we detect the lockup after 30 seconds, which is pretty bad. With
> his new scheme, we'd detect the lockup in 20 seconds.
Fair enough.
> > > @@ -186,7 +186,21 @@ static void watchdog_hardlockup_kick(void)
> > >  
> > >  void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > >  {
> > > +	bool is_hl;
> > >  	int hardlockup_all_cpu_backtrace;
> > > +	/*
> > > +	 * Check for a hardlockup by making sure the CPU's timer
> > > +	 * interrupt is incrementing. The timer interrupt should have
> > > +	 * fired multiple times before we overflow'd. If it hasn't
> > > +	 * then this is a good indication the cpu is stuck
> > > +	 *
> > > +	 * Purposely check this _before_ checking watchdog_hardlockup_touched
> > > +	 * so we make sure we still update the saved value of the interrupts.
> > > +	 * Without that we'll take an extra round through this function before
> > > +	 * we can detect a lockup.
> > > +	 */
> > > +
> > > +	is_hl = is_hardlockup(cpu);
> > >  
> > >  	if (per_cpu(watchdog_hardlockup_touched, cpu)) {
> > >  		per_cpu(watchdog_hardlockup_touched, cpu) = false;
> >
> > Hmm, this does not look correct to me.
> >
> > 2. Let's look at is_hardlockup() in detail:
> >
> > static bool is_hardlockup(unsigned int cpu)
> > {
> > 	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> >
> > 	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
> > 		per_cpu(hrtimer_interrupts_missed, cpu)++;
> > 		if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
> > 			return true;
> >
> > 		return false;
> > 	}
> >
> > 	per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> > 	per_cpu(hrtimer_interrupts_missed, cpu) = 0;
> >
> > 	return false;
> > }
> >
> > If we call it when the watchdog was touched and
> > (per_cpu(hrtimer_interrupts_saved, cpu) == hrint), then
> >
> > 	per_cpu(hrtimer_interrupts_missed, cpu)++;
> >
> > still happens, even though the watchdog was touched.
> >
> > As a result, we might report stall which should have been hidden,
> > for example:
> >
> > CPU0                                     CPU1
> >
> > <NMI>
> > watchdog_hardlockup_check()   # passes
> >    is_hardlockup()            # no
> >       hr_int_saved = hr_int;
> >       hr_int_missed = 0;
> > </NMI>
> >
> > local_irq_save()
> > printk()
> >    console_trylock()
> >    console_unlock()
> >       console_flush_all()
> >          touch_nmi_watchdog()
> >
> >                                          // Other CPUs print many messages,
> >                                          // e.g. during boot when initializing
> >                                          // a lot of HW
> >                                          for (i = 0; i < 1000; i++)
> >                                             printk();
> >
> > <NMI>
> > watchdog_hardlockup_check()
> >    is_hardlockup()            # yes
> >       hr_int_missed++         # 1
> >
> >    # skip because touched
> > </NMI>
> >
> >          touch_nmi_watchdog()
> >
> > <NMI>
> > watchdog_hardlockup_check()
> >    is_hardlockup()            # yes
> >       hr_int_missed++         # 2
> >
> >    # skip because touched
> > </NMI>
> >
> > ... repeat many times ...
> >
> > local_irq_restore()
> >
> > # this might normally trigger handling of pending IRQs
> > # including the timers. But IMHO, it can be offloaded
> > # to a kthread (at least on RT)
> >
> > <NMI>
> > watchdog_hardlockup_check()
> >    is_hardlockup()            # yes
> >       hr_int_missed++         # might be already 3, 4,...
> >
> >       Report hardlockup even when all the "hr_int_missed"
> >       values should have been ignored because of
> >       touch_watchdog.
> > </NMI>
> >
> >
> > A solution might be clearing "hrtimer_interrupts_missed"
> > when the watchdog was touched.
>
> Great catch! When I was thinking about Mayank's patches, I thought
> about them independently. ...and I believe that independently, each
> patch is fine. The problem is that together they have exactly the
> problem you indicated.
Heh, I was not aware that "hrtimer_interrupts_missed" was added by
the 3rd patch. I looked at the final code with all patches applied ;-)
> Clearing "hrtimer_interrupts_missed" seems like the right solution in
> Mayank's patch #3.
OK, this 1st patch moves "is_hardlockup()" up because it has some
"side effects". It adds a 4-line comment to explain it.
But it still causes problems in the 3rd patch.
A better solution might be to separate the check and update/reset
of the values. Something like (on top of this patchset, just
compilation tested):
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 30199eaeb5d7..4d0851f0f412 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -167,18 +167,10 @@ void watchdog_hardlockup_touch_cpu(unsigned int cpu)
 	per_cpu(watchdog_hardlockup_touched, cpu) = true;
 }
 
-static bool is_hardlockup(unsigned int cpu)
+static void watchdog_hardlockup_update_reset(unsigned int cpu)
 {
 	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
 
-	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
-		per_cpu(hrtimer_interrupts_missed, cpu)++;
-		if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
-			return true;
-
-		return false;
-	}
-
 	/*
 	 * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
 	 * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
@@ -186,8 +178,20 @@ static bool is_hardlockup(unsigned int cpu)
 	 */
 	per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
 	per_cpu(hrtimer_interrupts_missed, cpu) = 0;
+}
 
-	return false;
+static bool is_hardlockup(unsigned int cpu)
+{
+	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
+
+	if (per_cpu(hrtimer_interrupts_saved, cpu) != hrint)
+		return false;
+
+	per_cpu(hrtimer_interrupts_missed, cpu)++;
+	if (per_cpu(hrtimer_interrupts_missed, cpu) < watchdog_hardlockup_miss_thresh)
+		return false;
+
+	return true;
 }
 
 static void watchdog_hardlockup_kick(void)
@@ -200,23 +204,10 @@ static void watchdog_hardlockup_kick(void)
 
 void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 {
-	bool is_hl;
 	int hardlockup_all_cpu_backtrace;
-	/*
-	 * Check for a hardlockup by making sure the CPU's timer
-	 * interrupt is incrementing. The timer interrupt should have
-	 * fired multiple times before we overflow'd. If it hasn't
-	 * then this is a good indication the cpu is stuck
-	 *
-	 * Purposely check this _before_ checking watchdog_hardlockup_touched
-	 * so we make sure we still update the saved value of the interrupts.
-	 * Without that we'll take an extra round through this function before
-	 * we can detect a lockup.
-	 */
-
-	is_hl = is_hardlockup(cpu);
 
 	if (per_cpu(watchdog_hardlockup_touched, cpu)) {
+		watchdog_hardlockup_update_reset(cpu);
 		per_cpu(watchdog_hardlockup_touched, cpu) = false;
 		return;
 	}
@@ -224,7 +215,13 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 	hardlockup_all_cpu_backtrace = (hardlockup_si_mask & SYS_INFO_ALL_BT) ?
 				       1 : sysctl_hardlockup_all_cpu_backtrace;
 
-	if (is_hl) {
+	/*
+	 * Check for a hardlockup by making sure the CPU's timer
+	 * interrupt is incrementing. The timer interrupt should have
+	 * fired multiple times before we overflow'd. If it hasn't
+	 * then this is a good indication the cpu is stuck
+	 */
+	if (is_hardlockup(cpu)) {
 		unsigned int this_cpu = smp_processor_id();
 		unsigned long flags;
 
@@ -290,6 +287,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 		per_cpu(watchdog_hardlockup_warned, cpu) = true;
 	} else {
+		watchdog_hardlockup_update_reset(cpu);
 		per_cpu(watchdog_hardlockup_warned, cpu) = false;
 	}
 }
> > But honestly, I am not sure if this is worth the complexity.
> >
> >
> > Higher level look:
> > ------------------
> >
> > My understanding is that this patch has an effect only when
> > touch_nmi_watchdog() is called as frequently as
> > watchdog_hardlockup_check().
> >
> > The original code gives the system more time to recover after
> > a known stall (touch_nmi_watchdog() called).
> >
> > The new code is more eager to report a stall. It might be more prone
> > to report "false" positives.
> >
> > IMHO, the root of the problem is that touch_nmi_watchdog() is
> > called too frequently. And this patch is rather dancing around it
> > than fixing it.
>
> I don't think it's really any more likely to report false positives
> after the bug you pointed out is fixed. The old watchdog was just too
> conservative. With Mayank's proposal I think calling
> touch_nmi_watchdog() should reset the watchdog the same amount as
> letting the hrtimer run once and that seems like a very reasonable
> interpretation.
Fair enough.
> > Alternative:
> > ------------
> >
> > An alternative solution might be to detect and report when too many
> > watchdog_hardlockup_check() calls are ignored because of
> > touch_nmi_watchdog().
> >
> > It might help to find a misuse of touch_nmi_watchdog(). The question
> > is what details should be reported in this case.
> >
> > It should be optional because touch_nmi_watchdog() is supposed
> > to hide "well-known" sinners after all.
>
> Hmmmm. I certainly support trying to reduce the number of places that
> call touch_nmi_watchdog(), but at the same time I don't think Mayank's
> patch is "dancing around" the problem. IMO considering the
> touch_nmi_watchdog() to be "pretend a timer interrupt fired" is the
> intuitive way one would think the call should work. The fact that the
> code gave an entire extra 10 seconds before the watchdog could be
> caught just feels like a bug that should be fixed.
>
> For the 8250 driver in particular, it looks like the
> touch_nmi_watchdog() was removed from serial8250_console_write() as
> part of nbcon, but then that got reverted. That would still leave two
> other touch_nmi_watchdog() calls in that driver...
Sigh, it seems that touch_nmi_watchdog() can't be removed easily.
Best Regards,
Petr