Re: [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check
From: Petr Mladek
Date: Wed Mar 04 2026 - 09:47:49 EST
On Thu 2026-02-12 14:12:10, Mayank Rungta via B4 Relay wrote:
> From: Mayank Rungta <mrungta@xxxxxxxxxx>
>
> Currently, arch_touch_nmi_watchdog() causes an early return that
> skips updating hrtimer_interrupts_saved. This leads to stale
> comparisons and delayed lockup detection.
>
> Update the saved interrupt count before checking the touched flag
> to ensure detection timeliness.
IMHO, it is not that easy, see below.
So I am curious. Have you found this when debugging a particular
problem or just by reading the code, please?
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -186,7 +186,21 @@ static void watchdog_hardlockup_kick(void)
>
> void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> {
> + bool is_hl;
> int hardlockup_all_cpu_backtrace;
> + /*
> + * Check for a hardlockup by making sure the CPU's timer
> + * interrupt is incrementing. The timer interrupt should have
> + * fired multiple times before we overflow'd. If it hasn't
> + * then this is a good indication the cpu is stuck
> + *
> + * Purposely check this _before_ checking watchdog_hardlockup_touched
> + * so we make sure we still update the saved value of the interrupts.
> + * Without that we'll take an extra round through this function before
> + * we can detect a lockup.
> + */
> +
> + is_hl = is_hardlockup(cpu);
>
> if (per_cpu(watchdog_hardlockup_touched, cpu)) {
> per_cpu(watchdog_hardlockup_touched, cpu) = false;
Hmm, this does not look correct to me.
1. First, let's agree on the meaning of "watchdog_hardlockup_touched".
My understanding is that arch_touch_nmi_watchdog() is called by code
which might block interrupts (timers) for a long time and wants to
hide it.
Blocking interrupts for too long is _bad_. In the ideal world,
nobody should call arch_touch_nmi_watchdog() because we want
to know about all sinners.
In the real world, we allow hiding some sinners because
they might produce "false" positives, see touch_nmi_watchdog()
callers:
+ Most callers are printk() related.
We might argue whether it is a false positive or not.
The argument for "touching the watchdog" is that slow serial
consoles might block IRQs for a long time. But they work as
expected and can't do better.
Also the stall is kind of expected in this case. We could
confuse users and/or hide the original problem if the stall
was reported.
Note that there is a bigger problem with the legacy console
drivers. printk() tries to emit the messages immediately. And the
current console_lock() owner becomes responsible for flushing
new messages added by other CPUs in parallel.
It is better with the new NBCON console drivers, which are
offloaded to a kthread. Here, printk() tries to flush them directly
only when called in an emergency mode (WARN, stall report)
or in panic().
+ There are some other callers, for example, multi_stop_cpu(),
or hv_do_rep_hypercall_ex(). IMHO, they create stalls on
purpose.
2. Let's look at is_hardlockup() in detail:
static bool is_hardlockup(unsigned int cpu)
{
	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));

	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
		per_cpu(hrtimer_interrupts_missed, cpu)++;
		if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
			return true;
		return false;
	}

	per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
	per_cpu(hrtimer_interrupts_missed, cpu) = 0;
	return false;
}
If we call it when the watchdog was touched and the timer did not
fire, then

	(per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
		=> per_cpu(hrtimer_interrupts_missed, cpu)++;

that is, the missed counter is incremented even when the watchdog
was touched. As a result, we might report a stall which should have
been hidden, for example:
  CPU0                                  CPU1

  <NMI>
  watchdog_hardlockup_check()  # passes
    is_hardlockup()            # no
      hr_int_saved = hr_int;
      hr_int_missed = 0;
  </NMI>
                                        local_irq_save()
                                        printk()
                                          console_trylock()
                                          console_unlock()
                                            console_flush_all()
                                              touch_nmi_watchdog()

                                        // Other CPUs print many messages,
                                        // e.g. during boot when initializing
                                        // a lot of HW
                                        for (i = 0; i < 1000; i++)
                                                printk();
  <NMI>
  watchdog_hardlockup_check()
    is_hardlockup()            # yes
      hr_int_missed++          # 1
    # skip because touched
  </NMI>
                                        touch_nmi_watchdog()
  <NMI>
  watchdog_hardlockup_check()
    is_hardlockup()            # yes
      hr_int_missed++          # 2
    # skip because touched
  </NMI>

  ... repeat many times ...

                                        local_irq_restore()
                                        # this might normally trigger handling
                                        # of pending IRQs, including the timers.
                                        # But IMHO, it can be offloaded to
                                        # a kthread (at least on RT)
  <NMI>
  watchdog_hardlockup_check()
    is_hardlockup()            # yes
      hr_int_missed++          # might be already 3, 4, ...

    Report hardlockup even when all the "hr_int_missed"
    increments should have been ignored because of
    touch_nmi_watchdog().
  </NMI>
A solution might be clearing "hrtimer_interrupts_missed"
when the watchdog was touched.
But honestly, I am not sure if this is worth the complexity.
Higher level look:
------------------
My understanding is that this patch has an effect only when
touch_nmi_watchdog() is called at least as frequently as
watchdog_hardlockup_check().
The original code gives the system more time to recover after
a known stall (touch_nmi_watchdog() called).
The new code is more eager to report a stall. It might be more prone
to report "false" positives.
IMHO, the root of the problem is that touch_nmi_watchdog() is
called too frequently. And this patch is rather dancing around
the problem than fixing it.
Alternative:
------------
An alternative solution might be to detect and report when too many
watchdog_hardlockup_check() calls are ignored because of
touch_nmi_watchdog().
It might help to find a mis-use of touch_nmi_watchdog(). The question
is what details should be reported in this case.
It should be optional because touch_nmi_watchdog() is supposed
to hide "well-known" sinners after all.
> @@ -195,13 +209,8 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>
> hardlockup_all_cpu_backtrace = (hardlockup_si_mask & SYS_INFO_ALL_BT) ?
> 1 : sysctl_hardlockup_all_cpu_backtrace;
> - /*
> - * Check for a hardlockup by making sure the CPU's timer
> - * interrupt is incrementing. The timer interrupt should have
> - * fired multiple times before we overflow'd. If it hasn't
> - * then this is a good indication the cpu is stuck
> - */
> - if (is_hardlockup(cpu)) {
> +
> + if (is_hl) {
> unsigned int this_cpu = smp_processor_id();
> unsigned long flags;
>
Best Regards,
Petr