Re: [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness

From: Petr Mladek

Date: Wed Mar 11 2026 - 10:13:17 EST


On Thu 2026-03-05 08:45:35, Doug Anderson wrote:
> Hi,
>
> On Thu, Mar 5, 2026 at 5:47 AM Petr Mladek <pmladek@xxxxxxxx> wrote:
> >
> > > --- a/kernel/watchdog.c
> > > +++ b/kernel/watchdog.c
> > > @@ -163,8 +171,13 @@ static bool is_hardlockup(unsigned int cpu)
> > > {
> > > int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> > >
> > > - if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> > > - return true;
> > > + if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
> > > + per_cpu(hrtimer_interrupts_missed, cpu)++;
> > > + if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
> >
> > This would return true for every check when missed >= 3.
> > As a result, the hardlockup would be reported every 4s.
> >
> > I would keep the 12s cadence and change this to:
> >
> > if (per_cpu(hrtimer_interrupts_missed, cpu) % watchdog_hardlockup_miss_thresh == 0)
>
> I could be confused, but I don't think this is needed because we clear
> "hrtimer_interrupts_missed" to 0 any time we save the timer count.
> While I believe the "%" will functionally work, it seems harder to
> understand, at least to me.

My understanding is that we save the number of interrupts
and reset the missed counter only when:

+ the number of interrupts is different (timer on the watched CPU fired)
+ the watchdog was touched (to hide an expected delay)

=> it is only incremented when the timer did not fire
(the hardlockup scenario).

In particular, it is _not_ reset when we report the hardlockup.

Or am I missing anything?

Best Regards,
Petr