Re: [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness
From: Doug Anderson
Date: Thu Mar 12 2026 - 17:04:03 EST
Hi,
On Wed, Mar 11, 2026 at 7:07 AM Petr Mladek <pmladek@xxxxxxxx> wrote:
>
> On Thu 2026-03-05 08:45:35, Doug Anderson wrote:
> > Hi,
> >
> > On Thu, Mar 5, 2026 at 5:47 AM Petr Mladek <pmladek@xxxxxxxx> wrote:
> > >
> > > > --- a/kernel/watchdog.c
> > > > +++ b/kernel/watchdog.c
> > > > @@ -163,8 +171,13 @@ static bool is_hardlockup(unsigned int cpu)
> > > >  {
> > > >  	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> > > >
> > > > -	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> > > > -		return true;
> > > > +	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
> > > > +		per_cpu(hrtimer_interrupts_missed, cpu)++;
> > > > +		if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
> > >
> > > This would return true for every check when missed >= 3.
> > > As a result, the hardlockup would be reported every 4s.
> > >
> > > I would keep the 12s cadence and change this to:
> > >
> > > if (per_cpu(hrtimer_interrupts_missed, cpu) % watchdog_hardlockup_miss_thresh == 0)
> >
> > I could be confused, but I don't think this is needed because we clear
> > "hrtimer_interrupts_missed" to 0 any time we save the timer count.
> > While I believe the "%" will functionally work, it seems harder to
> > understand, at least to me.
>
> My understanding is that we save the number of interrupts
> and reset missed counter only when:
>
> + the number of interrupts is different (timer on the watched CPU fired)
> + the watchdog was touched (hiding delay)
>
> => it is just incremented when the timer was not called
> (hardlockup scenario).
>
> In particular, it is _not_ reset when we report the hardlockup.
>
> Or am I missing anything?
Ah, I wasn't thinking about the "non-panic" case. You are correct: the
missed counter is not reset when we report, so we need the "%" check
to keep the 12s cadence in that case.
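
For illustration, here is a minimal user-space sketch of the difference
(not kernel code). It models a CPU whose timer never fires, so the missed
counter only ever grows, as in the non-panic case. The helper names
report_ge(), report_mod() and count_reports() are hypothetical, and the
threshold parameter just stands in for watchdog_hardlockup_miss_thresh.

```c
#include <assert.h>
#include <stdbool.h>

/* Variant from the patch: fires on every check once the threshold is reached. */
static bool report_ge(unsigned int missed, unsigned int thresh)
{
	return missed >= thresh;
}

/* Suggested variant: fires only on every thresh-th consecutive miss. */
static bool report_mod(unsigned int missed, unsigned int thresh)
{
	return missed % thresh == 0;
}

/*
 * Count the reports produced over `checks` consecutive checks of a CPU
 * whose timer never fires. The missed counter is never reset here,
 * which models the non-panic case where reporting does not clear it.
 */
static unsigned int count_reports(bool (*should_report)(unsigned int, unsigned int),
				  unsigned int thresh, unsigned int checks)
{
	unsigned int missed = 0, reports = 0;

	assert(thresh > 0);	/* guards the modulo in report_mod() */

	for (unsigned int i = 0; i < checks; i++) {
		missed++;
		if (should_report(missed, thresh))
			reports++;
	}
	return reports;
}
```

With a threshold of 3 and 9 checks (36s at one check every 4s),
count_reports(report_ge, 3, 9) gives 7 reports (one every 4s after the
first at 12s), while count_reports(report_mod, 3, 9) gives 3 (one every
12s).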
-Doug