Re: [PATCHv3 2/2] watchdog/softlockup: report the most frequent interrupts

From: Doug Anderson
Date: Fri Feb 02 2024 - 10:03:00 EST


Hi,

On Fri, Feb 2, 2024 at 6:22 AM Bitao Hu <yaoma@xxxxxxxxxxxxxxxxx> wrote:
>
> > ...or maybe you don't need this "if" test at all since you're using
> > "need_record_irq_counts(STATS_HARDIRQ)" here. IMO that should be
> > pulled out here as well since it makes it more obvious...
> I agree with your suggestion here. It is easier to understand:
>
> if (time_after_eq(now, period_ts + get_softlockup_thresh() / 5))
> 	set_potential_softlockup_hardirq();
>
> Please let me explain the criteria for the judgment here. Under normal
> circumstances, "softlockup_fn" will be woken up every "sample_period" to
> update "period_ts", and the "time_after_eq" I wrote will be false. If
> "period_ts" has not been updated after a "sample_period" has passed,
> then the "time_after_eq" will be true. And I suspect that in the
> subsequent few "sample_period", "period_ts" might also not be updated,
> which could indicate a potential softlockup. At this point, I use
> "need_record_irq_counts" to determine if this phenomenon is caused by an
> interrupt storm.
>
> To summarize, my condition to start counting interrupts is that
> "period_ts" has not been updated during "sample_period" AND the
> proportion of hardirq time during "sample_period" exceeds 50%.
>
> What do you think?

OK, sounds reasonable. Given that this is non-obvious, it would be
great if your patch included a comment explaining it. :-)
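
For reference, the start condition summarized above could be sketched in
plain userspace C roughly as follows. This is only an illustration under
stated assumptions, not the code from the patch: should_count_irqs(),
ticks_after_eq() and the hardirq_pct parameter are hypothetical names, the
timestamps are assumed to be in seconds, and the "> 50" check merely stands
in for whatever need_record_irq_counts(STATS_HARDIRQ) really evaluates.

```c
#include <stdbool.h>

/*
 * Wraparound-safe "a >= b" for unsigned time counters, modelled on the
 * idea behind the kernel's time_after_eq(). Hypothetical userspace stand-in.
 */
static bool ticks_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/*
 * Hypothetical helper: decide whether to start counting interrupts.
 *
 * now               - current timestamp (assumed: seconds)
 * period_ts         - last time the softlockup thread updated its timestamp
 * softlockup_thresh - softlockup threshold; one sample period is thresh / 5
 * hardirq_pct       - share of the last sample period spent in hardirq (0-100)
 */
static bool should_count_irqs(unsigned long now, unsigned long period_ts,
			      unsigned long softlockup_thresh,
			      unsigned int hardirq_pct)
{
	unsigned long sample_period = softlockup_thresh / 5;

	/* period_ts was refreshed within the last sample period: all is well. */
	if (!ticks_after_eq(now, period_ts + sample_period))
		return false;

	/* period_ts is stale; only suspect an interrupt storm above 50%. */
	return hardirq_pct > 50;
}
```

With a threshold of 20 (so a sample period of 4), a period_ts of 100 and 90%
hardirq time, the helper stays false at now = 103 and turns true at now = 104.
A comment along these lines next to the real check would probably cover the
non-obvious part.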