Re: [PATCHv11 4/4] watchdog/softlockup: report the most frequent interrupts

From: Doug Anderson
Date: Wed Feb 28 2024 - 17:45:40 EST


Hi,

On Tue, Feb 27, 2024 at 11:22 PM Bitao Hu <yaoma@xxxxxxxxxxxxxxxxx> wrote:
>
> When the watchdog determines that the current soft lockup is due
> to an interrupt storm based on CPU utilization, reporting the
> most frequent interrupts could be good enough for further
> troubleshooting.
>
> Below is an example of an interrupt storm. The call tree does not
> provide useful information, but we can analyze which interrupt
> caused the soft lockup by comparing the counts of interrupts.
>
> [ 638.870231] watchdog: BUG: soft lockup - CPU#9 stuck for 26s! [swapper/9:0]
> [ 638.870825] CPU#9 Utilization every 4s during lockup:
> [ 638.871194] #1: 0% system, 0% softirq, 100% hardirq, 0% idle
> [ 638.871652] #2: 0% system, 0% softirq, 100% hardirq, 0% idle
> [ 638.872107] #3: 0% system, 0% softirq, 100% hardirq, 0% idle
> [ 638.872563] #4: 0% system, 0% softirq, 100% hardirq, 0% idle
> [ 638.873018] #5: 0% system, 0% softirq, 100% hardirq, 0% idle
> [ 638.873494] CPU#9 Detect HardIRQ Time exceeds 50%. Most frequent HardIRQs:
> [ 638.873994] #1: 330945 irq#7
> [ 638.874236] #2: 31 irq#82
> [ 638.874493] #3: 10 irq#10
> [ 638.874744] #4: 2 irq#89
> [ 638.874992] #5: 1 irq#102
> ...
> [ 638.875313] Call trace:
> [ 638.875315] __do_softirq+0xa8/0x364
>
> Signed-off-by: Bitao Hu <yaoma@xxxxxxxxxxxxxxxxx>
> Reviewed-by: Liu Song <liusong@xxxxxxxxxxxxxxxxx>
> ---
> kernel/watchdog.c | 115 ++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 111 insertions(+), 4 deletions(-)

Reviewed-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
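
For readers skimming the thread, the "compare the counts" step described
in the commit message can be illustrated with a small userspace sketch.
This is not the code from the patch: struct irq_count, rank_top_irqs()
and print_top_irqs() are hypothetical names invented for this example,
and it assumes the per-IRQ counts for the sampling window have already
been collected into an array.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record: one IRQ line and how often it fired in the window. */
struct irq_count {
	unsigned int irq;
	unsigned long count;
};

/*
 * Move the `top` busiest entries to the front of v[], in descending
 * order of count (a partial selection sort). Returns how many entries
 * were ranked, which is min(top, n).
 */
static size_t rank_top_irqs(struct irq_count *v, size_t n, size_t top)
{
	if (top > n)
		top = n;

	for (size_t i = 0; i < top; i++) {
		size_t best = i;

		/* Find the largest remaining count. */
		for (size_t j = i + 1; j < n; j++)
			if (v[j].count > v[best].count)
				best = j;

		struct irq_count tmp = v[i];
		v[i] = v[best];
		v[best] = tmp;
	}

	return top;
}

/* Print the first `top` entries, loosely following the report layout. */
static void print_top_irqs(const struct irq_count *v, size_t top)
{
	for (size_t i = 0; i < top; i++)
		printf("#%zu: %-10lu irq#%u\n", i + 1, v[i].count, v[i].irq);
}
```

Fed the five counts from the log above in any order, rank_top_irqs()
puts irq#7 first, which is the comparison the report lets a reader make
at a glance.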