[PATCHv5 0/3] *** Detect interrupt storm in softlockup ***

From: Bitao Hu
Date: Tue Feb 06 2024 - 04:59:26 EST


Hi, guys.
I have implemented a low-overhead method for detecting interrupt storm
in softlockup. Please review it, all comments are welcome.

Changes from v4 to v5:

- Rearranging variable placement to make code look neater.

Changes from v3 to v4:

- Renaming some variable and function names to make the code logic
more readable.

- Change the code location to avoid predeclaring.

- Just swap rather than a double loop in tabulate_irq_count.

- Since nr_irqs has the potential to grow at runtime, bounds-check
logic has been implemented.

- Add SOFTLOCKUP_DETECTOR_INTR_STORM Kconfig knob.

Changes from v2 to v3:

- From Liu Song, using enum instead of macro for cpu_stats, shortening
the name 'idx_to_stat' to 'stats', adding 'get_16bit_precesion' instead
of using right shift operations, and using 'struct irq_counts'.

- From kernel robot test, using '__this_cpu_read' and '__this_cpu_write'
instead of accessing to an per-cpu array directly, in order to avoid
this warning.
'sparse: incorrect type in initializer (different modifiers)'

Changes from v1 to v2:

- From Douglas, optimize the memory of cpustats. With the maximum number
of CPUs, that's now this.
2 * 8192 * 4 + 1 * 8192 * 5 * 4 + 1 * 8192 = 237,568 bytes.

- From Liu Song, refactor the code format and add necessary comments.

- From Douglas, use interrupt counts instead of interrupt time to
determine the cause of softlockup.

- Remove the cmdline parameter added in PATCHv1.

Bitao Hu (3):
watchdog/softlockup: low-overhead detection of interrupt
watchdog/softlockup: report the most frequent interrupts
watchdog/softlockup: add SOFTLOCKUP_DETECTOR_INTR_STORM Kconfig knob

kernel/watchdog.c | 245 ++++++++++++++++++++++++++++++++++++++++++++++
lib/Kconfig.debug | 13 +++
2 files changed, 258 insertions(+)

--
2.37.1 (Apple Git-137.1)