Re: [PATCHv5 3/3] watchdog/softlockup: add SOFTLOCKUP_DETECTOR_INTR_STORM Kconfig knob

From: Doug Anderson
Date: Tue Feb 06 2024 - 16:43:07 EST


Hi,

On Tue, Feb 6, 2024 at 1:59 AM Bitao Hu <yaoma@xxxxxxxxxxxxxxxxx> wrote:
>
> The interrupt storm detection mechanism we implemented requires a
> considerable amount of global storage space when configured for
> the maximum number of CPUs.
> Therefore, add a SOFTLOCKUP_DETECTOR_INTR_STORM Kconfig knob that
> defaults to "yes" if the max number of CPUs is <= 128.
>
> Signed-off-by: Bitao Hu <yaoma@xxxxxxxxxxxxxxxxx>
> ---
> kernel/watchdog.c | 2 +-
> lib/Kconfig.debug | 13 +++++++++++++
> 2 files changed, 14 insertions(+), 1 deletion(-)

IMO this should be squashed into patch #1, though I won't insist.


> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 26dc1ad86276..1595e4a94774 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -338,7 +338,7 @@ __setup("watchdog_thresh=", watchdog_thresh_setup);
>
> static void __lockup_detector_cleanup(void);
>
> -#ifdef CONFIG_IRQ_TIME_ACCOUNTING
> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR_INTR_STORM
> #define NUM_STATS_GROUPS 5
> #define NUM_STATS_PER_GROUP 4
> enum stats_per_group {
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 975a07f9f1cc..74002ba7c42d 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1029,6 +1029,19 @@ config SOFTLOCKUP_DETECTOR
> chance to run. The current stack trace is displayed upon
> detection and the system will stay locked up.
>
> +config SOFTLOCKUP_DETECTOR_INTR_STORM
> + bool "Detect Interrupt Storm in Soft Lockups"
> + depends on SOFTLOCKUP_DETECTOR && IRQ_TIME_ACCOUNTING
> + default y if NR_CPUS <= 128
> + help
> + Say Y here to enable the kernel to detect interrupt storm
> + during "soft lockups".
> +
> + "soft lockups" can be caused by a variety of reasons. If one is caused by
> + an interrupt storm, then the storming interrupts will not be on the
> + callstack. To detect this case, it is necessary to report the CPU stats
> + and the interrupt counts during the "soft lockups".

It's probably not terribly important, but I notice that the other help
text in this file is generally wrapped to 80 columns. Even though the
kernel has relaxed the 80 column rule a bit, it still feels like this
could easily be wrapped to 80 columns without sacrificing any
readability.
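
For instance, assuming the usual tab plus two spaces of indentation for
help text (the quoting above has eaten the whitespace, so that part is a
guess on my side), something like this rough sketch should keep every
line under 80 columns. The wording is unchanged, only rewrapped:

	help
	  Say Y here to enable the kernel to detect interrupt storm
	  during "soft lockups".

	  "soft lockups" can be caused by a variety of reasons. If one is
	  caused by an interrupt storm, then the storming interrupts will not
	  be on the callstack. To detect this case, it is necessary to report
	  the CPU stats and the interrupt counts during the "soft lockups".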

In any case:

Reviewed-by: Douglas Anderson <dianders@xxxxxxxxxxxx>