Re: [PATCHv5 2/3] watchdog/softlockup: report the most frequent interrupts

From: Doug Anderson
Date: Tue Feb 06 2024 - 16:42:41 EST


Hi,

On Tue, Feb 6, 2024 at 1:59 AM Bitao Hu <yaoma@xxxxxxxxxxxxxxxxx> wrote:
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 71d5b6dfa358..26dc1ad86276 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -18,6 +18,9 @@
> #include <linux/init.h>
> #include <linux/kernel_stat.h>
> #include <linux/math64.h>
> +#include <linux/irq.h>
> +#include <linux/irqdesc.h>
> +#include <linux/bitops.h>

These are still not sorted alphabetically. "irq.h" and "irqdesc.h"
should go between "init.h" and "kernel_stat.h". "bitops.h" is trickier
because the existing headers are not quite sorted. Probably the best
would be to fully sort them. They should end up like this:

#include <linux/bitops.h>
#include <linux/cpu.h>
#include <linux/init.h>
#include <linux/irq.h>
#include <linux/irqdesc.h>
#include <linux/kernel_stat.h>
#include <linux/kvm_para.h>
#include <linux/math64.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/nmi.h>
#include <linux/stop_machine.h>
#include <linux/sysctl.h>
#include <linux/tick.h>

#include <linux/sched/clock.h>
#include <linux/sched/debug.h>
#include <linux/sched/isolation.h>

#include <asm/irq_regs.h>


> +static void start_counting_irqs(void)
> +{
> + int i;
> + struct irq_desc *desc;
> + u32 *counts = __this_cpu_read(hardirq_counts);
> + int cpu = smp_processor_id();
> +
> + if (!test_bit(cpu, softlockup_hardirq_cpus)) {

I don't think you need "softlockup_hardirq_cpus", do you? Just read
"actual_nr_irqs" and see if it's non-zero? ...or read "hardirq_counts"
and see if it's non-NULL?


> + /*
> + * nr_irqs has the potential to grow at runtime. We should read
> + * it and store locally to avoid array out-of-bounds access.
> + */
> + __this_cpu_write(actual_nr_irqs, nr_irqs);

nit: IMO store nr_irqs in a local variable to avoid all of the
"__this_cpu_read" calls everywhere. Then just write it once from your
local variable.


> + counts = kmalloc_array(__this_cpu_read(actual_nr_irqs),
> + sizeof(u32),
> + GFP_ATOMIC);

should use "kcalloc()" so the array is zeroed. That way if the set of
non-NULL "desc"s changes between calls you don't end up reading
uninitialized memory.


> +static void stop_counting_irqs(void)
> +{
> + u32 *counts = __this_cpu_read(hardirq_counts);
> + int cpu = smp_processor_id();
> +
> + if (test_bit(cpu, softlockup_hardirq_cpus)) {
> + kfree(counts);
> + counts = NULL;
> + __this_cpu_write(hardirq_counts, counts);

nit: don't really need to set the local "counts" to NULL. Just:

__this_cpu_write(hardirq_counts, NULL);

..and actually if you take my advice above and get rid of
"softlockup_hardirq_cpus" then this function just becomes:

kfree(__this_cpu_read(hardirq_counts));
__this_cpu_write(hardirq_counts, NULL);

Since kfree() handles when you pass it NULL...


> +static void print_irq_counts(void)
> +{
> + int i;
> + struct irq_desc *desc;
> + u32 counts_diff;
> + u32 *counts = __this_cpu_read(hardirq_counts);
> + int cpu = smp_processor_id();
> + struct irq_counts irq_counts_sorted[NUM_HARDIRQ_REPORT] = {
> + {-1, 0}, {-1, 0}, {-1, 0}, {-1, 0},
> + };
> +
> + if (test_bit(cpu, softlockup_hardirq_cpus)) {
> + for_each_irq_desc(i, desc) {
> + if (!desc)
> + continue;

The "if" test above isn't needed. The "for_each_irq_desc()" macro
already checks for NULL.



> + /*
> + * We need to bounds-check in case someone on a different CPU
> + * expanded nr_irqs.
> + */
> + if (i < __this_cpu_read(actual_nr_irqs))
> + counts_diff = desc->kstat_irqs ?
> + *this_cpu_ptr(desc->kstat_irqs) - counts[i] : 0;
> + else
> + counts_diff = desc->kstat_irqs ?
> + *this_cpu_ptr(desc->kstat_irqs) : 0;

Why do you need to test "kstat_irqs" for 0? Also, ideally don't
duplicate the math. In other words, I'd expect this (untested):

if (i < __this_cpu_read(actual_nr_irqs))
count = counts[i];
else
count = 0;
counts_diff = *this_cpu_ptr(desc->kstat_irqs) - count;

I guess I'd also put "__this_cpu_read(actual_nr_irqs)" in a local
variable like you do with counts...