Re: [PATCH] hrtimer: CPU and entry_time is added to a warning message in hrtimer_interrupt()

From: Thomas Gleixner
Date: Thu Dec 01 2022 - 16:31:16 EST


Rei!

On Fri, Jun 24 2022 at 16:00, Rei Yamamoto wrote:
> A warning message in hrtimer_interrupt() is output up to 5 times
> by default, and CPU and entry_time are also shown.

This describes to some extent _what_ the patch is doing, but not the
why.

> These changes are helpful that the function spending a lot of time is clear
> by using ftrace:

That's a constructed case. There are multiple reasons why this can
happen, not just because a single hrtimer callback misbehaves.

> @@ -2038,6 +2039,15 @@ static struct ctl_table kern_table[] = {
> .extra1 = SYSCTL_ONE,
> .extra2 = SYSCTL_INT_MAX,
> },
> +#endif
> +#ifdef CONFIG_HIGH_RES_TIMERS
> + {
> + .procname = "hrtimer_interrupt_warnings",
> + .data = &sysctl_hrtimer_interrupt_warnings,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec,
> + },

So this adds a new sysctl, but the changelog does not tell anything
about it. Aside of the dubious value of this sysctl, this lacks the
required documentation for new sysctls.

> + /*
> + * If a message is output many times, the delayed funciton
> + * may be identified by resetting sysctl_hrtimer_interrupt_warnings
> + * and enabling ftrace.

What has the reset of sysctl_hrtimer_interrupt_warnings to do with
ftrace and how is that reset helpful to identify the root cause?

Also repeating the printk 5 times does not add any value at all. The
runaway detection already has logic to supress spurious events and if
the problem persists then it can be observed by ftrace without any of
these changes.

I assume - because you did not tell so - that you try to have a
correlation between ftrace and dmesg via the entry timestamp output,
right?

That's just a half thought out debug bandaid, really.

You can provide a way better mechanism by adding a tracepoint right at
the pr_warn_once(), which emits information for correlation right into
the trace.

That allows you to stop the trace once the tracepoint is emitted instead
of having to do all of this including the correlation manually.

Hmm?

Thanks,

tglx