Re: [PATCH] watchdog/hardlockup: Avoid large stack frames in watchdog_hardlockup_check()

From: Doug Anderson
Date: Tue Aug 01 2023 - 11:42:13 EST


Hi,

On Tue, Aug 1, 2023 at 8:26 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Tue 01-08-23 07:16:15, Doug Anderson wrote:
> > Hi,
> >
> > On Tue, Aug 1, 2023 at 5:58 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > >
> > > On Mon 31-07-23 09:17:59, Douglas Anderson wrote:
> > > > After commit 77c12fc95980 ("watchdog/hardlockup: add a "cpu" param to
> > > > watchdog_hardlockup_check()") we started storing a `struct cpumask` on
> > > > the stack in watchdog_hardlockup_check(). On systems with
> > > > CONFIG_NR_CPUS set to 8192 this takes up 1K on the stack. That
> > > > triggers warnings with `CONFIG_FRAME_WARN` set to 1024.
> > > >
> > > > Instead of putting this `struct cpumask` on the stack, let's declare
> > > > it as `static`. This has the downside of taking up 1K of memory all
> > > > the time on systems with `CONFIG_NR_CPUS` set to 8192, but on systems
> > > > with smaller `CONFIG_NR_CPUS` it's not much memory (with 128 CPUs it's only
> > > > 16 bytes of memory). Presumably anyone building a system with
> > > > `CONFIG_NR_CPUS=8192` can afford the extra 1K of memory.
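
(Side note on the size math, assuming 64-bit longs: a `struct cpumask` is just
a bitmap of `CONFIG_NR_CPUS` bits, i.e. BITS_TO_LONGS(NR_CPUS) * sizeof(long)
bytes, so:

	NR_CPUS=8192: 8192 / 64 longs * 8 bytes = 1024 bytes
	NR_CPUS=128 :  128 / 64 longs * 8 bytes =   16 bytes

...which is where the 1K and 16 byte figures above come from.)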
> > > >
> > > > NOTE: as part of this change, we no longer check the return value of
> > > > trigger_single_cpu_backtrace(). While we could do this and only call
> > > > cpumask_clear_cpu() if trigger_single_cpu_backtrace() didn't fail,
> > > > that's probably not worth it. There's no reason to believe that
> > > > trigger_cpumask_backtrace() will succeed at backtracing the CPU when
> > > > trigger_single_cpu_backtrace() failed.
> > > >
> > > > Alternatives considered:
> > > > - Use kmalloc with GFP_ATOMIC to allocate. I decided against this
> > > > since relying on kmalloc when the system is hard locked up seems
> > > > like a bad idea.
> > > > - Change the arch_trigger_cpumask_backtrace() across all architectures
> > > > to take an extra parameter to get the needed behavior. This seems
> > > > like a lot of churn for a small savings.
> > > >
> > > > Fixes: 77c12fc95980 ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()")
> > > > Reported-by: kernel test robot <lkp@xxxxxxxxx>
> > > > Closes: https://lore.kernel.org/r/202307310955.pLZDhpnl-lkp@xxxxxxxxx
> > > > Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> > > > ---
> > > >
> > > > kernel/watchdog.c | 14 +++++++-------
> > > > 1 file changed, 7 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > > > index be38276a365f..19db2357969a 100644
> > > > --- a/kernel/watchdog.c
> > > > +++ b/kernel/watchdog.c
> > > > @@ -151,9 +151,6 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > > > */
> > > > if (is_hardlockup(cpu)) {
> > > > unsigned int this_cpu = smp_processor_id();
> > > > - struct cpumask backtrace_mask;
> > > > -
> > > > - cpumask_copy(&backtrace_mask, cpu_online_mask);
> > > >
> > > > /* Only print hardlockups once. */
> > > > if (per_cpu(watchdog_hardlockup_warned, cpu))
> > > > @@ -167,10 +164,8 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > > > show_regs(regs);
> > > > else
> > > > dump_stack();
> > > > - cpumask_clear_cpu(cpu, &backtrace_mask);
> > > > } else {
> > > > - if (trigger_single_cpu_backtrace(cpu))
> > > > - cpumask_clear_cpu(cpu, &backtrace_mask);
> > > > + trigger_single_cpu_backtrace(cpu);
> > > > }
> > > >
> > > > /*
> > > > @@ -178,8 +173,13 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > > > * hardlockups generating interleaving traces
> > > > */
> > > > if (sysctl_hardlockup_all_cpu_backtrace &&
> > > > - !test_and_set_bit(0, &watchdog_hardlockup_all_cpu_dumped))
> > > > + !test_and_set_bit(0, &watchdog_hardlockup_all_cpu_dumped)) {
> > > > + static struct cpumask backtrace_mask;
> > > > +
> > > > + cpumask_copy(&backtrace_mask, cpu_online_mask);
> > > > + cpumask_clear_cpu(cpu, &backtrace_mask);
> > > > trigger_cpumask_backtrace(&backtrace_mask);
> > >
> > > This looks rather wasteful: the cpumask just gets copied over to a
> > > backtrace_mask again inside nmi_trigger_cpumask_backtrace (which all but
> > > the sparc arches use for this AFAICS).
> > >
> > > Would it be possible to use arch_trigger_cpumask_backtrace(cpu_online_mask, false)
> > > and special case cpu != this_cpu && sysctl_hardlockup_all_cpu_backtrace?
> >
> > So you're saying optimize the case where cpu == this_cpu and then have
> > a special case (where we still copy) for cpu != this_cpu? I can do
> > that if that's what people want, but (assuming I understand correctly)
> > that's making the wrong tradeoff. Specifically, this code runs once,
> > right as we're crashing, and if it takes an extra millisecond to run
> > it's not a huge deal. It feels better to avoid the special case
> > and keep the code smaller.
> >
> > Let me know if I misunderstood.
>
> I meant something like this (untested, but hopefully it gives an idea of
> what I mean). Maybe it can be simplified even further. TBH I am not
> entirely sure why cpu == this_cpu needs this special handling.
> ---
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index be38276a365f..0eedac9f1019 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -151,9 +151,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> */
> if (is_hardlockup(cpu)) {
> unsigned int this_cpu = smp_processor_id();
> - struct cpumask backtrace_mask;
> -
> - cpumask_copy(&backtrace_mask, cpu_online_mask);
> + bool dump_all = false;
>
> /* Only print hardlockups once. */
> if (per_cpu(watchdog_hardlockup_warned, cpu))
> @@ -167,10 +165,6 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> show_regs(regs);
> else
> dump_stack();
> - cpumask_clear_cpu(cpu, &backtrace_mask);
> - } else {
> - if (trigger_single_cpu_backtrace(cpu))
> - cpumask_clear_cpu(cpu, &backtrace_mask);
> }
>
> /*
> @@ -179,7 +173,12 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> */
> if (sysctl_hardlockup_all_cpu_backtrace &&
> !test_and_set_bit(0, &watchdog_hardlockup_all_cpu_dumped))
> - trigger_cpumask_backtrace(&backtrace_mask);
> + dump_all = true;
> +
> + if (dump_all)
> + arch_trigger_cpumask_backtrace(cpu_online_mask, cpu != this_cpu);
> + else if (cpu != this_cpu)
> + trigger_single_cpu_backtrace(cpu);

Ah, I see what you mean. The one issue I have with your solution is
that the ordering of the stack crawls is less ideal in the "dump all"
case when cpu != this_cpu. We really want to see the stack crawl of
the locked-up CPU first and _then_ see the stack crawls of the other
CPUs. With your solution the locked-up CPU's trace is interspersed
with all the others and is harder to find in the output (you've got
to match it up with the "Watchdog detected hard LOCKUP on cpu N"
message). While that's probably not a huge deal, it's nicer to make
the output easy to understand for someone trying to parse it...
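
For reference, the ordering I'm trying to keep is roughly what the patch
at the top of this thread does (just a sketch of the shape, not a
drop-in):

	/* Dump the locked up CPU first... */
	if (cpu == this_cpu) {
		if (regs)
			show_regs(regs);
		else
			dump_stack();
	} else {
		trigger_single_cpu_backtrace(cpu);
	}

	/* ...then, optionally, everyone else. */
	if (sysctl_hardlockup_all_cpu_backtrace &&
	    !test_and_set_bit(0, &watchdog_hardlockup_all_cpu_dumped)) {
		static struct cpumask backtrace_mask;

		cpumask_copy(&backtrace_mask, cpu_online_mask);
		/* Skip the CPU whose trace was already printed above. */
		cpumask_clear_cpu(cpu, &backtrace_mask);
		trigger_cpumask_backtrace(&backtrace_mask);
	}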

-Doug