Re: [PATCH] watchdog/hardlockup: Avoid large stack frames in watchdog_hardlockup_check()

From: Doug Anderson
Date: Wed Aug 02 2023 - 10:12:51 EST


Hi,

On Wed, Aug 2, 2023 at 12:27 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Tue 01-08-23 08:41:49, Doug Anderson wrote:
> [...]
> > Ah, I see what you mean. The one issue I have with your solution is
> > that the ordering of the stack crawls is less ideal in the "dump all"
> > case when cpu != this_cpu. We really want to see the stack crawl of
> > the locked up CPU first and _then_ see the stack crawls of other CPUs.
> > With your solution the locked up CPU will be interspersed with all the
> > others and will be harder to find in the output (you've got to match
> > it up with the "Watchdog detected hard LOCKUP on cpu N" message).
> > While that's probably not a huge deal, it's nicer to make the output
> > easy to understand for someone trying to parse it...
>
> Is it worth to waste memory for this arguably nicer output? Identifying
> the stack of the locked up CPU is trivial.

I guess it's debatable, but as someone who has spent time staring at
trawling through reports generated like this, I'd say "yes", it's
super helpful in understanding the problem to have the hung CPU first.
Putting the memory usage in perspective:

* On a kernel built with a more normal number of max CPUs, like 256,
this is only a use of 32 bytes of memory. That's 8 CPU instructions
worth of memory.

* Even on a system with the largest number of max CPUs we currently
allow (8192), this is only a use of 1024 bytes of memory. Sure, that's
a big chunk, but this is also something on our largest systems.

In any case, how about this. We only need the memory allocated if
`sysctl_hardlockup_all_cpu_backtrace` is non-zero. I can hook in
whenever that's changed (should be just at bootup) and then kmalloc
memory then. This really limits the extra memory to just cases when
it's useful. Presumably on systems that are designed to run massively
SMP they wouldn't want to turn this knob on anyway since it would spew
far too much data. If you took a kernel compiled for max SMP, ran it
on a machine with only a few cores, and wanted this feature turned on
then at most you'd be chewing up 1K. In the average case this would
chew up some extra memory (extra CPU instructions to implement the
function take code space, extra overhead around kmalloc) but it would
avoid the 1K chunk in most cases.

Does that sound reasonable?

I guess my last alternative would be to keep the special case of
tracing the hung CPU first (this is the most important part IMO) and
then accept the double trace, AKA:

/* Try to avoid re-dumping the stack on the hung CPU if possible */
if (cpu == this_cpu))
trigger_allbutself_cpu_backtrace();
else
trigger_all_cpu_backtrace();

-Doug