Remove the "perf" hard lock up detector (watchdog) from the kernel?
From: Ian Rogers
Date: Mon Mar 17 2025 - 17:26:21 EST
Hi,
The kernel tree has two hard lockup detectors. The perf one uses a
perf counter to generate NMIs and detect a lack of forward progress,
whereas the buddy approach uses the soft lockup hrtimer to check that
the next CPU is progressing. Doug Anderson
<dianders@xxxxxxxxxxxx> recently questioned:
https://lore.kernel.org/all/CAD=FV=WfB6inJPuwfhbw4mtFBYpr+3ot2J+SJAZ3pT3t4fW7cw@xxxxxxxxxxxxxx/
...but I'd also have to ask: is there a reason you're using the "perf"
hard-lockup detector instead of the buddy one? In my mind, the "buddy"
watchdog is better in almost all ways (I believe it's lower power,
doesn't waste a "perf" controller, and doesn't suffer from frequency
issues). It's even crossed my mind whether the "perf" lockup detector
should be deprecated. ;-)
In the perf tool there are warnings associated with the NMI watchdog.
The metric code also has a flag on metrics where events aren't grouped
when the NMI watchdog is enabled. For example:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json?h=perf-tools-next#n1916
The warning and the breaking of groups are currently inaccurate for
the buddy hard lockup detector, because /proc/sys/kernel/nmi_watchdog
is still present to enable or disable the buddy detector. That is, the
perf tool warns and breaks event groups, stating that the NMI watchdog
is a problem, even when the kernel is configured to use the buddy
watchdog.
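As a rough illustration of the ambiguity, here is a minimal Python
sketch of the kind of check a tool can make today. It is not the perf
tool's actual code; the function names are hypothetical. It only reads
/proc/sys/kernel/nmi_watchdog, and the value it gets back says whether
a hard lockup detector is enabled, not which one.

```python
from pathlib import Path

NMI_WATCHDOG_PATH = Path("/proc/sys/kernel/nmi_watchdog")


def parse_watchdog_flag(text):
    """Return True if the sysctl text reports an enabled hard lockup detector."""
    return text.strip() == "1"


def hard_lockup_detector_enabled(path=NMI_WATCHDOG_PATH):
    """Read the sysctl; return None when the file is missing or unreadable."""
    try:
        return parse_watchdog_flag(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    state = hard_lockup_detector_enabled()
    if state is None:
        print("nmi_watchdog sysctl not available")
    elif state:
        # The file does not say whether the perf or buddy backend is active.
        print("hard lockup detector enabled (perf or buddy: unknown)")
    else:
        print("hard lockup detector disabled")
```

On a kernel built with the buddy detector the script reports the same
"enabled" state as on one built with the perf detector, which is why
the warning cannot be made precise from this file alone.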
I'm unaware of a way to determine whether the buddy or the "perf"
counter based approach is in use, and so of a way to correct perf's
behavior. A patch adding such an ability (say a new file in
/proc/sys/kernel), and perhaps new abilities to switch watchdog at
runtime, seems less desirable than just deleting the "perf" counter
based hard lockup detector. The perf tool could then make the NMI
warnings and breaking of event groups conditional on the running
kernel version.
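A version-conditional check could look roughly like the Python sketch
below. The cutoff version is a made-up placeholder, since no release
has removed the "perf" detector, and the function names are
hypothetical; the perf tool itself is written in C and would do this
differently.

```python
import os
import re

# Hypothetical placeholder: no kernel release has removed the perf detector.
HYPOTHETICAL_REMOVAL_VERSION = (99, 0)


def parse_release(release):
    """Extract (major, minor) from a release string such as '6.8.0-45-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def perf_watchdog_possible(release, removed_in=HYPOTHETICAL_REMOVAL_VERSION):
    """True if a kernel of this release could still use the perf detector."""
    return parse_release(release) < removed_in


if __name__ == "__main__":
    release = os.uname().release
    if perf_watchdog_possible(release):
        print(f"{release}: keep the NMI watchdog warning")
    else:
        print(f"{release}: perf detector removed, skip the warning")
```

One caveat with this approach is that a version comparison cannot see
distribution backports, so it would only be a heuristic.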
Are there objections to just deleting the "perf" hard lockup detector
(watchdog) from the kernel tree? Are there reasons to keep it around
but just not as the default?
Thanks,
Ian