Re: [PATCH] hung_task: Skip scan on idle systems

From: Lance Yang

Date: Mon Jan 26 2026 - 00:23:19 EST

Hi Aaron,

Keep one patch or series under review at a time, especially in the
same subsystem ...

Maintainers/Reviewers have limited bandwidth and can focus better
on one thing at a time.

Please be patient: wait for it to be merged or rejected before sending
the next one.

On 2026/1/26 11:45, Aaron Tomlin wrote:

At present, the hung task detector behaves in an unoptimised manner: it
wakes up periodically (every check_interval_secs, defaulting to 120
seconds) and performs an O(N) scan of the entire process list,
regardless of the system's actual state. On idle embedded devices,
virtual machines, or large servers with no activity, this behaviour
unnecessarily consumes CPU cycles and memory bandwidth, hindering
power-saving states.

To rectify this, this patch introduces an adaptive "green" polling
mechanism. The detector will now verify whether the system is
effectively idle before committing to a full process scan.

To implement this, we utilise the standard get_avenrun() API to verify
the global system load. Tasks in the TASK_UNINTERRUPTIBLE (D) state
explicitly contribute to the system load average; consequently, if the
1-minute load average is zero, we can confidently infer that no tasks
are currently hung, allowing us to bypass the expensive process scan.

Crucially, we invoke get_avenrun(load, 0, 0) with both the offset and
shift parameters set to zero. This configuration is deliberate and
necessary for safety:

1. Zero Offset: Prevents the application of any artificial
rounding bias usually intended for human-readable display.

2. Zero Shift: Retrieves the raw fixed-point value (where 1.0
load = 2048) with no additional scaling applied.

This ensures maximum sensitivity: even a microscopic fractional load
(e.g., a single task entering D state momentarily) will register as a
non-zero raw value. This guarantees that we never encounter a false
negative where a valid hung task is ignored due to integer truncation or
rounding errors.
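
To make the fixed-point argument above concrete, here is a minimal userspace
sketch in C. It is not kernel code. It assumes FSHIFT = 11 and the
LOAD_INT/LOAD_FRAC macros as they appear in include/linux/sched/loadavg.h,
and it models each slot of get_avenrun() as (raw + offset) << shift, which is
an assumption based on my reading of kernel/sched/loadavg.c. The helper names
scaled_load, is_idle_raw and display_hundredths are hypothetical and exist
only for this illustration.

```c
#include <stdbool.h>

/* Assumed to match include/linux/sched/loadavg.h. */
#define FSHIFT        11                    /* bits of fractional precision */
#define FIXED_1       (1UL << FSHIFT)       /* 1.0 in fixed point (2048) */
#define LOAD_INT(x)   ((x) >> FSHIFT)
#define LOAD_FRAC(x)  LOAD_INT(((x) & (FIXED_1 - 1)) * 100)

/* Hypothetical model of what get_avenrun() stores in one slot. */
static unsigned long scaled_load(unsigned long raw, unsigned long offset,
				 int shift)
{
	return (raw + offset) << shift;
}

/* The patch's test: offset 0 and shift 0, so the raw value is compared. */
static bool is_idle_raw(unsigned long raw)
{
	return scaled_load(raw, 0, 0) == 0;
}

/*
 * Display-style value in hundredths, using a rounding offset of
 * FIXED_1 / 200 (assumed to be what a /proc/loadavg-style formatter adds).
 */
static unsigned long display_hundredths(unsigned long raw)
{
	unsigned long v = scaled_load(raw, FIXED_1 / 200, 0);

	return LOAD_INT(v) * 100 + LOAD_FRAC(v);
}
```

Under these assumptions, is_idle_raw(1) is false while display_hundredths(1)
is 0: a raw value of 1 would be displayed as 0.00 yet still fails the zero
test. This only illustrates the arithmetic; it says nothing about whether
every hung task is actually reflected in the load average.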

This heuristic significantly minimises the detector's footprint on
healthy systems whilst maintaining robust reliability for genuine hangs.

Signed-off-by: Aaron Tomlin <atomlin@xxxxxxxxxxx>
---
kernel/hung_task.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index d2254c91450b..7b9f5c1bd35e 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -17,6 +17,7 @@
#include <linux/export.h>
#include <linux/panic_notifier.h>
#include <linux/sysctl.h>
+#include <linux/sched/loadavg.h>
#include <linux/suspend.h>
#include <linux/utsname.h>
#include <linux/sched/signal.h>
@@ -503,6 +504,7 @@ static int watchdog(void *dummy)
for ( ; ; ) {
unsigned long timeout = sysctl_hung_task_timeout_secs;
unsigned long interval = sysctl_hung_task_check_interval_secs;
+ unsigned long load[3];
long t;
if (interval == 0)
@@ -511,8 +513,12 @@ static int watchdog(void *dummy)
t = hung_timeout_jiffies(hung_last_checked, interval);
if (t <= 0) {
if (!atomic_xchg(&reset_hung_task, 0) &&
- !hung_detector_suspended)
- check_hung_uninterruptible_tasks(timeout);
+ !hung_detector_suspended) {
+ /* Check 1-min load to detect idle system */
+ get_avenrun(load, 0, 0);
+ if (load[0] > 0)
+ check_hung_uninterruptible_tasks(timeout);

The optimization is not worth the trouble.

I don't think the assumption that "load[0] == 0 means no hung tasks" is
100% correct.

If it is wrong, we would miss actual hung tasks: a false negative, which
is worse than the "wasted scan" you're trying to avoid.

Also, I don't *really* care about optimizing something that runs once
every 120 seconds :)

Nacked-by: Lance Yang <lance.yang@xxxxxxxxx>