Re: [v6 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count

From: Lance Yang

Date: Wed Jan 14 2026 - 22:06:25 EST

On 2026/1/15 10:32, Aaron Tomlin wrote:

Currently, the hung_task_detect_count sysctl provides a cumulative count
of hung tasks since boot. In long-running, high-availability
environments, this counter may lose its utility if it cannot be reset
once an incident has been resolved. Furthermore, the previous
implementation relied upon implicit ordering, which could not strictly
guarantee that diagnostic metadata published by one CPU was visible to
the panic logic on another.

This patch introduces the capability to reset the detection count by
writing "0" to the hung_task_detect_count sysctl. The proc_handler logic
has been updated to validate this input and atomically reset the
counter.
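The validation step described above can be modelled in userspace C11. This is a minimal sketch, not the patch's proc_handler. The kernel handler operates on a struct ctl_table and a user buffer, whereas `detect_count` and `detect_count_write()` here are hypothetical stand-ins. The sketch also assumes, as the description implies, that any value other than 0 is rejected, and it uses -EINVAL as that error by assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for sysctl_hung_task_detect_count. */
static atomic_ulong detect_count;

/*
 * Hypothetical write handler: accepts only "0", optionally followed by a
 * single newline (as "echo 0 >" produces), and resets the counter.
 * Any other input is rejected with -EINVAL and the counter is unchanged.
 */
static int detect_count_write(const char *buf)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (end == buf || errno != 0)
		return -EINVAL;		/* not a number, or out of range */
	if (*end == '\n')
		end++;			/* tolerate the trailing newline */
	if (*end != '\0' || val != 0)
		return -EINVAL;		/* trailing junk, or a non-zero value */

	/* A single atomic store: concurrent readers never see a torn value. */
	atomic_store(&detect_count, 0);
	return 0;
}
```

Rejected input leaves the count untouched, so a mistyped write cannot silently discard diagnostic history.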

The synchronisation of sysctl_hung_task_detect_count relies upon a
transactional model to ensure the integrity of the detection counter
against concurrent resets from userspace. Pairing
atomic_long_read_acquire() with atomic_long_cmpxchg_release()
provides the following guarantees:

1. Prevention of Load-Store Reordering via Acquire Semantics: By
utilising atomic_long_read_acquire() to snapshot the counter
before initiating the task traversal, we establish acquire
ordering (a one-way barrier, not a full one): no memory access
later in the scan can be reordered before this load. This
prevents the compiler or hardware from effectively deferring the
initial load to a point later in the scan. Without this "acquire"
ordering, a delayed load could potentially read a "0" value
resulting from a userspace reset that occurred mid-scan. This
would lead to the subsequent cmpxchg succeeding erroneously,
thereby overwriting the user's reset with stale increment data.

2. Atomicity of the "Commit" Phase via Release Semantics: The
atomic_long_cmpxchg_release() serves as the transaction's commit
point. Its "release" ordering ensures that all diagnostic
recordings and task-state observations made during the scan are
ordered before the counter update, so any CPU that reads the new
counter value with acquire semantics is guaranteed to see those
diagnostic writes.

3. Race Condition Resolution: This pairing effectively detects any
"out-of-band" reset that changes the counter's value. If
sysctl_hung_task_detect_count is modified via the procfs
interface during the scan, the final cmpxchg will detect the
discrepancy between the current value and the "acquire" snapshot.
Consequently, the update will fail, ensuring that a reset command
from the administrator is prioritised over a scan that may have
been invalidated by that very reset.
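The three guarantees above can be illustrated with a userspace C11 model. This is a sketch under stated assumptions, not the patch's code. `detect_count`, `last_scan_hung`, `scan_and_commit()` and `simulated_reset()` are hypothetical, and a single compare-and-swap is assumed to commit the whole scan's count. The kernel primitives are mapped to their closest C11 equivalents: atomic_long_read_acquire() becomes an acquire load, and atomic_long_cmpxchg_release() becomes a compare-exchange with release ordering on success. Unlike the kernel's cmpxchg, which returns the old value, the C11 call returns a bool. The race is simulated sequentially, so this exercises the commit logic rather than the memory ordering itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for sysctl_hung_task_detect_count. */
static atomic_ulong detect_count;

/* Hypothetical diagnostic data recorded during a scan (a plain store). */
static unsigned long last_scan_hung;

/*
 * Hypothetical model of one watchdog pass that finds @hung hung tasks.
 * @mid_scan, if non-NULL, runs between the snapshot and the commit to
 * simulate a userspace reset racing with the scan.
 * Returns true if the scan's increment was committed.
 */
static bool scan_and_commit(unsigned long hung, void (*mid_scan)(void))
{
	/* Guarantee 1: acquire snapshot taken before the traversal. */
	unsigned long snapshot =
		atomic_load_explicit(&detect_count, memory_order_acquire);
	unsigned long expected = snapshot;

	/* ... task traversal would happen here ... */
	last_scan_hung = hung;
	if (mid_scan)
		mid_scan();

	if (hung == 0)
		return false;

	/*
	 * Guarantees 2 and 3: commit snapshot + hung only if the counter
	 * still holds the snapshot. Release ordering on success orders the
	 * diagnostic store above before the new counter value; on failure
	 * nothing is published, so relaxed ordering suffices.
	 */
	return atomic_compare_exchange_strong_explicit(&detect_count,
			&expected, snapshot + hung,
			memory_order_release, memory_order_relaxed);
}

/* Hypothetical out-of-band reset, as a write of "0" via procfs would do. */
static void simulated_reset(void)
{
	atomic_store(&detect_count, 0);
}
```

With no interference the counter advances from the snapshot by the scan's count. When simulated_reset() runs mid-scan, the compare-and-swap fails and the counter stays at the administrator's 0.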

Signed-off-by: Aaron Tomlin <atomlin@xxxxxxxxxxx>
---
Documentation/admin-guide/sysctl/kernel.rst | 3 +-
kernel/hung_task.c | 109 +++++++++++++-------
2 files changed, 75 insertions(+), 37 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 239da22c4e28..68da4235225a 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -418,7 +418,8 @@ hung_task_detect_count
======================
Indicates the total number of tasks that have been detected as hung since
-the system boot.
+the system boot or since the counter was reset. The counter is zeroed when
+a value of 0 is written.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index b5ad7a755eb5..2eb9c861bdcc 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -224,24 +224,43 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
}
#endif
-static void check_hung_task(struct task_struct *t, unsigned long timeout,
- unsigned long prev_detect_count)
+/**
+ * hung_task_diagnostics - Print structured diagnostic info for a hung task.
+ * @t: Pointer to the detected hung task.
+ *
+ * This function consolidates the printing of core diagnostic information
+ * for a task found to be blocked.
+ */
+static inline void hung_task_diagnostics(struct task_struct *t)
{
- unsigned long total_hung_task, cur_detect_count;
-
- if (!task_is_hung(t, timeout))
- return;
-
- /*
- * This counter tracks the total number of tasks detected as hung
- * since boot.
- */
- cur_detect_count = atomic_long_inc_return_relaxed(&sysctl_hung_task_detect_count);
- total_hung_task = cur_detect_count - prev_detect_count;
+ unsigned long blocked_secs = (jiffies - t->last_switch_time) / HZ;
+
+ pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
+ t->comm, t->pid, blocked_secs);
+ pr_err(" %s %s %.*s\n",
+ print_tainted(), init_utsname()->release,
+ (int)strcspn(init_utsname()->version, " "),
+ init_utsname()->version);
+ if (t->flags & PF_POSTCOREDUMP)
+ pr_err(" Blocked by coredump.\n");
+ pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\" disables this message.\n");
+}

I see hung_task_diagnostics() is still in this patch. I thought
we'd concluded [1] that the refactoring wasn't really necessary for a
single-use block?

[1] https://lore.kernel.org/all/noze3vhqjbsuulvvoaw4h5yeinggpwfslrit5vsd2dllfo4ath@qgmp22hoibgn/