Re: [PATCH v3 2/2] hung_task: Enable runtime reset of hung_task_detect_count

From: Lance Yang

Date: Wed Dec 17 2025 - 08:44:16 EST

On 2025/12/17 21:09, Petr Mladek wrote:

On Wed 2025-12-17 15:31:25, Lance Yang wrote:

On 2025/12/16 11:00, Aaron Tomlin wrote:

Introduce support for writing to /proc/sys/kernel/hung_task_detect_count.

Writing any value to this file atomically resets the counter of detected
hung tasks to zero. This grants system administrators the ability to clear
the cumulative diagnostic history after resolving an incident, simplifying
monitoring without requiring a system restart.

--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -418,7 +418,7 @@ hung_task_detect_count
======================
Indicates the total number of tasks that have been detected as hung since
-the system boot.
+the system boot. The counter can be reset to zero when written to.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 5902573200c0..01ce46a107b0 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -375,6 +375,31 @@ static long hung_timeout_jiffies(unsigned long last_checked,
}
#ifdef CONFIG_SYSCTL
+
+/**
+ * proc_dohung_task_detect_count - proc handler for hung_task_detect_count
+ * @table: Pointer to the struct ctl_table definition for this proc entry
+ * @write: Flag indicating the operation
+ * @buffer: User space buffer for data transfer
+ * @lenp: Pointer to the length of the data being transferred
+ * @ppos: Pointer to the current file offset
+ *
+ * This handler is used for reading the current hung task detection count
+ * and for resetting it to zero when a write operation is performed.
+ * Returns 0 on success or a negative error code on failure.
+ */
+static int proc_dohung_task_detect_count(const struct ctl_table *table, int write,
+ void *buffer, size_t *lenp, loff_t *ppos)
+{
+ if (!write)
+ return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
+
+ WRITE_ONCE(sysctl_hung_task_detect_count, 0);

The reset uses WRITE_ONCE() but the increment in check_hung_task()
is plain ++, it's just a stat cunter so fine though ;)

I was about to wave this away as well. But I am afraid that
it is even more complicated.

Good catch!

The counter is used to decide how many hung tasks were found in
by a single check, see:

static void check_hung_uninterruptible_tasks(unsigned long timeout)
{
[...]
unsigned long prev_detect_count = sysctl_hung_task_detect_count;
[...]
for_each_process_thread(g, t) {
[...]
check_hung_task(t, timeout, prev_detect_count);
}
[...]
if (!(sysctl_hung_task_detect_count - prev_detect_count))
return;

if (need_warning || hung_task_call_panic) {
si_mask |= SYS_INFO_LOCKS;

if (sysctl_hung_task_all_cpu_backtrace)
si_mask |= SYS_INFO_ALL_BT;
}

sys_info(si_mask);

if (hung_task_call_panic)
panic("hung_task: blocked tasks");
}

Any race might cause false positives and even panic() !!!
And the race window is rather big (checking all processes).

Right, I completely missed the dependency on prev_detect_count.
Messing with that counter while the loop is running is really
dangerous ...

This race can't be prevented by an atomic read/write.
IMHO, the only solution would be to add some locking,
e.g. mutex and take it around the entire
check_hung_uninterruptible_tasks().

On one hand, it might work. The code is called from
a task context...

But adding a lock into a lockup-detector code triggers
some warning bells in my head.

Honestly, I am not sure if it is worth it. I understand
the motivation for the reset. But IMHO, even more important
is to make sure that the watchdog works as expected.

Given that adding a lock to the detector path is undesirable,
the risk clearly outweighs the benefit :)

Thanks for saving me from this pitfall, Petr!