Re: [PATCH][v2] hung_task: Panic after fixed number of hung tasks

From: Lance Yang

Date: Sun Sep 28 2025 - 02:56:09 EST


Hey Li,

On 2025/9/28 13:31, lirongqing wrote:
From: Li RongQing <lirongqing@xxxxxxxxx>

Currently, when hung_task_panic is enabled, the kernel panics immediately
upon detecting the first hung task. However, some hung tasks are transient
and the system can recover fully, while others are unrecoverable and
trigger consecutive hung task reports, in which case a panic is expected.

This commit adds a new sysctl parameter hung_task_count_to_panic to allow
specifying the number of consecutive hung tasks that must be detected
before triggering a kernel panic. This provides finer control for
environments where transient hangs may happen but persistent hangs should
still be fatal.

Acked-by: Lance Yang <lance.yang@xxxxxxxxx>
Signed-off-by: Li RongQing <lirongqing@xxxxxxxxx>
---

It's working as expected. So:
Tested-by: Lance Yang <lance.yang@xxxxxxxxx>

But on second thought: regarding this new sysctl parameter, I was wondering
if a name like max_hung_task_count_to_panic might be a bit more explicit,
just to follow the convention from max_rcu_stall_to_panic.

No strong opinion on this, though :)
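
For anyone skimming the thread, the condition the patch adds to
check_hung_task() can be modelled in plain userspace C roughly as below.
This is only a sketch under assumptions: should_panic() and its parameter
names are made up for illustration and stand in for the
sysctl_hung_task_* variables in the diff; it is not the kernel code.

```c
#include <stdbool.h>

/*
 * Userspace model of the panic decision in the patch.
 * should_panic() and its parameters are hypothetical names:
 *   panic_enabled  models sysctl_hung_task_panic
 *   count_to_panic models sysctl_hung_task_count_to_panic
 *   detect_count   models sysctl_hung_task_detect_count
 */
static bool should_panic(unsigned int panic_enabled,
                         unsigned int count_to_panic,
                         unsigned long detect_count)
{
    /* hung_task_panic=1 still panics on the first hung task. */
    if (panic_enabled)
        return true;

    /* A threshold of 0 disables the count-based panic entirely. */
    if (!count_to_panic)
        return false;

    /* Otherwise panic once the detected count reaches the threshold. */
    return detect_count >= count_to_panic;
}
```

With hung_task_panic=0 and a threshold of 3, this model returns false for
a detected count of 2 and true from 3 onwards, which matches the
documented behaviour in the kernel.rst hunk.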

Cheers,
Lance

Diff with v1: change documentation as Lance suggested

Documentation/admin-guide/sysctl/kernel.rst | 8 ++++++++
kernel/hung_task.c | 14 +++++++++++++-
2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 8b49eab..98b47a7 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -405,6 +405,14 @@ This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
1 Panic immediately.
= =================================================
+hung_task_count_to_panic
+========================
+
+When set to a non-zero value, a kernel panic will be triggered if the
+number of detected hung tasks reaches this value.
+
+Note that setting hung_task_panic=1 will still cause an immediate panic
+on the first hung task.
hung_task_check_count
=====================
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 8708a12..87a6421 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -83,6 +83,8 @@ static unsigned int __read_mostly sysctl_hung_task_all_cpu_backtrace;
static unsigned int __read_mostly sysctl_hung_task_panic =
IS_ENABLED(CONFIG_BOOTPARAM_HUNG_TASK_PANIC);
+static unsigned int __read_mostly sysctl_hung_task_count_to_panic;
+
static int
hung_task_panic(struct notifier_block *this, unsigned long event, void *ptr)
{
@@ -219,7 +221,9 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
trace_sched_process_hang(t);
- if (sysctl_hung_task_panic) {
+ if (sysctl_hung_task_panic ||
+ (sysctl_hung_task_count_to_panic &&
+ (sysctl_hung_task_detect_count >= sysctl_hung_task_count_to_panic))) {
console_verbose();
hung_task_show_lock = true;
hung_task_call_panic = true;
@@ -388,6 +392,14 @@ static const struct ctl_table hung_task_sysctls[] = {
.extra2 = SYSCTL_ONE,
},
{
+ .procname = "hung_task_count_to_panic",
+ .data = &sysctl_hung_task_count_to_panic,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ },
+ {
.procname = "hung_task_check_count",
.data = &sysctl_hung_task_check_count,
.maxlen = sizeof(int),