Re: [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung

From: Lance Yang

Date: Tue Nov 18 2025 - 12:57:45 EST

On 2025/11/18 23:20, Petr Mladek wrote:

On Mon 2025-11-17 09:53:52, Andrew Morton wrote:

On Sun, 16 Nov 2025 22:13:58 +0800 Feng Tang <feng.tang@xxxxxxxxxxxxxxxxx> wrote:

if (need_warning || hung_task_call_panic) {
si_mask |= SYS_INFO_LOCKS;

Looks good to me now! I assume v3 would be expected, can you
post a new version?

Andrew has taken the patchset to -mm tree.

Andrew, which way do you prefer? Should I send a v3 patch for hung-task, or
will you pick up the fixup patch and squash it into the original 0002 patch?

Anyway, I made a squashed v3 patch below.

I prefer little fixup patches, generally. So people can see what
changed and don't feel they should re-review everything.

I queued the below, thanks.

From: Feng Tang <feng.tang@xxxxxxxxxxxxxxxxx>
Subject: hung_task-add-hung_task_sys_info-sysctl-to-dump-sys-info-on-task-hung-fix
Date: Wed, 5 Nov 2025 19:30:36 +0800

maintain consistency with established behavior, per Lance and Petr

Link: https://lkml.kernel.org/r/aRncJo1mA5Zk77Hr@U-2FWC9VHC-2323.local
Suggested-by: Petr Mladek <pmladek@xxxxxxxx>
Signed-off-by: Feng Tang <feng.tang@xxxxxxxxxxxxxxxxx>
Cc: Jonathan Corbet <corbet@xxxxxxx>
Cc: Lance Yang <ioworker0@xxxxxxxxx>
Cc: "Paul E . McKenney" <paulmck@xxxxxxxxxx>
Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>

Thanks a lot for catching and fixing the regression caused
by this patchset. The patch looks good.

See a comment below.

--- a/kernel/hung_task.c~hung_task-add-hung_task_sys_info-sysctl-to-dump-sys-info-on-task-hung-fix
+++ a/kernel/hung_task.c
@@ -223,8 +223,11 @@ static inline void debug_show_blocker(st
}
#endif
-static void check_hung_task(struct task_struct *t, unsigned long timeout)
+static void check_hung_task(struct task_struct *t, unsigned long timeout,
+ unsigned long prev_detect_count)
{
+ unsigned long total_hung_task;
+
if (!task_is_hung(t, timeout))
return;
@@ -234,13 +237,19 @@ static void check_hung_task(struct task_
*/
sysctl_hung_task_detect_count++;
+ total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
trace_sched_process_hang(t);
+ if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
+ console_verbose();
+ hung_task_call_panic = true;
+ }
+
/*
* Ok, the task did not get scheduled for more than 2 minutes,
* complain:
*/
- if (sysctl_hung_task_warnings) {
+ if (sysctl_hung_task_warnings || hung_task_call_panic) {
if (sysctl_hung_task_warnings > 0)
sysctl_hung_task_warnings--;
pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",

This restores the behavior after the commit 9544f9e6947f6508
("hung_task: panic when there are more than N hung tasks at
the same time"). It is better than nothing.

Well, the behavior is still not ideal. It would be better if
we printed backtraces from _all_ "hung" tasks before panicking.
But it prints the backtraces only once the sysctl_hung_task_panic
limit is reached.

I mean, for example, let's have:

+ sysctl_hung_task_warnings = 2;
+ sysctl_hung_task_panic = 5;
+ and detect 6 hung tasks.

The code will report 1st and 2nd hung tasks. It will skip 3rd and 4th
because sysctl_hung_task_warnings reached 0. It will report 5th and
6th tasks because (total_hung_task >= 5).

It is better than nothing. But it might be confusing.

Right, I can see how it might be confusing.

IMHO, sysctl_hung_task_warnings is a user-configured limit on verbosity.
It makes sense that reports are suppressed after the limit is exhausted,
except when the sysctl_hung_task_panic threshold is reached ;)
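
To make the example above concrete, here is a small userspace model of the reporting decision in check_hung_task() after the fixup. It is only a sketch, not kernel code: model_hung_task_reports() and its parameters are made up for illustration and stand in for sysctl_hung_task_warnings, sysctl_hung_task_panic, and the number of hung tasks found in one scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the patched reporting logic.
 * warnings    - stands in for sysctl_hung_task_warnings (-1 means unlimited)
 * panic_limit - stands in for sysctl_hung_task_panic (0 means never panic)
 * nr_hung     - number of hung tasks detected in a single scan
 * reported[i] - set to true when the (i+1)-th hung task would be printed
 * Returns the number of hung tasks that get reported.
 */
static size_t model_hung_task_reports(int warnings, unsigned long panic_limit,
				      size_t nr_hung, bool *reported)
{
	unsigned long total_hung_task = 0;
	bool call_panic = false;
	size_t nr_reported = 0;

	assert(reported != NULL);

	for (size_t i = 0; i < nr_hung; i++) {
		/* One more hung task detected in this scan. */
		total_hung_task++;

		/* Reaching the panic threshold forces reporting from now on. */
		if (panic_limit && total_hung_task >= panic_limit)
			call_panic = true;

		reported[i] = false;
		if (warnings || call_panic) {
			/* Only a positive budget is consumed; -1 stays unlimited. */
			if (warnings > 0)
				warnings--;
			reported[i] = true;
			nr_reported++;
		}
	}

	return nr_reported;
}
```

With warnings = 2, panic_limit = 5 and six hung tasks, this model reports tasks 1, 2, 5 and 6 and skips 3 and 4, which matches the walk-through above.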

I am not sure how to fix it. A minimalist solution would be to print
a warning. Something like:

if (sysctl_hung_task_panic > 1 &&
    (total_hung_task == sysctl_hung_task_panic) &&
    !sysctl_hung_task_warnings) {
	pr_err("INFO: %d blocked tasks might have been skipped because reached hung_task_warnings limit\n",
	       sysctl_hung_task_panic - 1);
}

Or we could print the "total_hung_task" counter somewhere, for
example,

pr_err("INFO[%lu]: task %s:%d blocked for more than %ld seconds.\n",
total_hung_task, ...

Or we could restart the for_each_process_thread() cycle and make sure
that all hung tasks will get reported.

Or we could ignore it until anyone complains.

It looks like we already inform the user when that happens. When
sysctl_hung_task_warnings is finally decremented to zero, the code prints:

```
if (!sysctl_hung_task_warnings)
	pr_info("Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings\n");
```

Given that this explicit warning is already in place, perhaps the current
behavior is sufficient and clear enough?

Thanks,
Lance