[PATCH 0/2] hung_task: Dump the blocking task stacktrace

From: Masami Hiramatsu (Google)
Date: Wed Feb 19 2025 - 08:07:04 EST


Hi,

The hung_task detector is very useful for detecting lockups.
However, since it only dumps the blocked (uninterruptible sleep)
processes, its output is often not enough to identify the root
cause of the lockup.

For example, if a process holds a mutex and sleeps in interruptible
state for a long time while waiting for an event, the other
processes will wait on the mutex in uninterruptible state. In this
case, the waiter processes are dumped, but the blocker process is
not shown because it is sleeping in interruptible state.
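
As a rough userspace analogue of this scenario (not the kernel sample
from this series), the sketch below uses POSIX threads: one thread takes
a lock and sleeps while holding it, and a second thread fails to get the
lock within a short deadline. All names (demo_lock, holder_thread,
waiter_lock_result) and the 300 ms / 50 ms timings are assumptions made
for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical names; this only models the holder/waiter relationship. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t holder_ready;

/* Blocker: takes the lock, then sleeps while still holding it. */
static void *holder_thread(void *arg)
{
	struct timespec nap = { .tv_sec = 0, .tv_nsec = 300L * 1000 * 1000 };

	(void)arg;
	pthread_mutex_lock(&demo_lock);
	sem_post(&holder_ready);	/* tell the waiter the lock is now held */
	nanosleep(&nap, NULL);		/* stands in for the long interruptible sleep */
	pthread_mutex_unlock(&demo_lock);
	return NULL;
}

/*
 * Waiter: tries to take the lock with a 50 ms deadline and returns the
 * pthread_mutex_timedlock() result (ETIMEDOUT if it stayed blocked).
 */
int waiter_lock_result(void)
{
	pthread_t holder;
	struct timespec deadline;
	int rc;

	rc = sem_init(&holder_ready, 0, 0);
	assert(rc == 0);
	rc = pthread_create(&holder, NULL, holder_thread, NULL);
	assert(rc == 0);

	/* Wait until the holder really owns the lock. */
	while (sem_wait(&holder_ready) == -1 && errno == EINTR)
		;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_nsec += 50L * 1000 * 1000;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	rc = pthread_mutex_timedlock(&demo_lock, &deadline);
	if (rc == 0)
		pthread_mutex_unlock(&demo_lock);

	pthread_join(holder, NULL);
	sem_destroy(&holder_ready);
	return rc;
}
```

In the kernel case the waiter has no deadline and simply stays in
uninterruptible sleep, which is what hung_task reports; the holder is
the task that the existing report does not show.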

This series adds a feature to dump the blocker task, i.e. the task
which holds the mutex, when a hung task is detected. e.g.

INFO: task cat:113 blocked for more than 122 seconds.
Not tainted 6.14.0-rc3-00002-g6afe972e1b9b #152
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:cat state:D stack:13432 pid:113 tgid:113 ppid:103 task_flags:0x400100 flags:0x00000002
Call Trace:
<TASK>
__schedule+0x731/0x960
? schedule_preempt_disabled+0x54/0xa0
schedule+0xb7/0x140
? __mutex_lock+0x51d/0xa50
? __mutex_lock+0x51d/0xa50
schedule_preempt_disabled+0x54/0xa0
__mutex_lock+0x51d/0xa50
? current_time+0x3a/0x120
read_dummy+0x23/0x70
full_proxy_read+0x6a/0xc0
vfs_read+0xc2/0x340
? __pfx_direct_file_splice_eof+0x10/0x10
? do_sendfile+0x1bd/0x2e0
ksys_read+0x76/0xe0
do_syscall_64+0xe3/0x1c0
? exc_page_fault+0xa9/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x4840cd
RSP: 002b:00007ffe632b76c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004840cd
RDX: 0000000000001000 RSI: 00007ffe632b7710 RDI: 0000000000000003
RBP: 00007ffe632b7710 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000001000
R13: 000000003a8b63a0 R14: 0000000000000001 R15: ffffffffffffffff
</TASK>
INFO: task cat:113 is blocked on a mutex owned by task cat:112.
task:cat state:S stack:13432 pid:112 tgid:112 ppid:103 task_flags:0x400100 flags:0x00000002
Call Trace:
<TASK>
__schedule+0x731/0x960
? schedule_timeout+0xa8/0x120
schedule+0xb7/0x140
schedule_timeout+0xa8/0x120
? __pfx_process_timeout+0x10/0x10
msleep_interruptible+0x3e/0x60
read_dummy+0x2d/0x70
full_proxy_read+0x6a/0xc0
vfs_read+0xc2/0x340
? __pfx_direct_file_splice_eof+0x10/0x10
? do_sendfile+0x1bd/0x2e0
ksys_read+0x76/0xe0
do_syscall_64+0xe3/0x1c0
? exc_page_fault+0xa9/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x4840cd
RSP: 002b:00007ffd69513748 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004840cd
RDX: 0000000000001000 RSI: 00007ffd69513790 RDI: 0000000000000003
RBP: 00007ffd69513790 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000001000
R13: 0000000029d8d3a0 R14: 0000000000000001 R15: ffffffffffffffff
</TASK>
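
The second report above ("is blocked on a mutex owned by") needs a way
to get from the hung waiter to the mutex owner. The following is only a
small userspace model of one way such a lookup could work, assuming each
waiter records the lock it is waiting on. The types and functions
(demo_task, demo_mutex, demo_find_blocker, demo_format_report) are
hypothetical and are not taken from the patches in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct demo_task;

/* Hypothetical lock model: only tracks the current owner. */
struct demo_mutex {
	struct demo_task *owner;	/* NULL when unlocked */
};

/* Hypothetical task model with a back-pointer to the awaited lock. */
struct demo_task {
	const char *comm;
	int pid;
	char state;			/* 'D' uninterruptible, 'S' interruptible */
	struct demo_mutex *blocked_on;	/* set while waiting, else NULL */
};

/* Return the owner of the mutex a hung ('D') task waits on, or NULL. */
const struct demo_task *demo_find_blocker(const struct demo_task *t)
{
	if (!t || t->state != 'D' || !t->blocked_on)
		return NULL;
	return t->blocked_on->owner;
}

/*
 * Format a line shaped like the one in the log above. Returns the
 * snprintf() result, or 0 if no blocker could be found.
 */
int demo_format_report(const struct demo_task *hung, char *buf, size_t len)
{
	const struct demo_task *owner = demo_find_blocker(hung);

	assert(buf != NULL);
	if (!owner)
		return 0;
	return snprintf(buf, len,
			"INFO: task %s:%d is blocked on a mutex owned by task %s:%d.",
			hung->comm, hung->pid, owner->comm, owner->pid);
}
```

With a waiter cat:113 in state D pointing at a mutex owned by cat:112,
demo_find_blocker() returns the cat:112 entry, whose stack can then be
dumped even though that task is in state S.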

Thank you,

---

Masami Hiramatsu (Google) (2):
      hung_task: Show the blocker task if the task is hung on mutex
      samples: Add hung_task detector mutex blocking sample


 kernel/hung_task.c                  | 38 ++++++++++++++++++++
 kernel/locking/mutex-debug.c        |  1 +
 kernel/locking/mutex.c              |  9 +++++
 kernel/locking/mutex.h              |  6 +++
 samples/Kconfig                     |  9 +++++
 samples/Makefile                    |  1 +
 samples/hung_task/Makefile          |  2 +
 samples/hung_task/hung_task_mutex.c | 66 +++++++++++++++++++++++++++++++++++
 8 files changed, 132 insertions(+)
 create mode 100644 samples/hung_task/Makefile
 create mode 100644 samples/hung_task/hung_task_mutex.c

--
Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>