Re: Debugging hung tasks?

From: Ben Greear
Date: Mon Apr 25 2011 - 13:38:39 EST


On 04/22/2011 08:55 PM, Randy Dunlap wrote:
> On Fri, 22 Apr 2011 16:09:29 -0700 Ben Greear wrote:
>
>> I am testing lots of NFS traffic against an over-loaded and slow file server.
>>
>> I enabled the hung-task detection logic, and it's hitting after 180
>> seconds.
>>
>> First: Is there any valid reason to have funky NFS cause a hung task?
>>
>> Second: Why doesn't the hung-task panic logic print the stack trace of
>> the hung task?
>> Is this an option that can be enabled?
>
> hung_task.c::check_hung_task() always calls sched_show_task() and
> optionally does the panic:
>
>	if (sysctl_hung_task_panic)
>		panic("hung_task: blocked tasks");
>
> sched.c::sched_show_task() calls show_stack(), which should be doing what
> you are asking for AFAICT. What kernel version are you using?
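
For reference, the flow Randy describes can be modelled in user space
roughly as below. This is a simplified sketch, not the kernel source:
struct task, show_task(), check_task() and hung_task_panic_enabled are
hypothetical stand-ins for task_struct, sched_show_task(),
check_hung_task() and sysctl_hung_task_panic. The context-switch-count
comparison is an assumption about how "no progress" is detected, and the
panic is only printed rather than performed.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for task_struct; only the fields this sketch needs. */
struct task {
	const char *comm;
	int pid;
	unsigned long switch_count;      /* context switches so far */
	unsigned long last_switch_count; /* value seen at the previous check */
};

/* Models sysctl_hung_task_panic (assumed to default to off here). */
bool hung_task_panic_enabled = false;

/* Models sched_show_task(), which in the kernel ends up in show_stack(). */
void show_task(const struct task *t)
{
	printf("%s:%d (task state and stack trace would be dumped here)\n",
	       t->comm, t->pid);
}

/*
 * Models check_hung_task(). Returns true if the task was reported.
 * Assumption: a task counts as hung when its switch count has not
 * moved since the previous check.
 */
bool check_task(struct task *t, unsigned long timeout_secs)
{
	if (t->switch_count != t->last_switch_count) {
		/* The task ran since the last check: record and move on. */
		t->last_switch_count = t->switch_count;
		return false;
	}

	printf("INFO: task %s:%d blocked for more than %lu seconds.\n",
	       t->comm, t->pid, timeout_secs);

	/* The task dump always happens, before any panic. */
	show_task(t);

	if (hung_task_panic_enabled)
		printf("Kernel panic - not syncing: hung_task: blocked tasks\n");

	return true;
}
```

Calling check_task() twice on a task whose switch_count does not change
reports it on the second call. The point of the sketch is the ordering:
the task dump comes first and the panic is optional, so a stack trace for
the hung task would be expected ahead of the panic message.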

Here's one of the panics, for instance (captured on serial console).

There is a lockdep splat early on in 2.6.36.4 (a known bug, but
not fixed since that kernel is EOL), so that is probably why no
locking info is printed. But I was expecting a more useful stack
trace, since it appears to be our user-space application (btserver)
that is hung.

Apr 22 15:57:38 localhost kernel: nfs: server 192.168.100.19 not responding, still trying
Apr 22 15:57:38 localhost kernel: nfs: server 192.168.100.19 OK
Kernel panic - not syncing: hung_task: blocked tasks
Pid: 58, comm: khungtaskd Not tainted 2.6.36.4+ #1
Apr 22 15:59:08 Call Trace:
localhost kernel [<ffffffff8140174a>] panic+0x96/0x1ae
: INFO: task bts [<ffffffff81093106>] watchdog+0x1b1/0x1f9
erver:15212 bloc [<ffffffff81092f55>] ? watchdog+0x0/0x1f9
ked for more tha [<ffffffff8105c774>] kthread+0x7d/0x85
n 180 seconds.
[<ffffffff8100a8e4>] kernel_thread_helper+0x4/0x10
Apr 22 15:59:08 [<ffffffff81404a54>] ? restore_args+0x0/0x30
localhost kernel [<ffffffff8105c6f7>] ? kthread+0x0/0x85
: "echo 0 > /pro [<ffffffff8100a8e0>] ? kernel_thread_helper+0x0/0x10
c/sys/kernel/hunpanic occurred, switching back to text console
Rebooting in 10 seconds..^C

We're testing 2.6.38.4 now and haven't seen this problem again,
so maybe it's fixed anyway.

Thanks,
Ben

--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com
