Re: hung task in 4.14 (syzbot bug from 2018 April 17)

From: Sebastian Kuzminsky
Date: Fri Oct 05 2018 - 17:56:29 EST


Another instance of the same problem (also on 4.14.67), this time with
the hung task timeout restored to its 120 second default:

[ 980.954129] INFO: task systemd:1 blocked for more than 120 seconds.
[ 980.954132] Tainted: G OE 4.14.67-solidfire1 #1
[ 980.954133] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 980.954135] systemd D 0 1 0 0x00000000
[ 980.954137] Call Trace:
[ 980.954150] ? __schedule+0x27f/0x870
[ 980.954154] ? check_preempt_wakeup+0x102/0x230
[ 980.954156] schedule+0x28/0x80
[ 980.954159] schedule_timeout+0x1e7/0x340
[ 980.954163] ? radix_tree_node_ctor+0x20/0x20
[ 980.954165] ? delete_node+0x1b8/0x1f0
[ 980.954167] ? wait_for_completion+0xb0/0x120
[ 980.954169] wait_for_completion+0xb0/0x120
[ 980.954172] ? wake_up_q+0x70/0x70
[ 980.954176] flush_work+0x10d/0x1c0
[ 980.954179] ? worker_detach_from_pool+0xa0/0xa0
[ 980.954183] fsnotify_destroy_group+0x34/0xa0
[ 980.954185] inotify_release+0x1a/0x50
[ 980.954189] __fput+0xd8/0x220
[ 980.954194] task_work_run+0x8a/0xb0
[ 980.954198] exit_to_usermode_loop+0xb9/0xc0
[ 980.954200] do_syscall_64+0x10b/0x120
[ 980.954203] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 980.954206] RIP: 0033:0x7f031a552900
[ 980.954208] RSP: 002b:00007fff93fd18b0 EFLAGS: 00000293 ORIG_RAX:
0000000000000003
[ 980.954210] RAX: 0000000000000000 RBX: 0000000000000010 RCX: 00007f031a552900
[ 980.954211] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000010
[ 980.954212] RBP: 00007f031beba088 R08: 0000000000000020 R09: 000055ac5a21e2ad
[ 980.954213] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[ 980.954214] R13: 000055ac5a488b20 R14: 000055ac5c0907f0 R15: 0000000000000000
[ 980.954724] INFO: task kworker/u114:4:41909 blocked for more than
120 seconds.
[ 980.954725] Tainted: G OE 4.14.67-solidfire1 #1
[ 980.954726] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 980.954727] kworker/u114:4 D 0 41909 2 0x80000000
[ 980.954733] Workqueue: events_unbound fsnotify_mark_destroy_workfn
[ 980.954734] Call Trace:
[ 980.954739] ? __schedule+0x27f/0x870
[ 980.954741] schedule+0x28/0x80
[ 980.954743] schedule_timeout+0x1e7/0x340
[ 980.954746] ? __switch_to_asm+0x40/0x70
[ 980.954747] ? update_curr+0xe1/0x1a0
[ 980.954750] ? wait_for_completion+0xb0/0x120
[ 980.954751] wait_for_completion+0xb0/0x120
[ 980.954753] ? wake_up_q+0x70/0x70
[ 980.954758] __synchronize_srcu.part.13+0x76/0x90
[ 980.954761] ? trace_raw_output_rcu_utilization+0x40/0x40
[ 980.954762] ? try_to_wake_up+0x44/0x460
[ 980.954765] ? fsnotify_mark_destroy_workfn+0x67/0xb0
[ 980.954767] fsnotify_mark_destroy_workfn+0x67/0xb0
[ 980.954769] process_one_work+0x1da/0x3d0
[ 980.954772] worker_thread+0x21f/0x3f0
[ 980.954775] ? process_one_work+0x3d0/0x3d0
[ 980.954777] kthread+0x119/0x130
[ 980.954779] ? kthread_create_on_node+0x40/0x40
[ 980.954781] ret_from_fork+0x35/0x40

crash> print reaper_work
$3 = {
work = {
data = {
counter = -117148584927483
},
entry = {
next = 0xffff95743772b770,
prev = 0xffffa2aa800a3df8
},
func = 0xffffffffa8249df0
},
timer = {
entry = {
next = 0xdead000000000200,
pprev = 0x0
},
expires = 4295020559,
function = 0xffffffffa80778c0,
data = 18446744072250514272,
flags = 69206020
},
wq = 0xffff95443f011400,
cpu = 128
}

Any input is welcome.

--
Sebastian Kuzminsky