Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

From: Bart Van Assche
Date: Tue Aug 09 2016 - 19:56:29 EST


On 08/08/2016 09:20 AM, Oleg Nesterov wrote:
> So far _I think_ that the bug is somewhere else... Say, someone clears
> PG_locked without wake_up(). Then SIGKILL sent to the task sleeping in
> sys_read() "adds" the necessary wakeup...

Hello Oleg,

Something that puzzles me is that removing the "else" keyword from
abort_exclusive_wait() is sufficient to avoid the hang. If there would
be code that clears PG_locked without calling wake_up() this hang
probably would also be triggered by workloads that do not wake up
lock_page_killable() with a signal. BTW, the
WARN_ONCE(!list_empty(&wait->task_list) && waitqueue_active(q), "mode =
%#x\n", mode) statement that I added in abort_exclusive_wait() just
produced the following call stack:

Aug 9 16:16:38 ion-dev-ib-ini kernel: WARNING: CPU: 0 PID: 14767 at kernel/sched/wait.c:284 abort_exclusive_wait+0xe3/0xf0
Aug 9 16:16:38 ion-dev-ib-ini kernel: mode = 0x82
Aug 9 16:16:38 ion-dev-ib-ini kernel: [last unloaded: brd]
Aug 9 16:16:38 ion-dev-ib-ini kernel: CPU: 0 PID: 14767 Comm: kpartx Tainted: G W 4.7.0-dbg+ #3
Aug 9 16:16:38 ion-dev-ib-ini kernel: Call Trace:
Aug 9 16:16:38 ion-dev-ib-ini kernel: [<ffffffff81320157>] dump_stack+0x68/0xa1
Aug 9 16:16:38 ion-dev-ib-ini kernel: [<ffffffff81061c46>] __warn+0xc6/0xe0
Aug 9 16:16:38 ion-dev-ib-ini kernel: [<ffffffff81061caa>] warn_slowpath_fmt+0x4a/0x50
Aug 9 16:16:38 ion-dev-ib-ini kernel: [<ffffffff810ab7d3>] abort_exclusive_wait+0xe3/0xf0
Aug 9 16:16:38 ion-dev-ib-ini kernel: [<ffffffff8161fb61>] __wait_on_bit_lock+0x61/0xa0
Aug 9 16:16:38 ion-dev-ib-ini kernel: [<ffffffff8114ff69>] __lock_page_killable+0xb9/0xc0
Aug 9 16:16:38 ion-dev-ib-ini kernel: [<ffffffff8115305a>] generic_file_read_iter+0x1ea/0x770
Aug 9 16:16:38 ion-dev-ib-ini kernel: [<ffffffff812134f0>] blkdev_read_iter+0x30/0x40
Aug 9 16:16:38 ion-dev-ib-ini kernel: [<ffffffff811d26bb>] __vfs_read+0xbb/0x130
Aug 9 16:16:38 ion-dev-ib-ini kernel: [<ffffffff811d2aa1>] vfs_read+0x91/0x130
Aug 9 16:16:38 ion-dev-ib-ini kernel: [<ffffffff811d3e24>] SyS_read+0x44/0xa0
Aug 9 16:16:38 ion-dev-ib-ini kernel: [<ffffffff81624fe5>] entry_SYSCALL_64_fastpath+0x18/0xa8

(gdb) list *(generic_file_read_iter+0x1ea)
0xffffffff8115305a is in do_generic_file_read (mm/filemap.c:1730).
1725 continue;
1726
1727 page_not_up_to_date:
1728 /* Get exclusive access to the page ... */
1729 error = lock_page_killable(page);
1730 if (unlikely(error))
1731 goto readpage_error;
1732
1733 page_not_up_to_date_locked:
1734 /* Did it get truncated before we got the lock? */

Apparently the task that hangs is the same task as the one that
received the signal (PID 14767; state "D" = TASK_UNINTERRUPTIBLE):

[ 3718.134118] sysrq: SysRq : Show Blocked State
[ 3718.136234] kpartx D ffff8803c7767838 0 14767 1 0x00000006
[ 3718.136928] Call Trace:
[ 3718.137089] [<ffffffff8161f5b7>] schedule+0x37/0x90
[ 3718.137142] [<ffffffff81623c0f>] schedule_timeout+0x27f/0x470
[ 3718.137603] [<ffffffff8161e98f>] io_schedule_timeout+0x9f/0x110
[ 3718.137662] [<ffffffff8161fd56>] bit_wait_io+0x16/0x60
[ 3718.137714] [<ffffffff8161fb49>] __wait_on_bit_lock+0x49/0xa0
[ 3718.137764] [<ffffffff8114fea9>] __lock_page+0xb9/0xc0
[ 3718.137865] [<ffffffff81165dd0>] truncate_inode_pages_range+0x3e0/0x760
[ 3718.138175] [<ffffffff81166160>] truncate_inode_pages+0x10/0x20
[ 3718.138477] [<ffffffff81212a60>] kill_bdev+0x30/0x40
[ 3718.138529] [<ffffffff81213d81>] __blkdev_put+0x71/0x360
[ 3718.138631] [<ffffffff812140b9>] blkdev_put+0x49/0x170
[ 3718.138681] [<ffffffff81214200>] blkdev_close+0x20/0x30
[ 3718.138732] [<ffffffff811d4928>] __fput+0xe8/0x1f0
[ 3718.138782] [<ffffffff811d4a69>] ____fput+0x9/0x10
[ 3718.138834] [<ffffffff810842d3>] task_work_run+0x83/0xb0
[ 3718.138886] [<ffffffff8106606e>] do_exit+0x3ee/0xc40
[ 3718.138987] [<ffffffff8106694b>] do_group_exit+0x4b/0xc0
[ 3718.139038] [<ffffffff81073d9a>] get_signal+0x2ca/0x940
[ 3718.139142] [<ffffffff8101bf43>] do_signal+0x23/0x660
[ 3718.139247] [<ffffffff810022b3>] exit_to_usermode_loop+0x73/0xb0
[ 3718.139297] [<ffffffff81002cb0>] syscall_return_slowpath+0xb0/0xc0
[ 3718.139349] [<ffffffff81625073>] entry_SYSCALL_64_fastpath+0xa6/0xa8

I'll try to see whether this behavior is reproducible.

Bart.