Re: Hang in 9p/virtio

From: Vegard Nossum
Date: Tue Aug 02 2016 - 10:50:35 EST


On 08/02/2016 11:13 AM, Vegard Nossum wrote:
On 08/02/2016 11:03 AM, Cornelia Huck wrote:
On Sat, 30 Jul 2016 23:42:18 +0200
Vegard Nossum <vegard.nossum@xxxxxxxxxx> wrote:

Hi,

With fault injection triggering an allocation failure for the
alloc_indirect() call in virtqueue_add() I'm seeing a hang in
p9_virtio_zc_request() -- it seems to be waiting here indefinitely
(i.e. at least 120 seconds):

[...]

What happens is that the code falls back to direct virtio addressing
(after indirect addressing failed) - and this should work.

I'm more inclined to suspect a qemu instead of a kernel bug, as your
qemu version is quite old and there have been fixes in the virtio
buffer handling and virtio-9p in the meantime. (I'm suspecting
"virtio-9p: fix any_layout".)

Could you retry with a more recent qemu (at least version 2.4)?

I think maybe the version number in the stack trace is a bit misleading,
this is the full/actual version:

$ kvm --version
QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.1), Copyright
(c) 2003-2008 Fabrice Bellard

I'll still try to get qemu from git and see if it makes a difference.
Thanks,

I still seem to get it:

$ qemu-system-x86_64 --version
QEMU emulator version 2.6.91 (v2.7.0-rc1-2-gcc0100f-dirty), Copyright (c) 2003-2008 Fabrice Bellard

INFO: task trinity-c2:26510 blocked for more than 120 seconds.
Not tainted 4.7.0+ #71
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
trinity-c2 D ffff88010509fcf0 26600 26510 1238 0x10080004
ffff88010509fcf0 ffff880119ea1080 ffff880119ea1098 ffff88011ada06a0
ffff88011ada06c8 ffff88011ad9fd58 ffffffff844d3060 ffff880119d0db00
ffff880119ea0000 ffff880105098000 ffffed0020a13001 ffff880105098008
Call Trace:
[<ffffffff838a01fa>] schedule+0x9a/0x1c0
[<ffffffff838a0373>] schedule_preempt_disabled+0x13/0x20
[<ffffffff838a4046>] mutex_lock_nested+0x2d6/0x7d0
[<ffffffff81512244>] ? __fdget_pos+0x84/0xb0
[<ffffffff838a3d70>] ? ww_mutex_unlock+0x260/0x260
[<ffffffff814bb510>] ? do_pwritev+0x170/0x170
[<ffffffff81512244>] __fdget_pos+0x84/0xb0
[<ffffffff814bad19>] do_readv+0x79/0x270
[<ffffffff814baca0>] ? vfs_readv+0xa0/0xa0
[<ffffffff81dd1fd3>] ? __this_cpu_preempt_check+0x13/0x20
[<ffffffff814bb510>] ? do_pwritev+0x170/0x170
[<ffffffff814bb51b>] SyS_readv+0xb/0x10
[<ffffffff81005391>] do_syscall_64+0x1a1/0x460
[<ffffffff8137335a>] ? __context_tracking_enter+0xaa/0x200
[<ffffffff838adc6a>] entry_SYSCALL64_slow_path+0x25/0x25
1 lock held by trinity-c2/26510:
#0: (&f->f_pos_lock){......}, at: [<ffffffff81512244>] __fdget_pos+0x84/0xb0

Showing all locks held in the system:
2 locks held by khungtaskd/505:
#0: (rcu_read_lock){......}, at: [<ffffffff812baf3f>] watchdog+0xff/0x840
#1: (tasklist_lock){......}, at: [<ffffffff811e1f80>] debug_show_all_locks+0x70/0x280
1 lock held by trinity-c1/26123:
#0: (&f->f_pos_lock){......}, at: [<ffffffff81512244>] __fdget_pos+0x84/0xb0
1 lock held by trinity-c2/26510:
#0: (&f->f_pos_lock){......}, at: [<ffffffff81512244>] __fdget_pos+0x84/0xb0
1 lock held by trinity-c0/29159:
#0: (&f->f_pos_lock){......}, at: [<ffffffff81512244>] __fdget_pos+0x84/0xb0

=============================================

...

Kernel panic - not syncing: hung_task: blocked tasks
CPU: 0 PID: 505 Comm: khungtaskd Not tainted 4.7.0+ #71
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014


Vegard