Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
From: Harald Moeller
Date: Sat Dec 02 2017 - 07:16:15 EST
Hello, my name is Harry and this is my first post here. I hope I'm doing
this the right way; apologies if not.
I'm not subscribed to the full list yet, so please CC me personally on
replies.
I am following this thread because I am seeing the same (or a very
similar) issue with 4.14.2.
My setup is simpler: just an oVirt host shutting down some VMs.
It doesn't happen every time, but I'd say in roughly 3 out of 10 shutdowns.
This is what I see (slightly different from David's trace):
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: INFO: task qemu-kvm:1173 blocked for more than 120 seconds.
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: Tainted: G I 4.14.2-1.el7.hakimo.x86_64 #4
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: qemu-kvm D 0 1173 1 0x00000084
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: Call Trace:
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: __schedule+0x28d/0x880
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: schedule+0x36/0x80
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: vhost_net_ubuf_put_and_wait+0x61/0x90 [vhost_net]
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: ? remove_wait_queue+0x60/0x60
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: vhost_net_ioctl+0x317/0x8e0 [vhost_net]
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: do_vfs_ioctl+0xa7/0x5f0
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: SyS_ioctl+0x79/0x90
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: do_syscall_64+0x67/0x1b0
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: entry_SYSCALL64_slow_path+0x25/0x25
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RIP: 0033:0x7fb8862d1107
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RSP: 002b:00007fff4acd7e58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RAX: ffffffffffffffda RBX: 000055abaa2d29c0 RCX: 00007fb8862d1107
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RDX: 00007fff4acd7e60 RSI: 000000004008af30 RDI: 0000000000000028
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RBP: 00007fff4acd7e60 R08: 000055aba805e10f R09: 00000000ffffffff
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: R10: 0000000000000004 R11: 0000000000000246 R12: 000055ababf32510
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: R13: 0000000000000001 R14: 000055ababf32498 R15: 000055abaa2a0b40
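As a side note on reading the register dump: ORIG_RAX is 0x10 (syscall 16, which is ioctl on x86_64), so RSI should hold the ioctl request number. Below is a minimal sketch for splitting that value into its fields, assuming the generic Linux _IOC bit layout from asm-generic/ioctl.h; the helper name decode_ioctl is made up for illustration.

```python
# Decode a Linux ioctl request number, assuming the generic _IOC layout
# (asm-generic/ioctl.h): nr = bits 0-7, type = bits 8-15,
# size = bits 16-29, dir = bits 30-31 (1 = write, 2 = read).

def decode_ioctl(request):
    """Split an ioctl request number into its four fields (hypothetical helper)."""
    return {
        "nr": request & 0xFF,
        "type": (request >> 8) & 0xFF,
        "size": (request >> 16) & 0x3FFF,
        "dir": (request >> 30) & 0x3,
    }

fields = decode_ioctl(0x4008AF30)  # RSI value from the trace above
print({name: hex(value) for name, value in fields.items()})
# prints {'nr': '0x30', 'type': '0xaf', 'size': '0x8', 'dir': '0x1'}
```

Type 0xAF is the vhost ioctl type, and nr 0x30 with an 8-byte, write-direction argument looks to me like VHOST_NET_SET_BACKEND from include/uapi/linux/vhost.h. That would fit the vhost_net_ioctl frame in the call trace, but please treat it as my reading rather than a confirmed diagnosis.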
This is still happening after reverting the three suggested commits:
1f8b977ab32dc5d148f103326e80d9097f1cefb5 ("sock: enable MSG_ZEROCOPY")
c1d1b437816f0afa99202be3cb650c9d174667bc ("net: convert (struct ubuf_info)->refcnt to refcount_t")
581fe0ea61584d88072527ae9fb9dcb9d1f2783e ("net: orphan frags on stand-alone ptype in dev_queue_xmit_nit")
Is there anything I can do to help solve this? Is there any more
information I could provide?
Harry