VM crashes during early stages of boot

From: Noa Osherovich
Date: Wed Mar 29 2017 - 11:24:35 EST


Hi,

Starting with kernel 4.11-rc1, our regression VMs crash during boot.
Not all of them crash, and not on every boot, but it happens often
enough, and at a very early boot stage (example outputs below).

Has anyone else seen these crashes with the 4.11 RCs?
Any suggestions are welcome.

Example host details:
X86 running Red Hat 7.0
$ virsh version
Compiled against library: libvirt 1.1.1
Using library: libvirt 1.1.1
Using API: QEMU 1.1.1
Running hypervisor: QEMU 2.0.0

Thanks,
Noa

[ 1.567946] general protection fault: 0000 [#1] SMP
[ 1.827994] FDC 0 is a S82078B
[ 1.829512] Modules linked in: e1000(+) virtio_console floppy ata_piix serio_raw i2c_core
[ 1.832179] CPU: 1 PID: 307 Comm: systemd-udevd Not tainted 4.11.0-rc4-for-linust-perf-2017-03-28_07-46-10-70 #1
[ 1.835515] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
[ 1.837448] task: ffff880328f4a880 task.stack: ffffc9000239c000
[ 1.839656] RIP: 0010:__skb_try_recv_datagram+0x25a/0x2c0
[ 1.841471] RSP: 0018:ffffc9000239fbd0 EFLAGS: 00010046
[ 1.843438] RAX: 0000000000000000 RBX: ffff880328eca8dc RCX: ff0074757074756f
[ 1.846334] RDX: 0000000000000001 RSI: 2d306f6974726976 RDI: ffff880328eca800
[ 1.848643] RBP: ffffc9000239fc30 R08: ffff880328eca8c8 R09: ffffc9000239fc58
[ 1.851064] R10: ffff880329d05f80 R11: 0000000000000246 R12: ffffc9000239fca4
[ 1.854048] R13: ffff880328eca800 R14: ffff880328eac000 R15: 0000000000000000
[ 1.856629] FS: 00007fb84083e880(0000) GS:ffff88033fc40000(0000) knlGS:0000000000000000
[ 1.859391] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.861322] CR2: 00007fb840847000 CR3: 0000000328f93000 CR4: 00000000000006e0
[ 1.863688] Call Trace:
[ 1.864606] __skb_recv_datagram+0x83/0xb0
[ 1.865963] skb_recv_datagram+0x34/0x40
[ 1.867955] netlink_recvmsg+0x49/0x3f0
[ 1.869510] ? selinux_socket_recvmsg+0x17/0x20
[ 1.871384] ? security_socket_recvmsg+0x4b/0x70
[ 1.874948] sock_recvmsg+0x3d/0x50
[ 1.877770] ___sys_recvmsg+0xc4/0x1c0
[ 1.880386] ? ep_send_events_proc+0x13f/0x1c0
[ 1.883355] ? ep_scan_ready_list.isra.11+0x1c2/0x1e0
[ 1.886043] ? ep_poll+0x15e/0x380
[ 1.888579] __sys_recvmsg+0x42/0x80
[ 1.891066] SyS_recvmsg+0x12/0x20
[ 1.893299] entry_SYSCALL_64_fastpath+0x1a/0xa9
[ 1.895973] RIP: 0033:0x7fb83f77d660
[ 1.902155] RSP: 002b:00007ffd12c03138 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
[ 1.906627] RAX: ffffffffffffffda RBX: 00005639c35d8660 RCX: 00007fb83f77d660
[ 1.910797] RDX: 0000000000000000 RSI: 00007ffd12c031c0 RDI: 000000000000000b
[ 1.914051] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[ 1.918235] R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000000000
[ 1.921612] R13: 00007ffd12c01598 R14: 0000000000000008 R15: 0000000000000107
[ 1.924954] Code: 8b 6d d0 49 89 c3 48 8b 45 b0 41 83 ad d8 00 00 00 01 49 8b 36 49 8b 4e 08 48 85 c0 49 c7 46 08 00 00 00 00 49 c7 06 00 00 00 00 <48> 89 4e 08 48 89 31 0f 84 06 ff ff ff 4c 89 5d d0 4c 89 f6 4c
[ 1.934095] RIP: __skb_try_recv_datagram+0x25a/0x2c0 RSP: ffffc9000239fbd0
[ 1.937414] ---[ end trace aeac107066647a4f ]---
[ 1.942667] Kernel panic - not syncing: Fatal exception
[ 1.946492] Kernel Offset: disabled
[ 1.948749] ---[ end Kernel panic - not syncing: Fatal exception


[ 1.416471] BUG: unable to handle kernel paging request at ffff8803234002b0
[ 1.553597] IP: __handle_mm_fault+0x1c9/0x1160
[ 1.554248] PGD 2441067
[ 1.554249] PUD 2444067
[ 1.554810] PMD 324744063
[ 1.555138] scsi host0: ata_piix
[ 1.555301] scsi host1: ata_piix
[ 1.555341] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc120 irq 14
[ 1.555342] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc128 irq 15
[ 1.557886] PTE 2d306f6974726976
[ 1.557887]
[ 1.558513] Oops: 0000 [#1] SMP
[ 1.558912] Modules linked in: ata_piix i2c_core serio_raw floppy(+) virtio_console
[ 1.559811] CPU: 2 PID: 288 Comm: systemd-udevd Not tainted 4.11.0-rc3-for-upstream-perf-2017-03-27_16-51-12-1 #1
[ 1.560960] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
[ 1.561620] task: ffff880324718000 task.stack: ffffc9000240c000
[ 1.562354] RIP: 0010:__handle_mm_fault+0x1c9/0x1160
[ 1.562956] RSP: 0000:ffffc9000240fdc0 EFLAGS: 00010282
[ 1.563563] RAX: 0000000323400067 RBX: ffff880323ff5508 RCX: ffff880000000000
[ 1.564498] RDX: ffff8803234002b0 RSI: 0000000323400067 RDI: 0000000000000000
[ 1.565424] RBP: ffffc9000240fe68 R08: 00007f6dd028a270 R09: ffff880323ff5508
[ 1.566253] R10: 00007f6dd1142b50 R11: 00007f6dd1142880 R12: ffff8800000002b0
[ 1.567096] R13: ffffc9000240fdd8 R14: ffff880323fffed8 R15: ffff880324bc2e80
[ 1.568373] FS: 00007f6dd1142880(0000) GS:ffff88033c080000(0000) knlGS:0000000000000000
[ 1.570198] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.571295] CR2: ffff8803234002b0 CR3: 0000000323ffe000 CR4: 00000000000006e0
[ 1.572595] Call Trace:
[ 1.573394] ? __switch_to+0x22f/0x4f0
[ 1.574301] handle_mm_fault+0xce/0x240
[ 1.575206] __do_page_fault+0x22a/0x4a0
[ 1.576134] trace_do_page_fault+0x37/0xe0
[ 1.577064] do_async_page_fault+0x19/0x70
[ 1.577989] async_page_fault+0x28/0x30
[ 1.578884] RIP: 0033:0x55f6cadaefc0
[ 1.579752] RSP: 002b:00007ffe2cbc5fe0 EFLAGS: 00010246
[ 1.580864] RAX: 0000000000000000 RBX: 000055f6cc8c51e0 RCX: 00007f6dd006c760
[ 1.582132] RDX: 00007f6dd006eae0 RSI: 00007f6dd006c760 RDI: 00007f6dd006d640
[ 1.583374] RBP: 000055f6cc8c53f0 R08: 00007f6dd028a270 R09: 0000000000000106
[ 1.584637] R10: 00007f6dd1142b50 R11: 00007f6dd1142880 R12: 000055f6cc8c5170
[ 1.586335] R13: 000055f6cc8c4394 R14: 000055f6cc893010 R15: 00000000000004fd
[ 1.587596] Code: 00 00 c0 ff 3f 00 00 48 0f 44 d1 48 b9 00 00 00 00 00 88 ff ff 49 01 cc 48 21 c2 4c 01 e2 48 85 d2 48 89 55 90 0f 84 9b 02 00 00 <48> 8b 3a 48 f7 c7 9f ff ff ff 75 1e 48 8b 05 a4 0a de 00 a8 01
[ 1.591001] RIP: __handle_mm_fault+0x1c9/0x1160 RSP: ffffc9000240fdc0
[ 1.592189] CR2: ffff8803234002b0
[ 1.593051] ---[ end trace 8ffea09f517a36de ]---
[ 1.594029] Kernel panic - not syncing: Fatal exception
[ 1.595105] Kernel Offset: disabled
[ 1.595929] ---[ end Kernel panic - not syncing: Fatal exception
[ 1.597046] ------------[ cut here ]------------
[ 1.597993] WARNING: CPU: 2 PID: 288 at arch/x86/kernel/smp.c:127 native_smp_send_reschedule+0x3f/0x50
[ 1.599816] Modules linked in: ata_piix i2c_core serio_raw floppy(+) virtio_console
[ 1.601496] CPU: 2 PID: 288 Comm: systemd-udevd Tainted: G D 4.11.0-rc3-for-upstream-perf-2017-03-27_16-51-12-1 #1
[ 1.603520] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
[ 1.604601] Call Trace:
[ 1.605318] <IRQ>
[ 1.606022] dump_stack+0x63/0x8c
[ 1.606831] __warn+0xd1/0xf0
[ 1.607591] warn_slowpath_null+0x1d/0x20
[ 1.608495] native_smp_send_reschedule+0x3f/0x50
[ 1.609435] try_to_wake_up+0x389/0x3f0
[ 1.610439] default_wake_function+0x12/0x20
[ 1.611689] __wake_up_common+0x55/0x90
[ 1.612646] __wake_up_locked+0x13/0x20
[ 1.613528] ep_poll_callback+0xc4/0x270
[ 1.614618] __wake_up_common+0x55/0x90
[ 1.615911] __wake_up+0x39/0x50
[ 1.616989] wake_up_klogd_work_func+0x40/0x60
[ 1.618178] irq_work_run_list+0x4d/0x70
[ 1.619046] ? try_to_free_pmd_page+0x40/0x40
[ 1.619978] irq_work_run+0x2c/0x40
[ 1.620802] flush_smp_call_function_queue+0x8f/0x160
[ 1.621775] generic_smp_call_function_single_interrupt+0x13/0x30
[ 1.622874] smp_call_function_interrupt+0x27/0x40
[ 1.623855] call_function_interrupt+0x89/0x90
[ 1.624791] RIP: 0010:panic+0x1e8/0x229
[ 1.625645] RSP: 0000:ffffc9000240fb10 EFLAGS: 00000296 ORIG_RAX: ffffffffffffff03
[ 1.627291] RAX: 0000000000000034 RBX: 0000000000000000 RCX: 0000000000000006
[ 1.628473] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88033c08dfe0
[ 1.629654] RBP: ffffc9000240fb78 R08: 0000000000000000 R09: 0000000000000213
[ 1.630856] R10: 00000000ffffffff R11: 0000000000000000 R12: ffffffff81c86dee
[ 1.632056] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000046
[ 1.633236] </IRQ>
[ 1.633917] oops_end+0xb1/0xc0
[ 1.634690] no_context+0x17c/0x3d0
[ 1.635537] __bad_area_nosemaphore+0xee/0x1d0
[ 1.636805] bad_area_nosemaphore+0x14/0x20
[ 1.637690] __do_page_fault+0x89/0x4a0
[ 1.638546] trace_do_page_fault+0x37/0xe0
[ 1.639471] do_async_page_fault+0x19/0x70
[ 1.640413] async_page_fault+0x28/0x30
[ 1.641289] RIP: 0010:__handle_mm_fault+0x1c9/0x1160
[ 1.642277] RSP: 0000:ffffc9000240fdc0 EFLAGS: 00010282
[ 1.643269] RAX: 0000000323400067 RBX: ffff880323ff5508 RCX: ffff880000000000
[ 1.644448] RDX: ffff8803234002b0 RSI: 0000000323400067 RDI: 0000000000000000
[ 1.645632] RBP: ffffc9000240fe68 R08: 00007f6dd028a270 R09: ffff880323ff5508
[ 1.647177] R10: 00007f6dd1142b50 R11: 00007f6dd1142880 R12: ffff8800000002b0
[ 1.648380] R13: ffffc9000240fdd8 R14: ffff880323fffed8 R15: ffff880324bc2e80
[ 1.649605] ? __switch_to+0x22f/0x4f0
[ 1.650470] handle_mm_fault+0xce/0x240
[ 1.651530] __do_page_fault+0x22a/0x4a0
[ 1.652838] trace_do_page_fault+0x37/0xe0
[ 1.654038] do_async_page_fault+0x19/0x70
[ 1.655122] async_page_fault+0x28/0x30
[ 1.656003] RIP: 0033:0x55f6cadaefc0
[ 1.656832] RSP: 002b:00007ffe2cbc5fe0 EFLAGS: 00010246
[ 1.657812] RAX: 0000000000000000 RBX: 000055f6cc8c51e0 RCX: 00007f6dd006c760
[ 1.658978] RDX: 00007f6dd006eae0 RSI: 00007f6dd006c760 RDI: 00007f6dd006d640
[ 1.660148] RBP: 000055f6cc8c53f0 R08: 00007f6dd028a270 R09: 0000000000000106
[ 1.661345] R10: 00007f6dd1142b50 R11: 00007f6dd1142880 R12: 000055f6cc8c5170
[ 1.662520] R13: 000055f6cc8c4394 R14: 000055f6cc893010 R15: 00000000000004fd
[ 1.663728] ---[ end trace 8ffea09f517a36df ]---
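
A note on reading the traces above, offered as an observation rather than
a diagnosis: some of the bad values look like ASCII text rather than
pointers. The RSI and RCX values in the first trace, and the PTE value in
the second, can be read as little-endian bytes. The Python sketch below
does that decoding; the helper name le_ascii is made up for illustration
and is not part of any kernel tooling.

```python
import struct

def le_ascii(value: int) -> str:
    """Render a 64-bit register value as its little-endian bytes,
    showing printable ASCII as-is and everything else as '.'."""
    raw = struct.pack("<Q", value)  # 8 bytes, lowest byte first
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

# Values copied verbatim from the oops output above.
samples = [
    ("RSI (first trace)", 0x2d306f6974726976),
    ("RCX (first trace)", 0xff0074757074756f),
    ("PTE (second trace)", 0x2d306f6974726976),
]

for name, value in samples:
    print(f"{name}: {value:016x} -> {le_ascii(value)!r}")
# RSI (first trace): 2d306f6974726976 -> 'virtio0-'
# RCX (first trace): ff0074757074756f -> 'output..'
# PTE (second trace): 2d306f6974726976 -> 'virtio0-'
```

If that reading is right, a string like "virtio0-output" ended up where
the kernel expected list pointers or a page table entry. That would point
towards memory being overwritten, but this is only a guess based on the
register contents.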