Re: 4.14.9 doesn't boot (regression)

From: Alexander Tsoy
Date: Sat Dec 30 2017 - 03:45:53 EST


On Fri, 29/12/2017 at 21:49 -0600, Josh Poimboeuf wrote:
> On Fri, Dec 29, 2017 at 05:10:35PM -0700, Andy Lutomirski wrote:
> > (Also, Josh, the oops code should have printed the contents of the
> > struct pt_regs at the top of the DF stack.  Any idea why it
> > didn't?)
>
> Looking at one of the dumps:
>
>   [  392.774879] NMI backtrace for cpu 0
>   [  392.774881] CPU: 0 PID: 1 Comm: init Not tainted 4.14.9-gentoo #1
>   [  392.774881] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>   [  392.774882] task: ffff8802368b8000 task.stack: ffffc9000000c000
>   [  392.774885] RIP: 0010:double_fault+0x0/0x30
>   [  392.774886] RSP: 0000:ffffffffff527fd0 EFLAGS: 00000086
>   [  392.774887] RAX: 000000003fc00000 RBX: 0000000000000001 RCX: 00000000c0000101
>   [  392.774887] RDX: 00000000ffff8802 RSI: 0000000000000000 RDI: ffffffffff527f58
>   [  392.774887] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
>   [  392.774888] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff816ae726
>   [  392.774888] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>   [  392.774889] FS:  0000000000000000(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
>   [  392.774889] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   [  392.774890] CR2: ffffffffff526f08 CR3: 0000000235b48002 CR4: 00000000001606f0
>   [  392.774892] Call Trace:
>   [  392.774894]  <#DF>
>   [  392.774897]  do_double_fault+0xb/0x140
>   [  392.774898]  </#DF>
>
> It should have at least printed the #DF iret frame registers, which I
> recently added support for in "x86/unwinder: Handle stack overflows
> more gracefully", which is in both 4.14.9 and 4.15-rc5.
>
> I think the missing iret regs are due to a bug in show_trace_log_lvl(),
> where if the unwind starts with two regs frames in a row, the second
> regs don't get printed.
>
> Alexander, would you mind reproducing again with the below patch?  It
> should still fail, but this time it should hopefully show another
> RIP/RSP/EFLAGS instead of the "do_double_fault+0xb/0x140" line.
>

Yes, it works:

[   23.058064] NMI backtrace for cpu 2
[   23.058068] CPU: 2 PID: 1 Comm: init Not tainted 4.15.0-rc5+ #1
[   23.058069] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
[   23.058074] RIP: 0010:double_fault+0x0/0x30
[   23.058075] RSP: 0000:fffffe800005ffd0 EFLAGS: 00000086
[   23.058077] RAX: 000000003fd00000 RBX: 0000000000000001 RCX: 00000000c0000101
[   23.058077] RDX: 00000000ffff9681 RSI: 0000000000000000 RDI: fffffe800005ff58
[   23.058078] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[   23.058079] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff92001426
[   23.058080] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   23.058083] FS:  0000000000000000(0000) GS:ffff96813fd00000(0000) knlGS:0000000000000000
[   23.058084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   23.058085] CR2: fffffe800005ef08 CR3: 0000000137a09000 CR4: 00000000000406a0
[   23.058089] Call Trace:
[   23.058101]  <#DF>
[   23.058104] RIP: 0010:do_double_fault+0xb/0x140
[   23.058105] RSP: 0000:fffffe800005ef18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
[   23.058106] RAX: 000000003fd00000 RBX: 0000000000000001 RCX: 00000000c0000101
[   23.058107] RDX: 00000000ffff9681 RSI: 0000000000000000 RDI: fffffe800005ff58
[   23.058107] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[   23.058108] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff92001426
[   23.058108] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   23.058111]  </#DF>
[   23.058111] Code: 05 00 00 48 89 e7 31 f6 e8 2e 8c 61 ff e9 69 06 00 00 e8 94 05 00 00 48 89 e7 31 f6 e8 1a 8c 61 ff e9 55 06 00 00 0f 1f 44 00 00 <0f> 1f 00 48 83 c4 88 e8 e4 04 00 00 48 89 e7 48 8b 74 24 78 48
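
For anyone comparing captured logs, here is a minimal sketch (not part of the patch, and not kernel code) of one way to check whether a dump contains the second regs frame. The helper name rip_entries and the two abbreviated excerpts are illustrative assumptions; the excerpts are trimmed copies of the traces in this thread. It only looks for "RIP:" lines, so it says nothing about whether the register values themselves are sane.

```python
import re

# Matches oops lines such as
# "[   23.058104] RIP: 0010:do_double_fault+0xb/0x140"
# and captures the symbol+offset part after the code segment selector.
RIP_RE = re.compile(r"RIP:\s+[0-9a-f]{4}:(\S+)")


def rip_entries(log_text):
    """Return the symbol+offset of every RIP line, in order of appearance."""
    return RIP_RE.findall(log_text)


# Abbreviated excerpt of the 4.14.9 dump: the #DF section holds only a
# bare call-trace entry, with no register dump.
UNPATCHED = """\
[  392.774885] RIP: 0010:double_fault+0x0/0x30
[  392.774886] RSP: 0000:ffffffffff527fd0 EFLAGS: 00000086
[  392.774892] Call Trace:
[  392.774894]  <#DF>
[  392.774897]  do_double_fault+0xb/0x140
[  392.774898]  </#DF>
"""

# Abbreviated excerpt of the 4.15.0-rc5+ dump with the test patch applied:
# the #DF section now starts with a second RIP/RSP/EFLAGS block.
PATCHED = """\
[   23.058074] RIP: 0010:double_fault+0x0/0x30
[   23.058075] RSP: 0000:fffffe800005ffd0 EFLAGS: 00000086
[   23.058089] Call Trace:
[   23.058101]  <#DF>
[   23.058104] RIP: 0010:do_double_fault+0xb/0x140
[   23.058105] RSP: 0000:fffffe800005ef18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
[   23.058111]  </#DF>
"""

if __name__ == "__main__":
    print("unpatched:", rip_entries(UNPATCHED))
    print("patched:  ", rip_entries(PATCHED))
```

With the excerpts above, the unpatched dump yields a single RIP entry and the patched one yields two, which matches what the test patch was expected to change.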