Re: [crash] PANIC: double fault, error_code: 0x0
From: Ingo Molnar
Date: Fri Nov 24 2017 - 17:09:44 EST
* Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> * Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> > This is a repost of the latest entry-stack plus Kaiser bits from Andy Lutomirski
> > (v3 series from today) and Dave Hansen (kaiser-414-tipwip-20171123 version),
> > on top of latest tip:x86/urgent (12a78d43de76).
> >
> > This version is pretty well tested, at least on the usual x86 tree test systems.
> > It has a couple of merge mistakes fixed, the biggest difference is in patch #22:
> >
> > x86/mm/kaiser: Prepare assembly for entry/exit CR3 switching
> >
> > The other patches are identical or very close to what I posted earlier today.
>
> Here's a new bug, on a testsystem I get the double fault boot crash attached
> below. The same bzImage crashes on other systems as well, so it's not CPU
> dependent.
>
> Via Kconfig-bisection I have narrowed it down to the following .config detail:
> it's triggered by _disabling_ CONFIG_DEBUG_ENTRY and enabling CONFIG_KAISER=y.
>
> I.e. one of the sanity checks of CONFIG_DEBUG_ENTRY has some positive side effect.
> I'll try to track down which one it is - any ideas meanwhile?
>
> Thanks,
>
> Ingo
>
> [ 8.797733] calling pt_dump_init+0x0/0x3b @ 1
> [ 8.803144] initcall pt_dump_init+0x0/0x3b returned 0 after 1 usecs
> [ 8.810589] calling aes_init+0x0/0x11 @ 1
> [ 8.815757] initcall aes_init+0x0/0x11 returned 0 after 141 usecs
> [ 8.823020] calling ghash_pclmulqdqni_mod_init+0x0/0x54 @ 1
> [ 8.831002] PANIC: double fault, error_code: 0x0
> [ 8.831002] CPU: 11 PID: 260 Comm: modprobe Not tainted 4.14.0-01419-g1b46550a680d-dirty #17
> [ 8.831002] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
> [ 8.831002] task: ffff880828ba8000 task.stack: ffffc90004444000
> [ 8.831002] RIP: 0010:page_fault+0x11/0x60
> [ 8.831002] RSP: 0000:ffffffffff0e7fc8 EFLAGS: 00010046
> [ 8.831002] RAX: 00000000819d4d77 RBX: 0000000000000001 RCX: ffffffff819d4d77
After much more debugging, the patch below 'fixes' the crash as well, when
CONFIG_DEBUG_ENTRY is disabled.
Note that if *any* of those 4 padding sequences is removed, the kernel starts
crashing again. Also note that the exact size of the padding does not appear to
matter - it could be larger as well, i.e. I don't think it's an alignment bug.
In any case, it does not appear to be a problem in the actual assembly code
paths themselves.
One guess would be that it's some sort of sizing bug: maybe the padding forces a
key piece of data or code onto another page - but I'm too tired to root-cause it
right now.
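To make the "pushed onto another page" guess concrete, here is a rough sketch of how one could check it offline. It assumes 4 KiB pages and 16 bytes per padding site (16 one-byte nops, as in the patch below); the symbol names and offsets are purely hypothetical placeholders, not values taken from this kernel build. Real offsets would have to come from the symbol table of the two vmlinux images.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages


def page_index(offset):
    """Return the index of the page that contains the given offset."""
    return offset // PAGE_SIZE


def symbols_moved_across_page(symbols, pad_sites, pad_bytes=16):
    """Report which symbols land on a different page once padding is added.

    symbols:   dict of name -> offset within the section (hypothetical input)
    pad_sites: offsets at which pad_bytes of padding are inserted
    Returns a dict of name -> (old_offset, new_offset) for symbols whose
    page index changes.
    """
    moved = {}
    for name, offset in symbols.items():
        # Every padding site at or before the symbol shifts it by pad_bytes.
        shift = pad_bytes * sum(1 for site in pad_sites if site <= offset)
        new_offset = offset + shift
        if page_index(offset) != page_index(new_offset):
            moved[name] = (offset, new_offset)
    return moved


if __name__ == "__main__":
    # Hypothetical layout, for illustration only.
    symbols = {"sym_a": 0x0FF8, "sym_b": 0x0100, "sym_c": 0x1FD0}
    pad_sites = [0x0200, 0x0400, 0x0800, 0x0C00]
    for name, (old, new) in sorted(symbols_moved_across_page(symbols, pad_sites).items()):
        print(f"{name}: {old:#x} -> {new:#x}")
```

If no symbol crosses a page boundary between the crashing and the non-crashing build, this particular guess can be dropped.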
Any ideas?
Thanks,
Ingo
---
arch/x86/entry/entry_64.S | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 4ac952080869..e83029892017 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -547,6 +547,8 @@ END(irq_entries_start)
ud2
.Lokay_\@:
addq $8, %rsp
+#else
+ .rep 16; nop; .endr
#endif
.endm
@@ -597,6 +599,8 @@ END(irq_entries_start)
je .Lirq_stack_okay\@
ud2
.Lirq_stack_okay\@:
+#else
+ .rep 16; nop; .endr
#endif
.Lirq_stack_push_old_rsp_\@:
@@ -707,6 +711,8 @@ GLOBAL(swapgs_restore_regs_and_return_to_usermode)
jnz 1f
ud2
1:
+#else
+ .rep 16; nop; .endr
#endif
POP_EXTRA_REGS
popq %r11
@@ -773,6 +779,8 @@ GLOBAL(restore_regs_and_return_to_kernel)
jz 1f
ud2
1:
+#else
+ .rep 16; nop; .endr
#endif
POP_EXTRA_REGS
POP_C_REGS