Non-booting current Linus' tree

From: Jan Kara
Date: Fri Jul 03 2015 - 11:23:18 EST


Hello,

so I was wondering why I couldn't boot current Linus' tree in my kvm
instance. The boot dies right after printing the "Booting the kernel"
message. After some bisection I have identified the culprit as commit
91a8c2a5b43fc4be4adb4bda50cd331697e289e0 (x86/fpu: Clean up and fix MXCSR
handling). With that commit applied I get oopses like:

general protection fault: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.1.0-rc4-xen+ #18
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffffffff8180d4c0 ti: ffffffff81800000 task.ti: ffffffff81800000
RIP: 0010:[<ffffffff81048a6c>] [<ffffffff81048a6c>] mxcsr_feature_mask_init+0x1c/0x40
RSP: 0000:ffffffff81803ba8 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 00000000ffff8800 RCX: 0000000000000000
RDX: 0020400000000000 RSI: 0000000000000000 RDI: ffffffff81803da8
RBP: ffffffff81803da8 R08: 7f0089011d402087 R09: 00000000ffff8800
R10: 7f0089011d402087 R11: 00000000ffff8800 R12: ffff88007f004000
R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000008
FS: 0000000000000000(0000) GS:ffff88007f000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88000269a000 CR3: 0000000001808000 CR4: 00000000000006b0
Stack:
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
[<ffffffff81048beb>] fpu__init_system+0x2b/0x190
[<ffffffff81048d5e>] fpu__cpu_init+0xe/0x10
[<ffffffff8104eac3>] cpu_init+0x3a3/0x430
[<ffffffff810794b1>] ? fill_pte+0x31/0x140
[<ffffffff8107975d>] ? set_pte_vaddr_pud+0x4d/0x60
[<ffffffff814058c0>] ? do_softirq_own_stack+0x30/0x30
[<ffffffff81881519>] trap_init+0x4bb/0x602
[<ffffffff8189d583>] ? inode_init_early+0x5a/0x98
[<ffffffff8187efea>] start_kernel+0x265/0x4e1
[<ffffffff8187eb7d>] ? set_init_arg+0x6a/0x6a
[<ffffffff8187e565>] x86_64_start_reservations+0x1b/0x32
[<ffffffff8187e6c3>] x86_64_start_kernel+0x147/0x156
[<ffffffff8187e120>] ? early_idt_handlers+0x120/0x120
Code: 05 00 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 90 55 31 c0 b9 40 00 00 00 48 89 e5 48 81 ec 00 02 00 00 48 8d bd 00 fe ff ff f3 48 ab <0f> ae 85 00 fe ff ff 8b 85 1c fe ff ff ba bf ff 00 00 85 c0 0f
RIP [<ffffffff81048a6c>] mxcsr_feature_mask_init+0x1c/0x40
RSP <ffffffff81803ba8>

And indeed the oops happens at:
2b:* 0f ae 85 00 fe ff ff fxsave -0x200(%rbp) <-- trapping instruction

The fault happens because the fxsave target isn't 32-byte aligned (which,
from looking into the code, I assume is the requirement): RBP is
ffffffff81803da8, so -0x200(%rbp) is ffffffff81803ba8, which isn't even
16-byte aligned as FXSAVE itself requires. So clearly my gcc messed up and
miscompiled the thing by ignoring the alignment attribute. Now, my build
host is a rather old installation (SLES 11 SP3) running gcc 4.3.4, but
quite a few installations of it are still running. So do we care, or
should I just upgrade the build host?

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR