V 3.16.51 will not boot
From: Da Shi Cao
Date: Thu Dec 14 2017 - 19:29:19 EST
The latest 3.16 release (3.16.51) does not boot on my 4-socket, 32-core box. It hits a general protection fault early in boot:
[ 1.952997] general protection fault: 0000 [#1] SMP
[ 1.957992] Modules linked in:
[ 1.961064] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 3.16.51-ds02-g148f3e4-dirty #2
[ 1.969834] Hardware name: IBM System x3850 X5 -[7143X1U]-/Node 1, Processor Card, BIOS -[G0E185AUS-1.85]- 04/22/2015
[ 1.980422] task: ffff8820731417e0 ti: ffff882073158000 task.ti: ffff882073158000
[ 1.987894] RIP: 0010:[<ffffffff8106ab02>] [<ffffffff8106ab02>] build_sched_domains+0x6e2/0xbf0
[ 1.996684] RSP: 0000:ffff88207315bdf0 EFLAGS: 00010206
[ 2.001987] RAX: 0000ffff00000000 RBX: 0000000000000000 RCX: 0000000000000008
[ 2.009112] RDX: 0000000000014918 RSI: 0000000000000000 RDI: 0000000000000080
[ 2.016237] RBP: ffff88207315bea0 R08: ffff882072e88ca0 R09: 000000000000fffe
[ 2.023362] R10: 000000002469f94a R11: 0000000000000000 R12: ffff882072e1db58
[ 2.030488] R13: ffff882072e88c80 R14: ffff8880724c1488 R15: 0000000000000080
[ 2.037613] FS: 0000000000000000(0000) GS:ffff88207fc00000(0000) knlGS:0000000000000000
[ 2.045690] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2.051428] CR2: ffff88807ffff000 CR3: 0000000001a11000 CR4: 00000000000007f0
[ 2.058553] Stack:
[ 2.060567] ffffffff00000000 0000000000000000 0000000000000000 000000000000cd28
[ 2.068028] 0000000000000000 0000000000000000 0000000000000000 ffff882072d60320
[ 2.075490] 0000000000000000 0000000000000000 ffff8880724c1488 ffff882072e1dac0
[ 2.082952] Call Trace:
[ 2.085400] [<ffffffff81ac0847>] sched_init_smp+0x38f/0x41a
[ 2.091055] [<ffffffff81ab7be4>] ? native_smp_cpus_done+0x10b/0x112
[ 2.097400] [<ffffffff81aaaf94>] kernel_init_freeable+0xf4/0x200
[ 2.103485] [<ffffffff81aaaf94>] ? kernel_init_freeable+0xf4/0x200
I drilled down to the function build_group_mask() and tried the following change:
@@ -5801,7 +5801,7 @@ build_group_mask(struct sched_domain *sd, struct sched_group *sg, struct cpumask
 			continue;
 
 		/* If we would not end up here, we can't continue from here */
-		if (!cpumask_equal(span, sched_domain_span(sibling->child)))
+		if (!cpumask_subset(sched_domain_span(sibling->child), span))
 			continue;
 
 		cpumask_set_cpu(i, mask);
This is only my best guess at a fix, but with this change the kernel boots on my box.
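
To make the effect of the one-line change concrete, here is a small userspace sketch (not kernel code). It models a cpumask as a single 64-bit word, one bit per CPU. The names mask_equal, mask_subset, old_check and new_check are hypothetical stand-ins invented for this illustration, and the mask values in the note below are made up rather than taken from the failing machine. It only shows which siblings each test would accept. It says nothing about whether the relaxed test is the right fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model: each bit of a uint64_t stands for one CPU.
 * These helpers mimic the semantics of cpumask_equal() and cpumask_subset(),
 * but they are not the kernel implementations.
 */
static inline bool mask_equal(uint64_t a, uint64_t b)
{
	return a == b;
}

/* True when every CPU set in 'sub' is also set in 'super'. */
static inline bool mask_subset(uint64_t sub, uint64_t super)
{
	return (sub & ~super) == 0;
}

/* Original test: the sibling is kept only if its child span equals 'span'. */
static inline bool old_check(uint64_t span, uint64_t child_span)
{
	return mask_equal(span, child_span);
}

/* Patched test: the sibling is kept if its child span lies within 'span'. */
static inline bool new_check(uint64_t span, uint64_t child_span)
{
	return mask_subset(child_span, span);
}

/* Assumed example masks, chosen only to show the three possible outcomes. */
static inline void check_examples(void)
{
	const uint64_t span = 0xffULL;          /* group spans CPUs 0-7 */

	assert(old_check(span, 0xffULL));       /* identical span: both accept */
	assert(new_check(span, 0xffULL));

	assert(!old_check(span, 0x0fULL));      /* CPUs 0-3 only: old rejects */
	assert(new_check(span, 0x0fULL));       /* ... new accepts */

	assert(!old_check(span, 0xf00ULL));     /* CPUs 8-11: both reject */
	assert(!new_check(span, 0xf00ULL));
}
```

With span covering CPUs 0-7, a sibling whose child domain covers only CPUs 0-3 is skipped by the original test and kept by the patched one, so the patched test can set more bits in the resulting mask.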