[4.9-rc5] kernel BUG at kernel/sched/rt.c:764!

From: CAI Qian
Date: Wed Nov 16 2016 - 17:58:03 EST


This machine occasionally hits the BUG above during boot when running a kernel built with this config:

http://people.redhat.com/qcai/tmp/config-god-4.9rc2

[ 18.125103] x2apic enabled
[ 18.128182] Switched APIC routing to cluster x2apic.
[ 18.137063] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 18.153805] smpboot: CPU0: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz (family: 0x6, model: 0x4f, stepping: 0x1)
[ 18.165021] Performance Events: PEBS fmt2+, Broadwell events, 16-deep LBR, full-width counters, Intel PMU driver.
[ 18.176595] ... version: 3
[ 18.181074] ... bit width: 48
[ 18.185647] ... generic registers: 4
[ 18.190124] ... value mask: 0000ffffffffffff
[ 18.196055] ... max period: 0000ffffffffffff
[ 18.201986] ... fixed-purpose events: 3
[ 18.206453] ... event mask: 000000070000000f
[ 20.648609] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[ 20.702972] x86: Booting SMP configuration:
[ 20.712720] .... node #0, CPUs: #1[ 20.847790] #2[ 20.974935] #3[ 21.109503] #4[ 21.245976] #5[ 21.383743] #6[ 21.554680] #7[ 21.703806] #8[ 21.864885] #9[ 22.018063] #10[ 22.154530] #11[ 22.345902] #12[ 22.523560] #13[ 22.661047] #14[ 22.821751] #15[ 22.999171] #16[ 23.142056] #17[ 23.315885] #18[ 23.450304] #19[ 23.642422] #20[ 23.816793] #21
[ 23.955838] .... node #1, CPUs: #22[ 24.166606] #23[ 24.340859] #24[ 24.501884] #25[ 24.679949] #26[ 24.839650] #27[ 25.014436] #28[ 25.179093] #29[ 25.319094] #30[ 25.482463] #31[ 25.636126] #32[ 25.820521] #33[ 25.983310] #34[ 26.162576] #35[ 26.326769] #36[ 26.508100] #37[ 26.672296] #38[ 26.847331] #39[ 27.011591] #40[ 27.154124] #41[ 27.314263] #42[ 27.472972] #43
[ 27.661509] .... node #0, CPUs: #44[ 27.827697] #45[ 28.009301] #46[ 28.173231] #47[ 28.353749] #48[ 28.517099] #49[ 28.686097] #50[ 28.850425] #51[ 28.928408] #52[ 29.006194] #53[ 29.084035] #54[ 29.161891] #55[ 29.239825] #56[ 29.317658] #57[ 29.395585] #58[ 29.473428] #59[ 29.551326] #60[ 29.629383] #61[ 29.707235] #62[ 29.785018] #63[ 29.862918] #64[ 29.940800] #65
[ 30.018569] .... node #1, CPUs: #66[ 30.098050] #67[ 30.175751] #68[ 30.253356] #69[ 30.331126] #70[ 30.408855] #71[ 30.486657] #72[ 30.564568] #73[ 30.642370] #74[ 30.720135] #75[ 30.798071] #76[ 30.875768] #77[ 30.953472] #78[ 31.031228] #79[ 31.109028] #80[ 31.186751] #81[ 31.264504] #82[ 31.342254] #83[ 31.420027] #84[ 31.497807] #85[ 31.575565] #86[ 31.653323] #87
[ 31.720946] x86: Booted up 2 nodes, 88 CPUs
[ 31.725672] ----------------
[ 31.728884] | NMI testsuite:
[ 31.732102] --------------------
[ 31.735706] remote IPI: ok |
[ 31.749619] local IPI: ok |
[ 31.765645] --------------------
[ 31.769257] Good, all 2 testcases passed! |
[ 31.774148] ---------------------------------
[ 31.779019] smpboot: Total of 88 processors activated (391240.60 BogoMIPS)
[ 32.277215] perf: interrupt took too long (7702 > 6366), lowering kernel.perf_event_max_sample_rate to 25000
[ 32.277237] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 1.174 msecs
[ 32.316901] ------------[ cut here ]------------
[ 32.322058] kernel BUG at kernel/sched/rt.c:764!
[ 32.327210] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[ 32.334593] Modules linked in:
[ 32.338013] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc5+ #2
[ 32.345008] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[ 32.356367] task: ffff880e3f278000 task.stack: ffff880848280000
[ 32.362973] RIP: 0010:[<ffffffff812ff2b5>] [<ffffffff812ff2b5>] rq_offline_rt+0x6b5/0xda0
[ 32.372217] RSP: 0000:ffff8808482878a8 EFLAGS: 00010082
[ 32.378144] RAX: 0000000000000007 RBX: fffffffffd050f80 RCX: 1ffffffff0675688
[ 32.386109] RDX: 0000000000000000 RSI: ffffffff833ab440 RDI: ffff880e3f278cd4
[ 32.394073] RBP: ffff880848287958 R08: 0000000000000000 R09: ffff880e575e2950
[ 32.402037] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000058
[ 32.410001] R13: ffffffff857390c4 R14: ffffffff85f60100 R15: dffffc0000000000
[ 32.417965] FS: 0000000000000000(0000) GS:ffff88085a600000(0000) knlGS:0000000000000000
[ 32.426997] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 32.433409] CR2: ffff881077fff000 CR3: 0000000003610000 CR4: 00000000003406f0
[ 32.441373] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 32.449337] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 32.457301] Stack:
[ 32.459544] ffffffff85f607f8 ffffffff85f60210 ffff880e575e2a40 ffff880e575e2aa0
[ 32.467842] ffffffff85f60100 ffff880e575e2930 ffffffff85f60218 ffff880e575e2040
[ 32.476140] ffff880e575e2938 ffff880e575e2040 ffff880e575e2aa0 ffff880e575e2040
[ 32.484436] Call Trace:
[ 32.487169] [<ffffffff81283331>] set_rq_offline+0xa1/0x290
[ 32.493391] [<ffffffff81285dee>] rq_attach_root+0x2fe/0x6d0
[ 32.499709] [<ffffffff8128668a>] cpu_attach_domain+0x4ca/0x1be0
[ 32.506413] [<ffffffff812861c0>] ? rq_attach_root+0x6d0/0x6d0
[ 32.512925] [<ffffffff81df6a97>] ? debug_smp_processor_id+0x17/0x20
[ 32.520020] [<ffffffff813abc25>] ? rcu_is_watching+0x15/0x130
[ 32.526532] [<ffffffff812a8f56>] build_sched_domains+0x2e86/0x5340
[ 32.533531] [<ffffffff812a60d0>] ? build_sched_domain+0x1620/0x1620
[ 32.540624] [<ffffffff81802c65>] ? __kmalloc_node+0x175/0x400
[ 32.547136] [<ffffffff81d4b411>] ? alloc_cpumask_var_node+0x51/0x100
[ 32.554318] [<ffffffff81d4b4ee>] ? alloc_cpumask_var+0xe/0x10
[ 32.560831] [<ffffffff85984cf1>] sched_init_smp+0xbf3/0xdb0
[ 32.567149] [<ffffffff859840fe>] ? trace_event_define_fields_sched_process_template+0x91/0x91
[ 32.576766] [<ffffffff8592bf91>] kernel_init_freeable+0x402/0x757
[ 32.583668] [<ffffffff813413a0>] ? trace_hardirqs_on_caller+0x520/0x720
[ 32.591148] [<ffffffff8592bb8f>] ? start_kernel+0x772/0x772
[ 32.597466] [<ffffffff8127e62e>] ? preempt_count_sub+0x5e/0xe0
[ 32.604076] [<ffffffff81076eb0>] ? compat_start_thread+0xa0/0xa0
[ 32.610881] [<ffffffff82c9ad20>] ? rest_init+0x190/0x190
[ 32.616906] [<ffffffff82c9ad33>] kernel_init+0x13/0x140
[ 32.622835] [<ffffffff82c9ad20>] ? rest_init+0x190/0x190
[ 32.628863] [<ffffffff82cb5957>] ret_from_fork+0x27/0x40
[ 32.634889] Code: e8 b1 5a 9b 01 48 89 df e8 79 e9 ff ff e9 aa fa ff ff 48 8b 7d 90 4c 8b b5 70 ff ff ff e8 64 55 9b 01 48 85 db 0f 84 6d ff ff ff <0f> 0b 48 c7 c7 e0 e3 a2 83 e8 63 2b b7 00 4c 8b b5 70 ff ff ff
[ 32.656589] RIP [<ffffffff812ff2b5>] rq_offline_rt+0x6b5/0xda0
[ 32.663207] RSP <ffff8808482878a8>
[ 32.667148] ---[ end trace 331ae5ac79a63af2 ]---
[ 32.672301] Kernel panic - not syncing: Fatal exception
[ 33.752196] Shutting down cpus with NMI
[ 33.756508] ---[ end Kernel panic - not syncing: Fatal exception
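
For anyone triaging the trace, here is a minimal Python sketch for pulling the useful parts out of an oops like the one above. It is not part of the original report. The helper names (reliable_frames, trapping_bytes) are made up for illustration, and the regular expressions assume the 4.9-era "[<address>] symbol+0xoff/0xlen" frame format shown in this log. Frames prefixed with "?" are skipped because the unwinder marks those as unreliable.

```python
import re

# Lines copied verbatim from the oops above (a subset of the call trace).
OOPS = """\
[ 32.322058] kernel BUG at kernel/sched/rt.c:764!
[ 32.362973] RIP: 0010:[<ffffffff812ff2b5>] [<ffffffff812ff2b5>] rq_offline_rt+0x6b5/0xda0
[ 32.487169] [<ffffffff81283331>] set_rq_offline+0xa1/0x290
[ 32.493391] [<ffffffff81285dee>] rq_attach_root+0x2fe/0x6d0
[ 32.499709] [<ffffffff8128668a>] cpu_attach_domain+0x4ca/0x1be0
[ 32.506413] [<ffffffff812861c0>] ? rq_attach_root+0x6d0/0x6d0
[ 32.526532] [<ffffffff812a8f56>] build_sched_domains+0x2e86/0x5340
[ 32.560831] [<ffffffff85984cf1>] sched_init_smp+0xbf3/0xdb0
[ 32.634889] Code: e8 b1 5a 9b 01 48 89 df e8 79 e9 ff ff e9 aa fa ff ff 48 8b 7d 90 4c 8b b5 70 ff ff ff e8 64 55 9b 01 48 85 db 0f 84 6d ff ff ff <0f> 0b 48 c7 c7 e0 e3 a2 83 e8 63 2b b7 00 4c 8b b5 70 ff ff ff
"""

# One stack frame: "[<addr>] [? ]symbol+0xoff/0xlen".
FRAME_RE = re.compile(
    r"\[<[0-9a-f]{16}>\] (\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)
# The byte at the faulting RIP is wrapped in <> on the "Code:" line.
TRAP_RE = re.compile(r"<([0-9a-f]{2})> ([0-9a-f]{2})")


def reliable_frames(text):
    """Return symbol names of call-trace frames not marked with '?'."""
    frames = []
    for line in text.splitlines():
        if "RIP" in line:
            continue  # the RIP line is the fault site, not a caller
        m = FRAME_RE.search(line)
        if m and m.group(1) is None:
            frames.append(m.group(2))
    return frames


def trapping_bytes(text):
    """Return the marked byte and the one after it from the 'Code:' line."""
    for line in text.splitlines():
        if "Code:" in line:
            m = TRAP_RE.search(line)
            if m:
                return f"{m.group(1)} {m.group(2)}"
    return None


if __name__ == "__main__":
    print(" <- ".join(reliable_frames(OOPS)))
    print("bytes at RIP:", trapping_bytes(OOPS))
```

The two bytes at RIP, 0f 0b, encode the x86 ud2 instruction, which is what BUG() emits. That matches the "invalid opcode" line in the oops and points at the BUG_ON at kernel/sched/rt.c:764, reached from sched_init_smp via build_sched_domains, cpu_attach_domain, rq_attach_root and set_rq_offline.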