Re: MCE Bug?

From: Luck, Tony
Date: Wed Jun 17 2015 - 13:45:59 EST

Next message: Jeff Moyer: "Re: [PATCH 1/3] aio_ring_remap: turn the ctx->dead check into WARN_ON()"
Previous message: Djalal Harouni: "Re: [PATCH 0/3] kdbus: minor readability improvements"
In reply to: Borislav Petkov: "Re: MCE Bug?"
Next in thread: Luck, Tony: "RE: MCE Bug?"

On Wed, Jun 17, 2015 at 11:41:56AM +0200, Borislav Petkov wrote:
> And I was waiting in line to get a chance to do some injection on our
> EINJ box here too. But it seems you have the required setup already so
> if you want to give those changes a run, I've uploaded them here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git#tip-ras
>
> It'll be much appreciated.

and the answer is <drum roll> ....

no. :-(

Here's the console log:

[ 0.190921] Initializing cgroup subsys blkio
[ 0.195701] Initializing cgroup subsys memory
[ 0.200621] Initializing cgroup subsys devices
[ 0.205591] Initializing cgroup subsys freezer
[ 0.210563] Initializing cgroup subsys net_cls
[ 0.215535] Initializing cgroup subsys perf_event
[ 0.220837] Initializing cgroup subsys hugetlb
[ 0.225884] CPU: Physical Processor ID: 0
[ 0.230368] CPU: Processor Core ID: 0
[ 0.235373] mce: CPU supports 22 MCE banks
[ 0.239995] CPU0: Thermal monitoring enabled (TM1)
[ 0.245388] process: using mwait in idle threads
[ 0.250554] Last level iTLB entries: 4KB 1024, 2MB 1024, 4MB 1024
[ 0.257367] Last level dTLB entries: 4KB 1024, 2MB 1024, 4MB 1024, 1GB 4
[ 0.265023] Freeing SMP alternatives memory: 28K (ffffffff81cd9000 - ffffffff81ce0000)
[ 0.284681] ftrace: allocating 25590 entries in 100 pages
[ 0.301981] x2apic: IRQ remapping doesn't support X2APIC mode
[ 0.308540] Switched APIC routing to physical flat.
[ 0.314866] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.331616] smpboot: CPU0: Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz (fam: 06, model: 3f, stepping: 04)
[ 0.342176] Performance Events: PEBS fmt2+, 16-deep LBR, Haswell events, full-width counters, Intel PMU driver.
[ 0.353531] ... version: 3
[ 0.358009] ... bit width: 48
[ 0.362576] ... generic registers: 4
[ 0.367056] ... value mask: 0000ffffffffffff
[ 0.372992] ... max period: 0000ffffffffffff
[ 0.378926] ... fixed-purpose events: 3
[ 0.383404] ... event mask: 000000070000000f
[ 0.392352] x86: Booting SMP configuration:
[ 0.397027] .... node #0, CPUs: #1
[ 0.423364] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[ 0.432694] #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17
[ 0.706722] .... node #1, CPUs: #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35
[ 1.094522] .... node #2, CPUs: #36
[ 1.192483] mce: [Hardware Error]: Machine check events logged
[ 1.209188] #37
[ 1.209279] BUG: unable to handle kernel
[ 1.215830] #38
[ 1.217982] NULL pointer dereference at 0000000000000008
[ 1.223925] IP: [<ffffffff810980a1>] process_one_work+0x31/0x420
[ 1.225696] #39PGD 0
[ 1.233428] Oops: 0000 [#1] SMP
[ 1.237059] Modules linked in:
[ 1.240486] CPU: 36 PID: 263 Comm: kworker/36:0 Not tainted 4.1.0-rc8 #1
[ 1.247969] #40
[ 1.247969] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640 05/01/2015
[ 1.261708] #41
[ 1.261711] task: ffff88181c284470 ti: ffff88181bd94000 task.ti: ffff88181bd94000
[ 1.272235] RIP: 0010:[<ffffffff810980a1>] [ 1.275305] #42
[<ffffffff810980a1>] process_one_work+0x31/0x420
[ 1.283854] RSP: 0000:ffff88181bd97e08 EFLAGS: 00010046
[ 1.289799] RAX: 0000000fffffffe0 RBX: ffffffff81d0fa20 RCX: 0000000000000000
[ 1.297780] #43
[ 1.297780] RDX: 0000000fffffff00 RSI: ffffffff81d0fa20 RDI: ffff88181c2660c0
[ 1.307914] RBP: ffff88181bd97e48 R08: ffff88181f416ec0 R09: ffff88181c284470
[ 1.315892] #44
[ 1.315892] R10: 0000000000000002 R11: ffffffff8109e5ac R12: ffff88181c2660c0
[ 1.326023] #45
[ 1.326024] R13: ffff88181f416ec0 R14: 0000000000000000 R15: ffff88181c2660f0
[ 1.336151] FS: 0000000000000000(0000) GS:ffff88181f400000(0000) knlGS:0000000000000000
[ 1.345207] #46
[ 1.345209] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.353782] CR2: 00000000000000b8 CR3: 00000000019ca000 CR4: 00000000001406e0
[ 1.361764] #47
[ 1.361765] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1.371900] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1.379879] #48
[ 1.379879] Stack:
[ 1.384277] 0000000000000282 00000000bce8578f ffff88181f416f50 ffff88181c2660c0[ 1.390883] #49

[ 1.394749] ffff88181f416ed8 ffff88181f416ec0 ffff88181c284470 ffff88181c2660f0
[ 1.403058] ffff88181bd97eb8 ffffffff81098982[ 1.407588] #50
ffff88181f4171a0 ffff88181f416ed8
[ 1.413538] Call Trace:
[ 1.416274] [<ffffffff81098982>] worker_thread+0x112/0x520
[ 1.422518] [<ffffffff81098870>] ? rescuer_thread+0x3e0/0x3e0
[ 1.429048] #51
[ 1.429048] [<ffffffff8109e6d8>] kthread+0xd8/0xf0
[ 1.436650] [<ffffffff8109e600>] ? kthread_create_on_node+0x1b0/0x1b0
[ 1.443964] #52
[ 1.443966] [<ffffffff816a0322>] ret_from_fork+0x42/0x70
[ 1.452164] [<ffffffff8109e600>] ? kthread_create_on_node+0x1b0/0x1b0
[ 1.459469] #53
[ 1.459471] Code: 48 89 e5 41 57 41 56 45 31 f6 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 18 48 8b 06 4c 8b 6f 48 48 89 [ 1.473959]
[ 1.473959] .... node #3, CPUs: #54
c2 30 d2 a8 04 4c 0f 45 f2 <49> 8b 46 08 44 8b b8 00 01 00 00 41 c1 ef 05 44 89 f8 83 e0 01
[ 1.489433] RIP [<ffffffff810980a1>] process_one_work+0x31/0x420
[ 1.496261] RSP <ffff88181bd97e08>
[ 1.500160] CR2: 0000000000000008
[ 1.503872] ---[ end trace 8229a011b97532a0 ]---
[ 1.509027] Kernel panic - not syncing: Fatal exception
[ 1.514890] ---[ end Kernel panic - not syncing: Fatal exception
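
A minimal sketch for reading the "Code:" line in the log above by hand, assuming the interleaved "#NN" CPU bring-up output is stripped and the two halves of the line are rejoined. The helper names (split_code, decode_mov_load) are hypothetical, and the decoder deliberately handles only one instruction form, a 64-bit register load with an 8-bit displacement; it is not a general disassembler.

```python
# Bytes from the "Code:" line of the oops, with the interleaved
# "#NN" CPU bring-up output removed and the two halves rejoined.
CODE = (
    "48 89 e5 41 57 41 56 45 31 f6 41 55 41 54 49 89 fc 53 48 89 f3 "
    "48 83 ec 18 48 8b 06 4c 8b 6f 48 48 89 c2 30 d2 a8 04 4c 0f 45 f2 "
    "<49> 8b 46 08 44 8b b8 00 01 00 00 41 c1 ef 05 44 89 f8 83 e0 01"
)

# 64-bit general-purpose registers in x86-64 encoding order.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def split_code(code):
    """Return (raw bytes, index of the byte the kernel marked with <..>)."""
    tokens = code.split()
    fault = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    data = bytes(int(tok.strip("<>"), 16) for tok in tokens)
    return data, fault


def decode_mov_load(data, at):
    """Decode only 'REX.W 8B /r' with mod=01: mov r64, [r64 + disp8].

    Returns (destination register, base register, signed displacement).
    Raises ValueError for any other encoding.
    """
    if at + 4 > len(data):
        raise ValueError("not enough bytes")
    rex, opcode, modrm, disp = data[at:at + 4]
    # REX prefix is 0100WRXB; require W=1 (64-bit operand size).
    if rex & 0xF8 != 0x48 or opcode != 0x8B or modrm >> 6 != 0b01:
        raise ValueError("unsupported encoding")
    if modrm & 7 == 4:
        raise ValueError("SIB byte follows; not handled in this sketch")
    reg = ((modrm >> 3) & 7) | ((rex & 0x4) << 1)  # REX.R extends ModRM.reg
    base = (modrm & 7) | ((rex & 0x1) << 3)        # REX.B extends ModRM.rm
    if disp >= 0x80:                               # disp8 is sign-extended
        disp -= 0x100
    return REG64[reg], REG64[base], disp


if __name__ == "__main__":
    data, fault = split_code(CODE)
    dst, base, disp = decode_mov_load(data, fault)
    print(f"faulting byte index {fault}: mov {dst}, [{base}{disp:+#x}]")
```

Under those assumptions the marked instruction is a load from [r14+0x8]. The register dump shows R14 as 0, which is consistent with the reported NULL pointer dereference at address 0000000000000008.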
