Re: [Patch V2] x86, mce: Ensure offline CPU's don't participate in mce rendezvous process.
From: Borislav Petkov
Date: Mon Dec 07 2015 - 17:34:48 EST
On Mon, Dec 07, 2015 at 10:07:59PM +0000, Luck, Tony wrote:
> > And that is incorrect too, because the MCE (at least the one I'm
> > injecting) gets broadcast to the CPUs on the *node* and not to the
> > whole system.
>
> Which system? What kind of machine check? On Intel we expect machine checks
> to be broadcast to all logical cpus on all nodes (unless local machine check is enabled,
> in which case SRAR style machine checks go only to the logical cpu that hit the error).
>
> The code is written to that expectation ... and we don't report things as well if
> something else happens (like too many or too few cpus showing up).
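Tony's "too many or too few" point comes down to counting. With a broadcast #MC, every online CPU is expected to enter the handler and check in, so, roughly speaking, the rendezvous waits for as many CPUs as there are in the online mask. As a sketch, the following counts the CPUs in a cpulist string in the format of /sys/devices/system/cpu/online. count_cpulist is a hypothetical helper, not code from the kernel or from this patch:

```shell
# Count the CPUs in a cpulist string such as "0-4,6-33,35-63", the format
# used by /sys/devices/system/cpu/online. Hypothetical helper.
# The body runs in a subshell "( ... )" so changing IFS stays local.
count_cpulist() (
    IFS=,
    total=0
    for range in $1; do
        case $range in
            # "lo-hi" contributes hi - lo + 1 CPUs
            *-*) total=$(( total + ${range#*-} - ${range%-*} + 1 )) ;;
            # a single CPU number contributes 1
            *)   total=$(( total + 1 )) ;;
        esac
    done
    echo "$total"
)

# Online list of this 64-CPU box with CPUs 5 and 34 offlined:
count_cpulist "0-4,6-33,35-63"
# On a live system: count_cpulist "$(cat /sys/devices/system/cpu/online)"
```

Under that model the rendezvous would expect 62 CPUs, versus the 14 node 0 CPUs that actually logged the #MC below.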
Box logs below.
The BIOS enumerates the cores in a funny order:
node #0, CPUs 0-7
node #1, CPUs 8-15
node #2, CPUs 16-23
node #3, CPUs 24-31
and then starts from node 0 again:
.... node #0, CPUs: #32 #33 #34 #35 #36 #37 #38 #39
.... node #1, CPUs: #40 #41 #42 #43 #44 #45 #46 #47
.... node #2, CPUs: #48 #49 #50 #51 #52 #53 #54 #55
.... node #3, CPUs: #56 #57 #58 #59 #60 #61 #62 #63
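The enumeration follows a regular pattern: eight CPUs per node for nodes 0-3, then a second pass over the same four nodes. A small shell sketch of that arithmetic, for checking which node a CPU number lands on for this particular box. CPUS_PER_NODE and NR_NODES are read off the boot log, not queried from the system; on other machines the authoritative mapping is /sys/devices/system/node/nodeN/cpulist:

```shell
# Assumption: values taken from this box's boot log, not from sysfs.
CPUS_PER_NODE=8
NR_NODES=4

# Node of a CPU under the "8 per node, then wrap around" enumeration.
cpu_to_node() {
    echo $(( ($1 / CPUS_PER_NODE) % NR_NODES ))
}

for cpu in 5 34 8 63; do
    echo "CPU $cpu -> node $(cpu_to_node "$cpu")"
done
```

Both 5 and 34 come out as node 0, which is why they were the ones picked for offlining.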
So I went and offlined cores 5 and 34, which are both on node 0.
Why node 0? Well, when I inject error type 0x10, which is
0x00000010 Memory Uncorrectable non-fatal
it generates an MCE only on the node 0 cores (full log at the end of
this mail): #MC gets raised on CPUs 0-7 and 32-39 and nowhere else.
Cores 5 and 34 are missing from that list, of course, since they're
offline.
I mean, even if the #MC gets raised only on that node, the fix still
works.
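For anyone wanting to reproduce this, the steps boil down to a few sysfs/debugfs writes. A sketch, assuming root, CONFIG_ACPI_APEI_EINJ and debugfs mounted at /sys/kernel/debug. The helper names are hypothetical, the address passed to inject_mem_uc is a placeholder (the address actually used isn't stated here), and whether param1/param2 are honoured depends on the firmware's EINJ implementation and the einj driver's options. EINJ_DIR and CPU_DIR can be pointed at scratch directories for a dry run:

```shell
# Hypothetical helpers; the defaults are the usual sysfs/debugfs locations.
EINJ_DIR=${EINJ_DIR:-/sys/kernel/debug/apei/einj}
CPU_DIR=${CPU_DIR:-/sys/devices/system/cpu}

# Take a CPU offline; the kernel then logs "smpboot: CPU N is now offline".
offline_cpu() {
    echo 0 > "$CPU_DIR/cpu$1/online"
}

# Inject error type 0x10 (Memory Uncorrectable non-fatal) at physical
# address $1; the mask in param2 selects the 4K page containing it.
inject_mem_uc() {
    echo "$1" > "$EINJ_DIR/param1"
    echo 0xfffffffffffff000 > "$EINJ_DIR/param2"
    echo 0x10 > "$EINJ_DIR/error_type"
    echo 1 > "$EINJ_DIR/error_inject"    # any value triggers the injection
}

# Usage on the test box (as root):
#   offline_cpu 5
#   offline_cpu 34
#   inject_mem_uc <phys-addr>
```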
$ grep -Ei "hardware.*CPU" /tmp/mce | sed 's/^.*CPU//' | sort -n
0: Machine Check Exception: 5 Bank 5: be00000000010090
1: Machine Check Exception: 5 Bank 5: be00000000010090
2: Machine Check Exception: 5 Bank 5: be00000000010090
3: Machine Check Exception: 5 Bank 5: be00000000010090
4: Machine Check Exception: 5 Bank 5: be00000000010090
6: Machine Check Exception: 5 Bank 5: be00000000010090
7: Machine Check Exception: 5 Bank 5: be00000000010090
32: Machine Check Exception: 5 Bank 5: be00000000010090
33: Machine Check Exception: 5 Bank 5: be00000000010090
35: Machine Check Exception: 5 Bank 5: be00000000010090
36: Machine Check Exception: 5 Bank 5: be00000000010090
37: Machine Check Exception: 5 Bank 5: be00000000010090
38: Machine Check Exception: 5 Bank 5: be00000000010090
39: Machine Check Exception: 5 Bank 5: be00000000010090
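The output above can also be checked mechanically against the expected set of node 0 CPUs. A sketch, assuming node 0 is CPUs 0-7 and 32-39 as enumerated on this box; missing_node0_cpus is a hypothetical helper that reads the pipeline's output on stdin:

```shell
# Print the node 0 CPUs (0-7 and 32-39 on this box) that did not log an
# MCE. Input: lines of the form "<cpu>: Machine Check Exception: ...",
# possibly with leading blanks, as produced by the grep | sed pipeline.
missing_node0_cpus() {
    # Keep only the leading CPU number of each line.
    reported=$(sed -n 's/^ *\([0-9][0-9]*\):.*/\1/p')
    for cpu in 0 1 2 3 4 5 6 7 32 33 34 35 36 37 38 39; do
        # -x matches whole lines only, so "3" does not match "33".
        printf '%s\n' "$reported" | grep -qx "$cpu" || printf '%s\n' "$cpu"
    done
}

# Usage:
#   grep -Ei "hardware.*CPU" /tmp/mce | sed 's/^.*CPU//' | missing_node0_cpus
```

For the list above it prints 5 and 34, the two offlined cores.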
[ 0.859060] smpboot: CPU0: Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz (family: 0x6, model: 0x2d, stepping: 0x7)
...
[ 0.981593] x86: Booting SMP configuration:
[ 0.991092] .... node #0, CPUs: #1
[ 1.013485] microcode: CPU1 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.034219] #2
[ 1.049577] microcode: CPU2 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.070309] #3
[ 1.085865] microcode: CPU3 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.106618] #4
[ 1.121978] microcode: CPU4 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.142720] #5
[ 1.158079] microcode: CPU5 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.178833] #6
[ 1.194191] microcode: CPU6 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.214914] #7
[ 1.230471] microcode: CPU7 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.251309]
[ 1.254854] .... node #1, CPUs: #8
[ 1.275173] microcode: CPU8 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.390509] #9
[ 1.406859] microcode: CPU9 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.427735] #10
[ 1.444303] microcode: CPU10 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.465343] #11
[ 1.481718] microcode: CPU11 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.502779] #12
[ 1.519156] microcode: CPU12 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.540171] #13
[ 1.556536] microcode: CPU13 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.577587] #14
[ 1.594127] microcode: CPU14 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.615131] #15
[ 1.631471] microcode: CPU15 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.652590]
[ 1.656132] .... node #2, CPUs: #16
[ 1.676518] microcode: CPU16 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.791812] #17
[ 1.808189] microcode: CPU17 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.829292] #18
[ 1.845868] microcode: CPU18 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.866925] #19
[ 1.883311] microcode: CPU19 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.904386] #20
[ 1.920765] microcode: CPU20 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.941810] #21
[ 1.958169] microcode: CPU21 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.979242] #22
[ 1.995787] microcode: CPU22 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.016842] #23
[ 2.033182] microcode: CPU23 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.054314]
[ 2.057854] .... node #3, CPUs: #24
[ 2.078330] microcode: CPU24 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.193513] #25
[ 2.209874] microcode: CPU25 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.230996] #26
[ 2.247563] microcode: CPU26 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.268627] #27
[ 2.284998] microcode: CPU27 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.306061] #28
[ 2.322437] microcode: CPU28 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.343433] #29
[ 2.359780] microcode: CPU29 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.380855] #30
[ 2.397397] microcode: CPU30 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.418432] #31
[ 2.434759] microcode: CPU31 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.455792]
[ 2.459336] .... node #0, CPUs: #32 #33 #34 #35 #36 #37 #38 #39
[ 2.583817] .... node #1, CPUs: #40 #41 #42 #43 #44 #45 #46 #47
[ 2.710873] .... node #2, CPUs: #48 #49 #50 #51 #52 #53 #54 #55
[ 2.838069] .... node #3, CPUs: #56 #57 #58 #59 #60 #61 #62 #63
[ 2.964288] x86: Booted up 4 nodes, 64 CPUs
[ 2.974471] smpboot: Total of 64 processors activated (344907.86 BogoMIPS)
[ 5290.635126] Broke affinity for irq 82
[ 5290.643222] Broke affinity for irq 111
[ 5290.651507] Broke affinity for irq 125
[ 5290.664107] smpboot: CPU 5 is now offline
[ 5298.371336] Broke affinity for irq 31
[ 5298.379528] Broke affinity for irq 82
[ 5298.387627] Broke affinity for irq 103
[ 5298.395908] Broke affinity for irq 110
[ 5298.404187] Broke affinity for irq 111
[ 5298.412450] Broke affinity for irq 112
[ 5298.420733] Broke affinity for irq 118
[ 5298.429017] Broke affinity for irq 124
[ 5298.437295] Broke affinity for irq 125
[ 5298.445584] Broke affinity for irq 127
[ 5298.453880] Broke affinity for irq 137
[ 5298.466543] smpboot: CPU 34 is now offline
[ 5302.187338] EINJ: Error INJection is initialized.
[ 5318.897170] Disabling lock debugging due to kernel taint
[ 5318.910775] mce: [Hardware Error]: CPU 37: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5318.931171] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5318.951567] mce: [Hardware Error]: TSC bab9f2d8a4e00 ADDR bb68ec00 MISC 20403ebe86
[ 5318.969835] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC b microcode 710
[ 5318.990959] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5319.003825] EDAC sbridge MC0: CPU 37: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5319.023215] EDAC sbridge MC0: TSC bab9f2d8a4e00
[ 5319.033036] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5319.050338] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC b
[ 5319.069542] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5319.122943] mce: [Hardware Error]: CPU 3: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5319.143355] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5319.163846] mce: [Hardware Error]: TSC bab9f2d8a51c1 ADDR bb68ec00 MISC 20403ebe86
[ 5319.182249] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 6 microcode 710
[ 5319.203539] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5319.216586] EDAC sbridge MC0: CPU 3: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5319.235994] EDAC sbridge MC0: TSC bab9f2d8a51c1
[ 5319.245814] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5319.263348] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 6
[ 5319.283041] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5319.337311] mce: [Hardware Error]: CPU 2: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5319.357960] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8159a4d0> {mutex_lock+0x10/0x27}
[ 5319.378519] mce: [Hardware Error]: TSC bab9f2d8a3feb ADDR bb68ec00 MISC 20403ebe86
[ 5319.397151] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 4 microcode 710
[ 5319.418650] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5319.431902] EDAC sbridge MC0: CPU 2: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5319.451491] EDAC sbridge MC0: TSC bab9f2d8a3feb
[ 5319.461311] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5319.479022] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 4
[ 5319.499014] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5319.553209] mce: [Hardware Error]: CPU 6: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5319.574029] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5319.594953] mce: [Hardware Error]: TSC bab9f2d8a87ea ADDR bb68ec00 MISC 20403ebe86
[ 5319.613756] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC c microcode 710
[ 5319.635431] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5319.648873] EDAC sbridge MC0: CPU 6: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5319.668661] EDAC sbridge MC0: TSC bab9f2d8a87ea
[ 5319.678483] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5319.696422] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC c
[ 5319.716789] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5319.771531] mce: [Hardware Error]: CPU 38: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5319.792743] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5319.813836] mce: [Hardware Error]: TSC bab9f2d8a87ce ADDR bb68ec00 MISC 20403ebe86
[ 5319.832819] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC d microcode 710
[ 5319.854654] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5319.868243] EDAC sbridge MC0: CPU 38: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5319.888366] EDAC sbridge MC0: TSC bab9f2d8a87ce
[ 5319.898186] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5319.916192] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC d
[ 5319.936752] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5319.991752] mce: [Hardware Error]: CPU 35: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5320.013034] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5320.034166] mce: [Hardware Error]: TSC bab9f2d8a59dd ADDR bb68ec00 MISC 20403ebe86
[ 5320.053149] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 7 microcode 710
[ 5320.074972] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5320.088567] EDAC sbridge MC0: CPU 35: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5320.108688] EDAC sbridge MC0: TSC bab9f2d8a59dd
[ 5320.118511] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5320.136527] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 7
[ 5320.157079] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5320.212025] mce: [Hardware Error]: CPU 39: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5320.233316] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5320.254462] mce: [Hardware Error]: TSC bab9f2d8a4f5c ADDR bb68ec00 MISC 20403ebe86
[ 5320.273455] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC f microcode 710
[ 5320.295303] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5320.308905] EDAC sbridge MC0: CPU 39: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5320.329026] EDAC sbridge MC0: TSC bab9f2d8a4f5c
[ 5320.338847] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5320.356858] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC f
[ 5320.377433] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5320.432474] mce: [Hardware Error]: CPU 7: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5320.453569] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5320.474703] mce: [Hardware Error]: TSC bab9f2d8a4d60 ADDR bb68ec00 MISC 20403ebe86
[ 5320.493689] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC e microcode 710
[ 5320.515532] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5320.529139] EDAC sbridge MC0: CPU 7: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5320.549050] EDAC sbridge MC0: TSC bab9f2d8a4d60
[ 5320.558870] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5320.576890] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC e
[ 5320.597478] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5320.652525] mce: [Hardware Error]: CPU 36: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5320.673804] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5320.694918] mce: [Hardware Error]: TSC bab9f2d8a5823 ADDR bb68ec00 MISC 20403ebe86
[ 5320.713916] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 9 microcode 710
[ 5320.735759] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5320.749347] EDAC sbridge MC0: CPU 36: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5320.769452] EDAC sbridge MC0: TSC bab9f2d8a5823
[ 5320.779273] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5320.797296] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 9
[ 5320.817877] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5320.872972] mce: [Hardware Error]: CPU 33: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5320.894249] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5320.915390] mce: [Hardware Error]: TSC bab9f2d8a5326 ADDR bb68ec00 MISC 20403ebe86
[ 5320.934374] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 3 microcode 710
[ 5320.956222] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5320.969807] EDAC sbridge MC0: CPU 33: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5320.989913] EDAC sbridge MC0: TSC bab9f2d8a5326
[ 5320.999734] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5321.017750] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 3
[ 5321.038284] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5321.093686] mce: [Hardware Error]: CPU 1: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5321.114770] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5321.135925] mce: [Hardware Error]: TSC bab9f2d8a5562 ADDR bb68ec00 MISC 20403ebe86
[ 5321.154918] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 2 microcode 710
[ 5321.176765] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5321.190369] EDAC sbridge MC0: CPU 1: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5321.210303] EDAC sbridge MC0: TSC bab9f2d8a5562
[ 5321.220123] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5321.238146] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 2
[ 5321.258723] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5321.303358] mce: [Hardware Error]: CPU 4: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5321.324279] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5321.345397] mce: [Hardware Error]: TSC bab9f2d8a572f ADDR bb68ec00 MISC 20403ebe86
[ 5321.364380] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 8 microcode 710
[ 5321.386184] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5321.399729] EDAC sbridge MC0: CPU 4: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5321.419624] EDAC sbridge MC0: TSC bab9f2d8a572f
[ 5321.429445] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5321.447454] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 8
[ 5321.467989] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5321.511475] mce: [Hardware Error]: CPU 32: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5321.532587] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5321.553689] mce: [Hardware Error]: TSC bab9f2d8a50f4 ADDR bb68ec00 MISC 20403ebe86
[ 5321.572681] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 1 microcode 710
[ 5321.594500] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5321.608057] EDAC sbridge MC0: CPU 32: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5321.628161] EDAC sbridge MC0: TSC bab9f2d8a50f4
[ 5321.637982] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5321.655998] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 1
[ 5321.676524] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5321.720020] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5321.740939] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5321.762058] mce: [Hardware Error]: TSC bab9f2d8a5034 ADDR bb68ec00 MISC 20403ebe86
[ 5321.781022] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 0 microcode 710
[ 5321.802837] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5321.816395] EDAC sbridge MC0: CPU 0: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5321.836300] EDAC sbridge MC0: TSC bab9f2d8a5034
[ 5321.846121] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5321.864127] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 0
[ 5321.884647] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5321.928136] mce: [Hardware Error]: Machine check: Processor context corrupt
[ 5321.945589] Kernel panic - not syncing: Fatal machine check
[ 5321.985122] Kernel Offset: disabled
[ 5322.008492] Rebooting in 100 seconds..
[ 5421.226077] ACPI MEMORY or I/O RESET_REG.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.