Re: Dell XPS13: MCE (Hardware Error) reported

From: Paul Menzel
Date: Fri Jan 27 2017 - 08:37:47 EST


Dear Ashok,


On 01/09/17 20:23, Raj, Ashok wrote:

On Mon, Jan 09, 2017 at 12:53:33PM +0100, Paul Menzel wrote:

On 01/05/17 02:12, Raj, Ashok wrote:

CPUID Vendor Intel Family 6 Model 142
This is Kabylake Mobile

Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7
MISC 7880018086 ADDR fef1ce40
TIME 1483543069 Wed Jan 4 16:17:49 2017

STATUS ee0000000040110a MCGSTATUS 0

Decoding the bits further from MCi_STATUS above:
Val=1, OVER=1, UC=1, but EN=0 indicates this isn't an MCE, hence it should have
been signaled by a CMCI.

PCC=1, but should be ignored when EN=0.
MCACOD: 110a MSCOD: 0040
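
For anyone who wants to check the decoding above against their own log, here is a minimal Python sketch. It assumes the architectural IA32_MCi_STATUS layout from the Intel SDM (VAL=bit 63, OVER=62, UC=61, EN=60, MISCV=59, ADDRV=58, PCC=57, MSCOD=bits 31:16, MCACOD=bits 15:0). The function name `decode_mci_status` is made up for illustration and is not part of mcelog or the kernel.

```python
def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into the fields discussed above.

    Bit positions are an assumption based on the architectural layout
    in the Intel SDM; model-specific bits are not decoded here.
    """
    return {
        "VAL": (status >> 63) & 1,     # register contains a valid error
        "OVER": (status >> 62) & 1,    # an earlier error was overwritten
        "UC": (status >> 61) & 1,      # uncorrected error
        "EN": (status >> 60) & 1,      # error reporting was enabled
        "MISCV": (status >> 59) & 1,   # MISC register is valid
        "ADDRV": (status >> 58) & 1,   # ADDR register is valid
        "PCC": (status >> 57) & 1,     # processor context corrupt
        "MSCOD": (status >> 16) & 0xFFFF,  # model-specific error code
        "MCACOD": status & 0xFFFF,         # architectural error code
    }


# STATUS value taken from the log quoted in this thread.
fields = decode_mci_status(0xEE0000000040110A)
for name, value in fields.items():
    if name in ("MSCOD", "MCACOD"):
        print(f"{name}={value:04x}")
    else:
        print(f"{name}={value}")
```

Run on the STATUS value from the log, this gives the same flags and codes as listed above, including EN=0.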

This MSCOD indicates that it's a write-back access to MMIO space. It's possible
that the BIOS is scanning a certain memory region during boot, during which
time it disables generation of MCEs, which is why EN=0 in the above log.

It's a BIOS bug; one would expect the BIOS to clear these before handoff to
the OS. During OS boot we also scan all MC banks and log/clear them.

If you aren't observing them during normal operation you can safely ignore
these preboot logs, or pass them along to your OEM.

Thank you very much for your help. I first wasted time with Dell support over Twitter [1], where they make you jump through hoops and then claim it’s an mcelog issue (apparently they only run `sudo mcelog`). I have now updated to the latest firmware, 1.3.2, released yesterday [2].

With that new firmware version, it looks like the firmware has been fixed, and Linux no longer reports any MCEs.

It’d be great if other Dell XPS13 9360 users could verify that.


Kind regards,

Paul


[1] https://twitter.com/pmenzel_molgen/status/818808708692115456
[2] XPS_9360_1.3.2.exe