Re: [HW PROBLEM] Intel I7 MCE. Erratum or not?

From: Hidetoshi Seto
Date: Mon Dec 08 2008 - 04:37:06 EST


Giangiacomo Mariotti wrote:
> I still don't quite understand the logic behind this exception. It
> happens always only once per boot, right after booting always at [
> 301.7320xx], which clearly means that it's always triggered by the
> same instruction/s. It's about a "Generic CACHE Level-2 Data-Write
> Error", yet after that moment it never happens again until the next
> boot at the same relative time. The cache has an hardware problem, the
> process context is corrupted, but still after that single message I
> don't have any problem, my system works normally, even under very high
> pressure on cpu and memory. Is this normal? Should I try to limit the
> number of cpu used to only 1(cpu0) on bios and disable hyperthreading?
> That way I'd have a single physical and logical cpu, so probably if it
> has an hardware problem on the cache, the heaven will fall?

IIRC, this error is not what happen on the time [301.7320xx] during
boot, but happen before the boot. Since the record says "Processor
context corrupt," MCE handler should call panic(or do something stop
the system) if the context actually corrupted during the boot.

In other words, it seems that 1) the error was recorded at last time
when your machine crashed unexpectedly(by cosmic-ray etc.) and not cleared
yet, or 2) your machine is doing something wrong in every reset/poweroff.

Could you try "mce=nobootlog" boot option?

Thanks,
H.Seto

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/