Re: mcelog ?

From: Stephan von Krawczynski
Date: Tue May 16 2006 - 05:36:40 EST


On Mon, 15 May 2006 08:20:08 -0700
thockin@xxxxxxxxxx wrote:

> On Mon, May 15, 2006 at 11:42:43AM +0200, Stephan von Krawczynski wrote:
> > HARDWARE ERROR
> > CPU 1: Machine Check Exception: 4 Bank 4: b60a200170080813
> > TSC 89cfb4725b17 ADDR 1025cb3f0
> > This is not a software problem!
> > Run through mcelog --ascii to decode and contact your hardware vendor
> > Kernel panic - not syncing: Machine check
> >
> > Of course I ran mcelog, but I don't quite understand how the additional info
> > helps me find the problem.
> > Is this a problem with RAM? And if so, which one?
>
> It sounds like a memory error, but there are some other bank4 errors that
> can crop up. What does mcedecode say?

Well, here it is:

HARDWARE ERROR
CPU 1 4 northbridge TSC 89cfb4725b17
Northbridge Chipkill ECC error
Chipkill ECC syndrome = 7014
bit32 = err cpu0
bit45 = uncorrected ecc error
bit57 = processor context corrupt
bit61 = error uncorrected
bus error 'local node origin, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS b60a200170080813 MCGSTATUS 4
This is not a software problem!
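
For reference, the flags in the decoded output can be cross-checked against the
raw STATUS value by hand. The following is only a rough sketch: the four flag
bits are the ones the decoder printed above, while the syndrome layout (high
byte in bits 31:24, low byte in bits 54:47) and the bus-error layout of the low
16 bits are assumptions that happen to reproduce the printed values for this one
sample. The function names are made up for illustration and are not part of
mcelog.

```python
# Sketch only: bit positions are assumptions matched against the decoder
# output quoted in this message, not copied from a vendor manual.
STATUS = 0xB60A200170080813

# Flag bits exactly as named in the decoded output.
FLAG_BITS = {
    32: "err cpu0",
    45: "uncorrected ecc error",
    57: "processor context corrupt",
    61: "error uncorrected",
}


def set_flags(status):
    """Return the descriptions of the listed flag bits that are set."""
    return [name for bit, name in sorted(FLAG_BITS.items())
            if (status >> bit) & 1]


def chipkill_syndrome(status):
    """Assumed layout: syndrome[15:8] in bits 31:24, syndrome[7:0] in bits 54:47."""
    high = (status >> 24) & 0xFF
    low = (status >> 47) & 0xFF
    return (high << 8) | low


def bus_error_fields(status):
    """Split the low 16 bits using an assumed 0000 1PPT RRRR IILL layout."""
    code = status & 0xFFFF
    return {
        "pp": (code >> 9) & 0x3,    # participation (0 = local node origin)
        "t": (code >> 8) & 0x1,     # timeout (0 = request didn't time out)
        "rrrr": (code >> 4) & 0xF,  # request type (1 = generic read)
        "ii": (code >> 2) & 0x3,    # transaction type (0 = mem)
        "ll": code & 0x3,           # cache level (3 = generic)
    }


if __name__ == "__main__":
    for name in set_flags(STATUS):
        print(name)
    print(f"syndrome = {chipkill_syndrome(STATUS):04x}")  # prints "syndrome = 7014"
    print(bus_error_fields(STATUS))
```

All four flag bits come out set, and the reassembled syndrome matches the 7014
reported above, so the raw value and the decoded text agree with each other.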


Is this some sort of memory error?

Thank you for your help
--
Regards,
Stephan