Re: MCE error

From: Dave Jones (davej@codemonkey.org.uk)
Date: Mon Mar 31 2003 - 10:39:39 EST


On Sun, Mar 30, 2003 at 10:47:56AM -0800, Lawrence Walton wrote:
> I just got a MCE error while running 2.5.65 "MCE: The hardware reports a non fatal, correctable incident occurred on CPU 0.
> Bank 2: 940040000000017a" did a google search and found Dave Jones's parsemce, and decoded it to
>
> Status: (ba) Error IP valid
> Restart IP invalid.
>
> And was wondering what that actually meant. :)

Incomplete dump, what it really means..

(davej@deviant:davej)$ ./a.out -b 2 -e 0xba -s 940040000000017a -a 0
Status: (186) Error IP valid
Restart IP invalid.
parsebank(2): 940040000000017a @ 0
        External tag parity error
        Correctable ECC error
        Address in addr register valid
        Error enabled in control register
        Memory heirarchy error
        Request: Generic error
        Transaction type : Generic
        Memory/IO : I/O

Looks like the L2 cache ECC checking spotted something going wrong,
and fixed it up. This can happen in cases where there is inadequate
cooling, power, or overclocking (or in rare circumstances, flaky CPUs)

                Dave

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Mar 31 2003 - 22:00:36 EST