Re: kernel panic at load average of 24 is it acceptable ?

From: Andrey Borzenkov
Date: Mon Jul 17 2006 - 14:14:34 EST


Vikas Kedia wrote:

>> Read up on MCE debugging methods on Linux or so, that should hopefully
>> help.
>
> Here is the output of mcelog:
> root@srv1:~# less /var/log/mcelog
> MCE 0
> CPU 0 0 data cache TSC 6988ae18046
> ADDR f87f5ec0
> Data cache ECC error (syndrome ce)
> bit46 = corrected ecc error
> bus error 'local node origin, request didn't time out
> data read mem transaction
> memory access, level generic'
> STATUS 9467400000000833 MCGSTATUS 0
> MCE 0
> CPU 0 0 data cache TSC 723b38a3633
> ADDR 3d9fc0
> Data cache ECC error (syndrome ce)
> bit46 = corrected ecc error
> bit62 = error overflow (multiple errors)
> bus error 'local node origin, request didn't time out
> data read mem transaction
> memory access, level generic'
> STATUS d467400000000833 MCGSTATUS 0
>
> Since it shows an ECC error, is the hypothesis correct that it's a RAM
> problem and replacing the RAM should solve the problem.
>

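I am not sure if this is a question, but the log shows corrected ECC
errors in the CPU's _data cache_ (the second entry also has the overflow
bit set, so more than one was logged), which points to the CPU rather
than the memory modules.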
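
For what it is worth, the STATUS values can be checked by hand. The rough
sketch below decodes only a few bits: bit 46 and bit 62 are named in the
mcelog output itself, bits 63 (valid) and 61 (uncorrected) are taken from
the generic x86 machine-check status layout, and the syndrome position
(bits 54:47) is an assumption that happens to reproduce the "syndrome ce"
line. The function and table names are made up for illustration.

```python
# Sketch: decode a few machine-check STATUS bits from the mcelog values.
# Bits 46 and 62 are the ones mcelog names in its output; bits 63 and 61
# follow the generic x86 machine-check layout; the syndrome field at
# bits 54:47 is an assumption checked against the "syndrome ce" line.

STATUS_BITS = {
    63: "valid",
    62: "error overflow (multiple errors)",
    61: "uncorrected error",
    46: "corrected ecc error",
}


def decode_status(status):
    """Return (list of set flag names, 8-bit syndrome) for a STATUS value."""
    flags = [
        name
        for bit, name in sorted(STATUS_BITS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    syndrome = (status >> 47) & 0xFF
    return flags, syndrome


# The two STATUS values quoted from /var/log/mcelog above.
for raw in ("9467400000000833", "d467400000000833"):
    flags, syndrome = decode_status(int(raw, 16))
    print(raw, "syndrome", format(syndrome, "x"), flags)
```

In both entries the uncorrected bit is clear and the corrected-ECC bit is
set; only the second entry has the overflow bit.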

-andrey
