Re: Machine check expection panic

From: kwijibo
Date: Sun Aug 10 2003 - 03:19:00 EST



Andi Kleen wrote:

Dave Jones <davej@xxxxxxxxxx> writes:


#
diff -Nru a/arch/i386/kernel/cpu/mcheck/k7.c b/arch/i386/kernel/cpu/mcheck/k7.c
--- a/arch/i386/kernel/cpu/mcheck/k7.c Wed Aug 6 23:33:40 2003
+++ b/arch/i386/kernel/cpu/mcheck/k7.c Wed Aug 6 23:33:40 2003
@@ -81,7 +81,7 @@
wrmsr (MSR_IA32_MCG_CTL, 0xffffffff, 0xffffffff);
nr_mce_banks = l & 0xff;

- for (i=0; i<nr_mce_banks; i++) {
+ for (i=1; i<nr_mce_banks; i++) {



The change looks rather suspicious to me.

Bank 0 is the data cache unit (DC)

Do you have an errata that says that the DC bank is bad on all Athlons?

Normally BIOS or microcode are supposed to turn off bad MCEs by masking them in another register. Maybe the person's CPU has a real problem that is just masked now, e.g. it could be overclocked
and stress the cache too much.

The CPU's aren't overclocked and have worked fine for
me under much heavier loads than booting a kernel for
at least a year. Using the 2.4 kernel that is. Once
I remove the exception code from the kernel it boots
fine and runs fine under any load I put it under.


The original MCE was:

Status: (4) Machine Check in progress.
Restart IP invalid.
parsebank(0): f606200000000833 @ 4040
External tag parity error
Uncorrectable ECC error
CPU state corrupt. Restart not possible
Address in addr register valid
Error enabled in control register
Error not corrected.
Error overflow
Bus and interconnect error
Participation: Local processor originated request
Timeout: Request did not timeout
Request: Generic error
Transaction type : Instruction
Memory/IO : Other

Tyan 2466 motherboard
2 Athon MP 1200 processors (1200?)

Should say 1.2 GHz processor I imagine. AMD and their
wacky naming schemes. This is before they had they
wacky number scheme.

Steve



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/