Re: Machine check exception panic

From: Dave Jones (davej@redhat.com)
Date: Wed Aug 06 2003 - 20:34:34 EST


On Thu, Aug 07, 2003 at 03:00:14AM +0200, Andi Kleen wrote:

> The change looks rather suspicious to me.

It's been in 2.4 for months, where it solved the same problem that
many people are now seeing in 2.6. The "I don't get MCEs in 2.4
but I get them in 2.6" reports are numerous, and I don't buy the
"2.6 stresses hardware more" theory for a second.
 
> Bank 0 is the data cache unit (DC)
> Do you have an errata that says that the DC bank is bad on all Athlons?

Hmm, I thought this was actually documented, but I can't seem to find
it in any of the docs I have. There are, however, gaps between the
errata numbers in a few cases, so it's possible it was removed in
a later version of the revision guide. Richard?
 
> Normally BIOS or microcode are supposed to turn off bad MCEs by
> masking them in another register. Maybe the person's CPU has a
> real problem that is just masked now, e.g. it could be overclocked
> and stress the cache too much.

I recall seeing Athlon owners complain when I 'fixed' this problem
using an inverse of this patch in 2.4.19-pre3. For pre4, Marcelo
backed it out, and people were happy again.

Whether it's documented or not, there are boxes out there that don't
like having that bank enabled.
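
For anyone following along, here is a rough userspace sketch of what
"not enabling that bank" amounts to. It is not the actual patch:
fake_wrmsr(), fake_msr[], MAX_BANKS and enable_mce_banks() are made-up
names for illustration, and the MSR write is simulated rather than
done on real hardware. The only architectural detail assumed is that
the MCi_CTL register for bank i sits at MSR 0x400 + 4*i.

```c
#include <stdint.h>
#include <stdio.h>

#define MSR_IA32_MC0_CTL 0x400u  /* MCi_CTL for bank i is at 0x400 + 4*i */
#define MAX_BANKS        8       /* illustrative upper bound, not a hardware value */

/* Hypothetical stand-in for MSR state; one slot per bank's MCi_CTL. */
static uint64_t fake_msr[MAX_BANKS];

/* Hypothetical replacement for the kernel's wrmsr(): records the value
 * instead of touching the CPU. */
static void fake_wrmsr(uint32_t msr, uint64_t val)
{
    fake_msr[(msr - MSR_IA32_MC0_CTL) / 4] = val;
}

/* Enable error reporting in each bank by writing all-ones to MCi_CTL.
 * When skip_bank0 is non-zero, bank 0 (the data cache unit in this
 * thread) is left exactly as the BIOS set it up. */
static void enable_mce_banks(int nr_banks, int skip_bank0)
{
    if (nr_banks > MAX_BANKS)
        nr_banks = MAX_BANKS;

    for (int i = skip_bank0 ? 1 : 0; i < nr_banks; i++)
        fake_wrmsr(MSR_IA32_MC0_CTL + 4 * i, ~0ULL);
}
```

Calling enable_mce_banks(4, 1) writes all-ones to the control
registers of banks 1 through 3 and never touches bank 0, which is the
behaviour being argued about here. Passing 0 as the second argument
gives the "enable everything" variant.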

                Dave

-- 
 Dave Jones     http://www.codemonkey.org.uk


