Re: GART error 11 (fwd)

From: Andi Kleen
Date: Thu May 27 2004 - 10:27:57 EST


Arthur Perry <kernel@xxxxxxxxxxxxxx> writes:

> Here is a posting that I made to Red Hat's amd64-list.
> It is a kernel-related issue, so if anybody has any insight or opinion on
> the proper implementation here, please jump in!

Machine Check Exceptions are, first of all, hardware issues, not kernel
issues. It is your CPU trying to tell you that something is wrong in the
hardware.

The 2.4 MCE code tends to label unrelated MCEs as "GART error" because
of bugs in the MCE decoding functions. There is a full fix for that
in the works.

Some early 2.4 kernels also managed to trigger a CPU bug by writing
directly to the northbridge (NB) registers. This should be fixed in later
2.4 kernels and also in SuSE SLES8-SP3.

The best alternative is to use 2.6, which has much improved MCE handling.

-Andi
