On Tue, Aug 05, 2003 at 12:45:01PM +1200, Simon Garner wrote:
> Andi Kleen <ak@muc.de> wrote:
>
> > There is nothing in any of my trees that generates such a message.
> > If it was GART related it would be either "GART TLB error ..." or
> > "extended error gart error". But even that should not happen anymore,
> > see below.
> >
> > I don't know what the RedHat kernel does, they may have changed the
> > MCE handler over the reference port.
> >
>
> A quick google brings up this reference:
> http://www.iglu.org.il/lxr/source/arch/x86_64/kernel/bluesmoke.c
OK, that's the very old MCE code that incorrectly enabled the northbridge
machine check. Don't use that, or use mce=off. However, I still think
it's a driver bug in your case. If it were the shaky GART MCE itself,
you would get a panic, because it's an unrecoverable MCE. More
likely the driver is accessing PCI DMA mappings after they have been
unmapped, which is a serious bug, but somehow not serious enough for the
northbridge to trigger the MCE.
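To make the suspected failure mode concrete, here is a toy user-space model of a use-after-unmap DMA bug. It is only an illustration: toy_iommu, toy_map, toy_unmap and toy_device_read are hypothetical names, not kernel or GART APIs. Mapping hands out a bus address (a slot in a small remapping table), unmapping clears the translation, and a later device access through the stale address is caught and counted instead of reaching memory. On real hardware, what such a stale access does depends on the chipset.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: a tiny remapping table standing in for a GART-style
 * aperture. All names here are hypothetical, not kernel APIs. */
#define APERTURE_SLOTS 8

struct toy_iommu {
    void *slot[APERTURE_SLOTS];   /* NULL means "not mapped" */
    unsigned long faults;         /* accesses through unmapped slots */
};

/* Map a buffer: returns a bus "address" (slot index) or -1 if full. */
static int toy_map(struct toy_iommu *io, void *buf)
{
    for (int i = 0; i < APERTURE_SLOTS; i++) {
        if (io->slot[i] == NULL) {
            io->slot[i] = buf;
            return i;
        }
    }
    return -1;
}

/* Unmap: after this the bus address must no longer be used. */
static void toy_unmap(struct toy_iommu *io, int bus)
{
    if (bus >= 0 && bus < APERTURE_SLOTS)
        io->slot[bus] = NULL;
}

/* Device-side read through the aperture. Using a stale bus address is
 * the driver bug described above: the translation is gone, so the model
 * refuses the access and counts it as a fault. */
static bool toy_device_read(struct toy_iommu *io, int bus, uint8_t *out)
{
    if (bus < 0 || bus >= APERTURE_SLOTS || io->slot[bus] == NULL) {
        io->faults++;
        return false;
    }
    *out = *(uint8_t *)io->slot[bus];
    return true;
}
```

Mapping a buffer, reading it, unmapping it and then reading again with the same bus address leaves faults at 1, which is the access a correct driver never makes.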
I was confused by your statement that the SuSE 8.2 beta9 kernel
generated that. It didn't, because it doesn't contain that old code.
What exactly does a modern kernel, like the SuSE one or an x86-64.org
kernel, generate?
>
> The error appears to be generated by the code starting around line 152
> in that file.
>
> Btw, what is 'bluesmoke'?
Alan Cox's sense of humour. Look it up in the jargon file.
> > You can always disable it with mce=off or better mce=0
> > as the message seems to be caused by the periodic non fatal MCE check
> > timer.
> >
>
> What will I lose by disabling this?
mce=0 turns off the periodic MCE check for non-fatal errors.
That's not a big issue; the worst you lose is reporting of single-bit
ECC memory errors that the hardware has already corrected.
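Roughly what a periodic non-fatal check does, as a hedged sketch: scan the machine-check status banks and report errors the hardware has already corrected. The bit positions (VAL = bit 63, UC = bit 61 of MCi_STATUS) are the architectural x86 ones; the array standing in for the MSRs and the name poll_banks are assumptions for illustration, since a real kernel reads the banks with rdmsr. With mce=0 a pass like this is never scheduled, which is why only corrected errors go unreported.

```c
#include <stddef.h>
#include <stdint.h>

/* Architectural bits of an x86 MCi_STATUS machine-check bank register. */
#define MCI_STATUS_VAL (1ULL << 63)  /* bank holds a valid error */
#define MCI_STATUS_UC  (1ULL << 61)  /* error was NOT corrected */

/* Hypothetical sketch of one pass of a periodic non-fatal check.
 * 'banks' stands in for the MCi_STATUS MSRs. Returns the number of
 * corrected errors found and clears those banks; uncorrected errors
 * are only counted and left in place by this sketch. */
static size_t poll_banks(uint64_t *banks, size_t nbanks, size_t *uncorrected)
{
    size_t corrected = 0;

    *uncorrected = 0;
    for (size_t i = 0; i < nbanks; i++) {
        uint64_t s = banks[i];

        if (!(s & MCI_STATUS_VAL))
            continue;              /* nothing logged in this bank */
        if (s & MCI_STATUS_UC) {
            (*uncorrected)++;      /* not handled by the poller here */
            continue;
        }
        corrected++;               /* e.g. a single-bit ECC error fixed by hardware */
        banks[i] = 0;              /* clear so it is reported only once */
    }
    return corrected;
}
```

For example, banks {0, VAL, VAL|UC} yield one corrected error (bank 1, which is then cleared) and one uncorrected error that is left untouched.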
mce=off turns off reporting of fatal MCE exceptions (however,
your box may still crash when something really bad happens).
mce=0 should have turned off the periodic check, and your
message looks very much like a periodic one, as actual MCE
exceptions report more data. I'm a bit puzzled why it doesn't
kill the message here. You can try mce=off, but I'm not
sure it will help either.
Using a newer kernel is probably a good idea anyway, as there
have been many bug fixes since then.
-Andi
This archive was generated by hypermail 2b29 : Thu Aug 07 2003 - 22:00:28 EST