Re: SMP bugs on 2.3.99-pre3

From: Andrew Morton (andrewm@uow.edu.au)
Date: Wed Apr 05 2000 - 21:01:40 EST


Brad Borgald wrote:
>
> On Tue, Apr 04, 2000 at 08:42:55AM -0400, Stephen Frost wrote:
> > On Tue, 4 Apr 2000, Janek Hiis wrote:
> >
> > > I have a dual Celeron machine, and when running 2.3.99-pre3
> > > I get these errors:
> > >
> > > APIC error interrupt on CPU#0, should never happen.
> > > ... APIC ESR0: 00000000
> > > ... APIC ESR1: 00000002
> > > ... bit 1: APIC Receive CS Error (hw problem).
> > > APIC error interrupt on CPU#1, should never happen.
> > > ... APIC ESR0: 00000002
> > > ... APIC ESR1: 0000000a
> > > ... bit 1: APIC Receive CS Error (hw problem).
> > > ... bit 3: APIC Receive Accept Error.
> > >
> > > They show up more or less at random,
> > > but with 2.2.14 there are no errors.
> >
> > You may have had that happening under 2.2.14, but it
> > wasn't complained about until some point in 2.3.x. In other
> > words, the printk didn't exist.
>
> So is this a serious message or can it be ignored? The system
> still seems to run fine in dual under both Linux and Windows 2000.

According to "Maciej W. Rozycki" <macro@ds2.pg.gda.pl> (who knows about these things) on the linux-smp list:

 Actually it's the APIC that handles the error. If any transmission over the
inter-APIC bus fails, it's repeated (there are some exceptions, but they do
not apply to checksum errors) until it succeeds. An error interrupt happens
as a side effect and can be enabled or disabled by an OS depending on
whether diagnostics are desired or not. Linux 2.2 keeps these diagnostics
disabled as they are unsupported.

 In other words, the kernel need not handle APIC errors at all -- they are
just reported FYI.

 The accept error above is surely the result of the reported checksum
error.
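
For anyone who wants to read the ESR values in the log above by hand
(00000002 and 0000000a), here is a rough userspace sketch of the bit
decoding. It is not the kernel's code. The helper names esr_bit_name()
and esr_print() are made up for illustration. Only the wording for bits 1
and 3 comes from the log quoted above; the other names are my reading of
Intel's local APIC documentation for the P6 family, so treat them as
assumptions.

```c
#include <stdio.h>

/*
 * Names for the low eight ESR bits. Bits 1 and 3 use the wording from
 * the kernel log in this thread; the rest are assumed from Intel's
 * P6-family local APIC documentation. Bit 4 is treated as reserved.
 */
static const char *const esr_names[8] = {
    "APIC Send CS Error (hw problem)",     /* bit 0 (assumed wording) */
    "APIC Receive CS Error (hw problem)",  /* bit 1 (from the log)    */
    "APIC Send Accept Error",              /* bit 2 (assumed wording) */
    "APIC Receive Accept Error",           /* bit 3 (from the log)    */
    NULL,                                  /* bit 4 (reserved)        */
    "Send Illegal Vector",                 /* bit 5 (assumed wording) */
    "Receive Illegal Vector",              /* bit 6 (assumed wording) */
    "Illegal Register Address",            /* bit 7 (assumed wording) */
};

/* Hypothetical helper: name of one ESR bit, or NULL if unnamed. */
const char *esr_bit_name(unsigned int bit)
{
    return bit < 8 ? esr_names[bit] : NULL;
}

/*
 * Hypothetical helper: print one line per set, named bit in the same
 * style as the log above, and return how many lines were printed.
 */
int esr_print(FILE *out, unsigned int esr)
{
    int printed = 0;

    for (unsigned int bit = 0; bit < 8; bit++) {
        const char *name = esr_bit_name(bit);

        if ((esr & (1u << bit)) && name) {
            fprintf(out, "... bit %u: %s\n", bit, name);
            printed++;
        }
    }
    return printed;
}
```

Calling esr_print(stdout, 0x0a) reports bits 1 and 3, which matches the
CPU#1 entry in the log; 0x02 reports only bit 1, as on CPU#0.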

-- 
-akpm-



