Re: ERROR: Disabling IRQ #11

From: Maciej W. Rozycki
Date: Fri Oct 29 2004 - 10:43:40 EST


On Fri, 29 Oct 2004, Len Brown wrote:

> APIC error on CPU0: 00(01)
>
> Hmmm, how did we take this interrupt with no bits set?
> why do we have bit 0 (send checksum error) set after
> we try to clear "errors"?

Please have a look at the relevant local APIC specification. For the
P6-class local APIC a write of zero (or likely any value -- I don't
remember) to the ESR makes the internal error status be copied to the
externally visible ESR. A read of the ESR returns its contents and clears
it. Thus the report is perfectly valid -- it means no uncleared error was
left over before.

For the record: for the P5-class local APIC the ESR is the only error
status and it is also cleared on a read. Thus for that implementation the
error codes reported would be reversed. Writes to the ESR have no effect
by definition. Unfortunately, a range of chips have suffered from an
erratum which makes data on writes to the ESR being actually recorded in
the register. As a result, we cannot just do a sequence consisting of a
write and a read -- we need to do that leading read to handle buggy chips
correctly. And we do need to write zero specifically as otherwise a bogus
error would be reported for them.

I don't remember what the specification for the P4-class local APIC is in
this area, or what other vendors' implementations do. The i82489DX APIC
does not implement error reporting.

Frankly, I think this P5-to-P6 APIC specification change is an
unnecessary annoyance for an OS developer. And there are more caveats
like this across local APIC implementations, this perhaps being the least
harmful one.

Maciej
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/