Re: 2.6.27-rc3: 'APIC error on CPU1: 00(40)', but only on resume!

From: Vegard Nossum
Date: Thu Aug 21 2008 - 07:18:34 EST


On Thu, Aug 21, 2008 at 11:27 AM, Maciej W. Rozycki
<macro@xxxxxxxxxxxxxx> wrote:
> On Wed, 20 Aug 2008, Rafael J. Wysocki wrote:
>
>> On my box I see many "APIC error on CPU1: 00(40)" messages that don't seem
>> to be related to anything obviously bad and I've always been seeing them.
>
> Barring a hardware erratum, this is a bug in the kernel. It should be
> moderately easy to track down with some debugging added to writes
> accessing LVT and redirection table entries.

Hi,

I've also seen this a lot, so I have now written (I think) such a
debug patch (it's very crude) and tested it on my laptop, which
exhibits this problem.

The patch and full dmesg (with debug output) can be found here:

http://userweb.kernel.org/~vegard/bugs/20080821-apic/
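
For readers who don't want to open the patch, the idea is simply to remember the most recent APIC register writes and print them when the error interrupt fires. The following is only a rough Python sketch of that idea, not the patch itself; the class name ApicWriteLog and its methods are made up for illustration, and the real thing lives in the kernel's C code.

```python
from collections import deque


class ApicWriteLog:
    """Hypothetical helper: remembers only the most recent `size` writes."""

    def __init__(self, size=16):
        # A deque with maxlen drops the oldest entry automatically.
        self._writes = deque(maxlen=size)

    def record(self, cpu, reg, value):
        """Remember one register write made by `cpu`."""
        self._writes.append((cpu, reg, value))

    def dump(self):
        """Return the stored writes, oldest first, one formatted line each."""
        return ["%d: %d: [%08x] = %08x" % (i, cpu, reg, value)
                for i, (cpu, reg, value) in enumerate(self._writes)]


log = ApicWriteLog()
for n in range(20):
    log.record(1, 0x380, 0x1f00 + n)   # made-up timer writes on CPU 1
log.record(0, 0x280, 0)                # made-up ESR write on CPU 0
print("\n".join(log.dump()))
```

Only the last 16 of the 21 recorded writes survive, which is why the dump always starts at index 0 and ends at index 15.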

The output looks like this (the register annotations are mine; the
first column is the index of the write and the second column is the
CPU id):

APIC error on CPU0: 00(40)
Last 16 APIC writes:
0: 1: [00000380] = 00001f79
1: 1: [000000b0] = 00000000
2: 1: [00000380] = 00001f7e
3: 1: [000000b0] = 00000000
4: 1: [00000380] = 00001fa5
5: 1: [000000b0] = 00000000
6: 1: [00000380] = 00001f8c
7: 1: [000000b0] = 00000000
8: 1: [000000b0] = 00000000
9: 1: [00000380] = 00001e4e
10: 1: [000000b0] = 00000000
11: 1: [00000380] = 00001fa5
12: 1: [000000b0] = 00000000
13: 1: [00000380] = 00001f87 # Initial Count Register (for Timer)
14: 0: [00000280] = 00000000 # Error Status Register
15: 0: [000000b0] = 00000000 # EOI Register
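
The annotations above were added by hand. A small script along the following lines could do the same for the whole dmesg. It is a sketch only: the function name annotate and the table APIC_REGS are invented here, and the table covers just the local APIC register offsets relevant to this report.

```python
import re

# Local APIC register offsets (a small subset, enough for this log).
APIC_REGS = {
    0x0b0: "EOI Register",
    0x280: "Error Status Register",
    0x300: "Interrupt Command Register (low)",
    0x310: "Interrupt Command Register (high)",
    0x320: "LVT Timer Register",
    0x380: "Initial Count Register (for Timer)",
}

# Matches lines such as "13: 1: [00000380] = 00001f87".
LINE_RE = re.compile(r"^\s*(\d+): (\d+): \[([0-9a-f]{8})\] = ([0-9a-f]{8})")


def annotate(line):
    """Append the register name to a debug line; leave other lines alone."""
    m = LINE_RE.match(line)
    if m is None:
        return line
    reg = int(m.group(3), 16)
    return "%s # %s" % (m.group(0), APIC_REGS.get(reg, "unknown"))


def touches_icr(lines):
    """Return True if any of the logged writes went to the ICR."""
    for line in lines:
        m = LINE_RE.match(line)
        if m and int(m.group(3), 16) in (0x300, 0x310):
            return True
    return False


sample = [
    "13: 1: [00000380] = 00001f87",
    "14: 0: [00000280] = 00000000",
    "15: 0: [000000b0] = 00000000",
]
for entry in sample:
    print(annotate(entry))
print("ICR written:", touches_icr(sample))
```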

The writes are listed from oldest (0) to newest (15). I don't see any
writes to the ICR in there, so I suppose IPIs can be ruled out? It
seems that it is the write to the timer's Initial Count Register that
triggers the error. In another place, we have this:

13: 1: [00000320] = 000100ef # LVT Timer Register
14: 0: [00000280] = 00000000
15: 0: [000000b0] = 00000000

The value 000100ef would be APIC_LVT_MASKED | LOCAL_TIMER_VECTOR.
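
To spell out that decoding, here is a short sketch. The constants mirror the kernel definitions as I understand them (the mask bit is bit 16, and the local timer vector is 0xef in this kernel); the helper decode_lvt_timer is made up for illustration.

```python
APIC_LVT_MASKED = 1 << 16          # interrupt is masked when this bit is set
APIC_LVT_TIMER_PERIODIC = 1 << 17  # periodic mode when set, one-shot otherwise
APIC_VECTOR_MASK = 0xff            # low byte holds the interrupt vector
LOCAL_TIMER_VECTOR = 0xef          # assumed value for this kernel version


def decode_lvt_timer(value):
    """Split an LVT Timer Register value into the fields of interest."""
    return {
        "vector": value & APIC_VECTOR_MASK,
        "masked": bool(value & APIC_LVT_MASKED),
        "periodic": bool(value & APIC_LVT_TIMER_PERIODIC),
    }


fields = decode_lvt_timer(0x000100ef)
print("vector=0x%02x masked=%s periodic=%s"
      % (fields["vector"], fields["masked"], fields["periodic"]))
```

So the logged write masks the timer interrupt and leaves the vector at 0xef, in one-shot mode.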

The APIC error is seen approximately every 3 minutes.
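
For reference, the two numbers in the message can be decoded as well. As far as I can tell from the error interrupt handler, the message prints the ESR as read before it is written, followed in parentheses by the ESR as read after the write, so "00(40)" would mean that bit 6 was set on the second read. The sketch below assumes that reading of the format; the bit names follow the usual ESR layout, and decode_esr is a hypothetical helper.

```python
# Error Status Register bits, lowest bit first.
ESR_BITS = [
    "Send checksum error",
    "Receive checksum error",
    "Send accept error",
    "Receive accept error",
    "Reserved",
    "Send illegal vector",
    "Received illegal vector",
    "Illegal register address",
]


def decode_esr(value):
    """Return the names of all error bits set in an ESR value."""
    return [name for bit, name in enumerate(ESR_BITS) if value & (1 << bit)]


print(decode_esr(0x40))
```

Under that assumption, 0x40 decodes to a received illegal vector.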


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036