Re: 2.6.28-rc2: REGRESSION in early boot

From: Yinghai Lu
Date: Tue Nov 04 2008 - 19:18:03 EST


On Tue, Nov 4, 2008 at 4:14 PM, Yinghai Lu <yinghai@xxxxxxxxxx> wrote:
> On Tue, Nov 4, 2008 at 2:45 PM, Theodore Tso <tytso@xxxxxxx> wrote:
>> I've opened a Kernel Bug to track this regression:
>>
>> http://bugzilla.kernel.org/show_bug.cgi?id=11951
>>
>> My fileserver boots under 2.6.27, but it is failing to boot on
>> 2.6.28-rc2. It took me a while to bisect, so after I finished the
>> bisection, I retested with the latest mainline
>> (v2.6.28-rc3-54-g75fa677), and the problem still shows up.
>>
>> Essentially, the system panics in early boot, resulting in multiple
>> oopses. I was finally able to get the very first oops, and the image of
>> that oops can be found here:
>>
>> http://thunk.org/tytso/2.6.27-regress/92b29b8/IMG_0331.JPG
>>
>> From the console snapshot, it looks like two CPUs simultaneously
>> oopsed with:
>>
>> BUG: unable to handle kernel NULL dereference at 00000000
>> BUG: unable to handle kernel NULL dereference at 00000038
>>
>> On the stack is "scheduler_tick+0x83/0x15f"
>>
>> When doing a bisection, the last good commit (i.e., the last one which
>> I can boot on my system) is git id: d6c88a50 (which precedes 2.6.28-rc1).
>>
>> The first bad git ID is:
>>
>> commit d6c88a507ef0b6afdb013cba4e7804ba7324d99a
>> Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Date: Wed Oct 15 15:27:23 2008 +0200
>>
>> genirq: revert dynarray
>>
>> Revert the dynarray changes. They need more thought and polishing.
>>
>> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>>
>> ... but in fact, the failure is different from the above messages.
>> The failure is also in early boot, but the oops message is quite
>> different:
>
> please check http://lkml.org/lkml/2008/11/4/431
>
Root cause:

[ 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: IOAPIC (id[0x03] address[0xfec80000] gsi_base[24])
[ 0.000000] IOAPIC[1]: apic_id 3, version 32, address 0xfec80000, GSI 24-47
[ 0.000000] ACPI: IOAPIC (id[0x04] address[0xfec80400] gsi_base[48])
[ 0.000000] IOAPIC[2]: apic_id 4, version 32, address 0xfec80400, GSI 48-71
[ 0.000000] ACPI: IOAPIC (id[0x05] address[0xfec84000] gsi_base[72])
[ 0.000000] IOAPIC[3]: apic_id 5, version 32, address 0xfec84000, GSI 72-95
[ 0.000000] ACPI: IOAPIC (id[0x08] address[0xfec84400] gsi_base[96])
[ 0.000000] IOAPIC[4]: apic_id 8, version 32, address 0xfec84400, GSI 96-119
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: IRQ0 used by override.
[ 0.000000] ACPI: IRQ2 used by override.
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] Enabling APIC mode: Flat. Using 5 I/O APICs

So 120 GSIs * 2 = 240 > 224, so if you have an MSI card, you go past the end of the irq_desc array.
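
To double-check that arithmetic, here is a rough sketch that counts the GSIs from the IOAPIC lines in the log above and compares against the limit. The factor of 2 and the 224 figure are taken as given from this message (224 is assumed here to be the size of the irq_desc array); the names count_gsis, exceeds_limit, IRQ_LIMIT and MSI_FACTOR are made up for illustration and are not kernel symbols.

```python
import re

# IOAPIC lines copied from the boot log (timestamps dropped).
BOOT_LOG = """\
IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
IOAPIC[1]: apic_id 3, version 32, address 0xfec80000, GSI 24-47
IOAPIC[2]: apic_id 4, version 32, address 0xfec80400, GSI 48-71
IOAPIC[3]: apic_id 5, version 32, address 0xfec84000, GSI 72-95
IOAPIC[4]: apic_id 8, version 32, address 0xfec84400, GSI 96-119
"""

IRQ_LIMIT = 224  # figure quoted in the message; assumed to be the irq_desc array size
MSI_FACTOR = 2   # the "* 2" from the message, taken as given

# Matches the "GSI first-last" range at the end of an IOAPIC line.
GSI_RANGE = re.compile(r"IOAPIC\[\d+\]:.*GSI (\d+)-(\d+)")


def count_gsis(log):
    """Return the number of GSIs covered by the IOAPIC lines in log."""
    total = 0
    for line in log.splitlines():
        match = GSI_RANGE.search(line)
        if match:
            first, last = int(match.group(1)), int(match.group(2))
            total += last - first + 1  # ranges are inclusive
    return total


def exceeds_limit(gsi_count, factor=MSI_FACTOR, limit=IRQ_LIMIT):
    """True if gsi_count * factor is larger than the limit."""
    return gsi_count * factor > limit


if __name__ == "__main__":
    gsis = count_gsis(BOOT_LOG)
    print(gsis, gsis * MSI_FACTOR, exceeds_limit(gsis))
```

With four 24-pin IOAPICs instead of five, the same check gives 96 * 2 = 192, which stays under 224, so the fifth IOAPIC is what pushes this box over.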

YH