Re: oops in ioapic_write_entry

From: Eric W. Biederman
Date: Tue Aug 03 2010 - 02:00:27 EST


Dave Airlie <airlied@xxxxxxxxx> writes:

> On Tue, Aug 3, 2010 at 1:26 PM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>> Dave Airlie <airlied@xxxxxxxxx> writes:

>>> Okay el6log is from a RHEL6 2.6.32 kernel, but it should give a good
>>> baseline, the 2.6.35 oops even earlier with all those options and is
>>> in the second attachment.
>>
>> It appears we have a smoking gun:
>>
>> For some reason setup_IO_APIC_IRQS thinks we at least 2 io_apics,
>> but we have only setup 1 io_apic.  Since io_apics need a kmap entry
>> accessing an apic that hasn't been setup will definitely give a
>> page fault.  It sounds like something is stomping nr_ioapics.
>>
>> From: 2.6.35-debuglog
>> IOAPIC[0]: apic_id 8, version 17, address 0xfec00000, GSI 0-23
>> ....
>> IOAPIC[1]: Set routing entry (0-16 -> 0x51 -> IRQ 16 Mode:1 Active:1)
>>
>> Can we get your System.map of the failing kernel (so we can see what
>> is close to nr_ioapics), and could you add a print statement in
>> arch/x86/kernel/apic/io_apic:setup_IO_APIC_irqs to print nr_ioapics?
>>
>> I would be surprised if drm changes could have affected this.
>>
>
> Okay, from my debug addition it still only seems to have one ioapic

Thanks. I goofed reading that code. I saw setup_IO_APIC_irq and made
the incorrect leap that said we came from setup_IO_APIC_irqs, when
in fact we are coming from io_apic_set_pci_routing.

So let's see if I can figure out why we are getting a bad apic_id.

For that I need to track back to pirq_enable_irq, which leads
me to IO_APIC_get_PCI_irq_vector. The likely cause is that we
simply are not finding the apicid that is present in the mp_irqs
entry that we decided to return, so the search loop runs off the
end and leaves apic == nr_ioapics. The patch below should add
appropriate debugging and fix the lookup.
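
To illustrate the failure mode outside the kernel, here is a minimal
standalone sketch of a search loop with the missing bounds check added.
The struct, the function name find_ioapic_index, and the DEST_APIC_ALL
value are hypothetical stand-ins for mp_ioapics[], the inline loop, and
MP_APIC_ALL; this is not the kernel code itself.

```c
/* Hypothetical stand-in for one mp_ioapics[] entry. */
struct ioapic_entry {
	unsigned int apicid;
};

/* Illustrative wildcard standing in for MP_APIC_ALL (value assumed). */
#define DEST_APIC_ALL 0xFFu

/*
 * Return the index of the first entry whose apicid matches dstapic
 * (or the first entry at all when dstapic is the wildcard).
 * Return -1 when no entry matches.
 */
int find_ioapic_index(const struct ioapic_entry *tbl, int nr,
		      unsigned int dstapic)
{
	int apic;

	for (apic = 0; apic < nr; apic++)
		if (tbl[apic].apicid == dstapic || dstapic == DEST_APIC_ALL)
			break;

	/*
	 * If the loop ran to completion, apic == nr. Using that value as
	 * an index would touch an entry that was never set up, which is
	 * the situation suspected in this oops.
	 */
	if (apic >= nr)
		return -1;

	return apic;
}
```

With a single table entry whose apicid is 8, looking up 8 returns index
0, while looking up any other non-wildcard id returns -1 instead of an
out-of-range index.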

The real difference appears to be that acpi is disabled here, whereas
it is enabled in your reference kernel.

Dave, can you verify this fixes the oops for you?

It would be nice if we didn't crash early in boot even without
acpi present.

Eric

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index e41ed24..e824e14 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1067,7 +1067,7 @@ static int pin_2_irq(int idx, int apic, int pin)
 int IO_APIC_get_PCI_irq_vector(int bus, int slot, int pin,
 				struct io_apic_irq_attr *irq_attr)
 {
-	int apic, i, best_guess = -1;
+	int i, best_guess = -1;

 	apic_printk(APIC_DEBUG,
 		    "querying PCI -> IRQ mapping bus:%d, slot:%d, pin:%d.\n",
@@ -1080,16 +1080,29 @@ int IO_APIC_get_PCI_irq_vector(int bus, int slot, int pin,
 	for (i = 0; i < mp_irq_entries; i++) {
 		int lbus = mp_irqs[i].srcbus;

-		for (apic = 0; apic < nr_ioapics; apic++)
-			if (mp_ioapics[apic].apicid == mp_irqs[i].dstapic ||
-			    mp_irqs[i].dstapic == MP_APIC_ALL)
-				break;
-
 		if (!test_bit(lbus, mp_bus_not_pci) &&
 		    !mp_irqs[i].irqtype &&
 		    (bus == lbus) &&
 		    (slot == ((mp_irqs[i].srcbusirq >> 2) & 0x1f))) {
-			int irq = pin_2_irq(i, apic, mp_irqs[i].dstirq);
+			int apic;
+			int irq;
+
+			/* Lookup the ioapic by id */
+			for (apic = 0; apic < nr_ioapics; apic++)
+				if (mp_ioapics[apic].apicid == mp_irqs[i].dstapic ||
+				    mp_irqs[i].dstapic == MP_APIC_ALL)
+					break;
+
+			/* Verify we found the ioapic */
+			if (apic >= nr_ioapics) {
+				printk(KERN_ERR
+				       "%02x:%02x.%c: APIC_ID %u pin: %u not found BIOS bug?\n",
+				       bus, slot, 'A' + pin - 1,
+				       mp_irqs[i].dstapic, mp_irqs[i].dstirq);
+				continue;
+			}
+
+			irq = pin_2_irq(i, apic, mp_irqs[i].dstirq);

 			if (!(apic || IO_APIC_IRQ(irq)))
 				continue;
@@ -1099,7 +1112,8 @@ int IO_APIC_get_PCI_irq_vector(int bus, int slot, int pin,
 						     mp_irqs[i].dstirq,
 						     irq_trigger(i),
 						     irq_polarity(i));
-				return irq;
+				best_guess = irq;
+				goto out;
 			}
 			/*
 			 * Use the first all-but-pin matching entry as a
@@ -1114,6 +1128,12 @@ int IO_APIC_get_PCI_irq_vector(int bus, int slot, int pin,
 			}
 		}
 	}
+out:
+	if (best_guess >= 0)
+		apic_printk(APIC_DEBUG,
+			    "%02x:%02x.%c: IRQ %u IOAPIC: %u pin: %u",
+			    bus, slot, 'A' + pin - 1,
+			    best_guess, irq_attr->ioapic, irq_attr->ioapic_pin);
 	return best_guess;
 }
 EXPORT_SYMBOL(IO_APIC_get_PCI_irq_vector);