RE: [BUG] Fusion MPT Base Driver initialization failure with kdum p

From: Bharata B Rao
Date: Wed Jul 13 2005 - 05:48:43 EST


On Tue, 2005-07-12 at 12:15 -0600, Moore, Eric Dean wrote:
> I've seen the report. I need more info from Bharata on how
> to reproduce. Perhaps you can send me email offline which
> provides specific instructions to how to configure kdump,
> how to capture the dump, and what you did to crash your system.

This is how I could reproduce this bug:
(http://bugzilla.kernel.org/show_bug.cgi?id=4870)

- Configure the 1st kernel to take a kdump and run 4 instances of LTP on
it (I used `runltp -p -l logfile -t 12h -x 4)
- After a few hours, the 1st kernel crashes which results in the booting
of the 2nd kernel (the kernel which captures kdump). While this is
booting, mptbase driver fails during initialization leading to the panic
of the 2nd kernel.

The instructions to configure the 1st and the 2nd kernel to enable kdump
is found in Documentation/kdump/kdump.txt. I used the exact instructions
therein.

I am pasting the complete bug report from bugzilla here:

Distribution:
RedHat 9

Hardware Environment:
4 CPU Intel Xeon machine

lspci output:

00:00.0 Host bridge: ServerWorks CMIC-LE (rev 13)
00:00.1 Host bridge: ServerWorks CMIC-LE
00:00.2 Host bridge: ServerWorks: Unknown device 0000
00:01.0 Bridge: IBM Remote Supervisor Adapter (RSA)
00:09.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:0f.0 Host bridge: ServerWorks CSB5 South Bridge (rev 93)
00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93)
00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev
05)
00:0f.3 ISA bridge: ServerWorks GCLE Host Bridge
00:10.0 Host bridge: ServerWorks: Unknown device 0101 (rev 05)
00:10.2 Host bridge: ServerWorks: Unknown device 0101 (rev 05)
00:11.0 Host bridge: ServerWorks: Unknown device 0101 (rev 05)
00:11.2 Host bridge: ServerWorks: Unknown device 0101 (rev 05)
02:01.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100]
(rev 0c)
02:02.0 Ethernet controller: Intel Corp.: Unknown device 1079 (rev 03)
02:02.1 Ethernet controller: Intel Corp.: Unknown device 1079 (rev 03)
02:08.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703
Gigabit
Ethernet (rev 02)
05:03.0 Ethernet controller: Intel Corp.: Unknown device 1079 (rev 03)
05:03.1 Ethernet controller: Intel Corp.: Unknown device 1079 (rev 03)
05:07.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev
07)
05:07.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev
07)
07:05.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
09:06.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100]
(rev 08)

Software Environment:
Running 2.6.13-rc1 as the capture kernel.

Problem Description:

When the 2nd(capture) kernel boots (to capture the dump after the base
kernel
crashed), the Fusion MPT driver initialization cause a system panic.

Steps to reproduce:

- Configure the system for kdump
- Allow the base kernel to crash
- To capture the dump, the 2nd kernel boots and panics during the
inialization
phase of MPT driver.

--------log----------
Linux version 2.6.13-rc1 (root@xxxxxxxxxxxxxxxx) (gcc version 3.2.2
20030222
(Red Hat Linux 3.2.2-
5)) #1 Fri Jul 8 19:19:49 IST 2005
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000100 - 000000000009d400 (usable)
BIOS-e820: 000000000009d400 - 00000000000a0000 (reserved)
BIOS-e820: 0000000000100000 - 00000000effe7f40 (usable)
BIOS-e820: 00000000effe7f40 - 00000000effef800 (ACPI data)
BIOS-e820: 00000000effef800 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
user-defined physical RAM map:
user: 0000000000000000 - 00000000000a0000 (usable)
user: 0000000001000000 - 000000000148d000 (usable)
user: 000000000152d400 - 0000000005000000 (usable)
0MB HIGHMEM available.
80MB LOWMEM available.
found SMP MP-table at 0009d540
DMI 2.3 present.
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x06] enabled)
Processor #6 15:2 APIC version 20
WARNING: NR_CPUS limit of 1 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:2 APIC version 20
WARNING: NR_CPUS limit of 1 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x07] enabled)
Processor #7 15:2 APIC version 20
WARNING: NR_CPUS limit of 1 reached. Processor ignored.
ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x06] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x07] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x0e] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 14, version 17, address 0xfec00000, GSI 0-15
ACPI: IOAPIC (id[0x0d] address[0xfec01000] gsi_base[16])
IOAPIC[1]: apic_id 13, version 17, address 0xfec01000, GSI 16-31
ACPI: IOAPIC (id[0x0c] address[0xfec02000] gsi_base[32])
IOAPIC[2]: apic_id 12, version 17, address 0xfec02000, GSI 32-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
Enabling APIC mode: Flat. Using 3 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 05000000 (gap: 05000000:fb000000)
Built 1 zonelists
Kernel command line: root=/dev/sda5 console=tty0 console=ttyS0,38400
init 1
irqpoll memmap=exactmap
memmap=640K@0K memmap=4660K@16384K memmap=60235K@21685K
elfcorehdr=21684K
Misrouted IRQ fixup and polling support enabled
This may significantly impact system performance
Initializing CPU#0
PID hash table entries: 512 (order: 9, 8192 bytes)
Detected 2395.326 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 59980k/81920k available (2923k kernel code, 5368k reserved,
1203k data,
204k init, 0k highm
em)
Checking if this processor honours the WP bit even in supervisor mode...
Ok.
Calibrating delay using timer specific routine.. 4803.25 BogoMIPS
(lpj=9606513)
Mount-cache hash table entries: 512
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU0: Thermal monitoring enabled
CPU: Intel(R) Xeon(TM) CPU 2.40GHz stepping 07
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 pin1=2 pin2=-1
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xfd7cc, last bus=10
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20050309
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
PCI: Ignoring BAR0-3 of IDE controller 0000:00:0f.1
ACPI: PCI Root Bridge [PCI1] (0000:02)
PCI: Probing PCI hardware (bus 02)
ACPI: PCI Root Bridge [PCI2] (0000:05)
PCI: Probing PCI hardware (bus 05)
ACPI: PCI Root Bridge [PCI3] (0000:07)
PCI: Probing PCI hardware (bus 07)
ACPI: PCI Root Bridge [PCI4] (0000:09)
PCI: Probing PCI hardware (bus 09)
ACPI: PCI Interrupt Link [LP00] (IRQs *3)
ACPI: PCI Interrupt Link [LP01] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP02] (IRQs *10)
ACPI: PCI Interrupt Link [LP03] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP04] (IRQs *11)
ACPI: PCI Interrupt Link [LP05] (IRQs *10)
ACPI: PCI Interrupt Link [LP06] (IRQs *3)
ACPI: PCI Interrupt Link [LP07] (IRQs *10)
ACPI: PCI Interrupt Link [LP08] (IRQs *9)
ACPI: PCI Interrupt Link [LP09] (IRQs *5)
ACPI: PCI Interrupt Link [LP0A] (IRQs *10)
ACPI: PCI Interrupt Link [LP0B] (IRQs *9)
ACPI: PCI Interrupt Link [LP0C] (IRQs *9)
ACPI: PCI Interrupt Link [LP0D] (IRQs *11)
ACPI: PCI Interrupt Link [LP0E] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP0F] (IRQs *11)
ACPI: PCI Interrupt Link [LP10] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP11] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP12] (IRQs *5)
ACPI: PCI Interrupt Link [LP13] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP14] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP15] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP16] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP17] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP18] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP19] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP1A] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP1B] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP1C] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP1D] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP1E] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP1F] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LPUS] (IRQs *11)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 14 devices
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a
report
pnp: 00:00: ioport range 0x900-0x93f has been reserved
pnp: 00:00: ioport range 0x510-0x517 could not be reserved
pnp: 00:00: ioport range 0x504-0x507 could not be reserved
pnp: 00:00: ioport range 0x500-0x503 could not be reserved
pnp: 00:00: ioport range 0x520-0x53f has been reserved
pnp: 00:00: ioport range 0x420-0x427 has been reserved
pnp: 00:00: ioport range 0x460-0x461 has been reserved
pnp: 00:0b: ioport range 0x1ec-0x1ef has been reserved
pnp: 00:0b: ioport range 0x400-0x4fe could not be reserved
pnp: 00:0b: ioport range 0x600-0x600 has been reserved
pnp: 00:0b: ioport range 0x800-0x80f has been reserved
pnp: 00:0b: ioport range 0xc00-0xcfe could not be reserved
pnp: 00:0b: ioport range 0xf50-0xf58 has been reserved
Machine check exception polling timer started.
audit: initializing netlink socket (disabled)
audit(1121077976.640:1): initialized
Installing knfsd (copyright (C) 1996 okir@xxxxxxxxxxxx).
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
ACPI: Power Button (FF) [PWRF]
lp: driver loaded but no devices found
Linux agpgart interface v0.101 (c) Dave Jones
[drm] Initialized drm 1.0.0 20040925
PNP: PS/2 controller has invalid data port 0x64; using default 0x60
PNP: PS/2 controller has invalid command port 0x60; using default 0x64
PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing
disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378, irq 7 [PCSPP(,...)]
p0: using parport0 (interrupt-driven).
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
e100: Intel(R) PRO/100 Network Driver, 3.4.8-k2-NAPI
e100: Copyright(c) 1999-2005 Intel Corporation
ACPI: PCI Interrupt 0000:02:01.0[A] -> GSI 18 (level, low) -> IRQ 16
e100: eth0: e100_probe: addr 0xfbfff000, irq 16, MAC addr
00:03:47:E0:67:78
ACPI: PCI Interrupt 0000:09:06.0[A] -> GSI 31 (level, low) -> IRQ 17
e100: eth1: e100_probe: addr 0xf53ff000, irq 17, MAC addr
00:02:55:50:04:D6
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with
idebus=xx
hda: LTN486S, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: ATAPI 48X CD-ROM drive, 120kB Cache
Uniform CD-ROM driver Revision: 3.20
PCI: Enabling device 0000:07:05.0 (0156 -> 0157)
ACPI: PCI Interrupt 0000:07:05.0[A] -> GSI 24 (level, low) -> IRQ 18
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.36
<Adaptec 29160B Ultra160 SCSI adapter>
aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

Fusion MPT base driver 3.03.02
Copyright (c) 1999-2005 LSI Logic Corporation
Fusion MPT SPI Host driver 3.03.02
ACPI: PCI Interrupt 0000:05:07.0[A] -> GSI 27 (level, low) -> IRQ 19
mptbase: Initiating ioc0 bringup
Unable to handle kernel paging request at virtual address 00001e08
printing eip:
c11d5c1f
*pde = 00000000
Oops: 0000 [#1]
PREEMPT
Modules linked in:
CPU: 0
EIP: 0060:[<c11d5c1f>] Not tainted VLI
EFLAGS: 00010006 (2.6.13-rc1)
EIP is at mptscsih_io_done+0x1f/0x2f0
eax: c1794000 ebx: 00000000 ecx: 00000000 edx: 00000250
esi: 00000000 edi: 00000000 ebp: 00001e00 esp: c1617d08
ds: 007b es: 007b ss: 0068
Process swapper (pid: 1, threadinfo=c1616000 task=c1601a00)
Stack: c160f151 c111007c 00000000 c11153a3 0000006c 00000250 c1794000
00000000
c1794000 00000000 0000000e c11cff4f c17b3a00 00000000 00001e00
c16067c0
c1401380 00000013 00000001 c103e0bb 00000000 00000000 c1617dc0
00000000
Call Trace:
[<c111007c>] acpi_os_release_lock+0xd/0x2b
[<c11153a3>] acpi_ev_gpe_detect+0xde/0x101
[<c11cff4f>] mpt_interrupt+0xcf/0x1a0
[<c103e0bb>] misrouted_irq+0x16b/0x190
[<c103e23f>] note_interrupt+0x9f/0xc0
[<c103da3e>] __do_IRQ+0x12e/0x140
[<c1005723>] do_IRQ+0x23/0x40
[<c1003c22>] common_interrupt+0x1a/0x20
[<c101c24d>] release_console_sem+0x6d/0x100
[<c101c07a>] vprintk+0x18a/0x270
[<c125aa78>] pci_conf1_read+0xa8/0x100
[<c125c23c>] pci_read+0x2c/0x40
[<c110001f>] kobject_get+0xf/0x20
[<c101bee7>] printk+0x17/0x20
[<c11d101f>] mpt_do_ioc_recovery+0x4f/0x4f0
[<c11d09ee>] mpt_attach+0x2ae/0x5d0
[<c11dabb2>] mptspi_probe+0x12/0x400
[<c1108d26>] pci_device_probe_static+0x46/0x60
[<c1108d77>] __pci_device_probe+0x37/0x50
[<c1108db6>] pci_device_probe+0x26/0x50
[<c11665d3>] driver_probe_device+0x33/0xa0
[<c11666e9>] __driver_attach+0x39/0x40
[<c1165e55>] bus_for_each_dev+0x45/0x60
[<c1166706>] driver_attach+0x16/0x20
[<c11666b0>] __driver_attach+0x0/0x40
[<c116621b>] bus_add_driver+0x7b/0xa0
[<c1108eb0>] pci_device_shutdown+0x0/0x20
[<c110900b>] pci_register_driver+0x6b/0x90
[<c140a85b>] do_initcalls+0x2b/0xc0
[<c1065325>] kern_mount+0x15/0x17
[<c1000290>] init+0x0/0x100
[<c10002bf>] init+0x2f/0x100
[<c10013b8>] kernel_thread_helper+0x0/0x18
[<c10013bd>] kernel_thread_helper+0x5/0x18
Code: 41 04 e9 53 ff ff ff 0f 0b eb ce 90 55 89 d5 57 56 89 ce 53 83 ec
1c 89 44
24 18 8b 90 c0 00
00 00 81 c2 50 02 00 00 89 54 24 14 <0f> b7 4d 08 89 4c 24 10 8b 42 0c
8b 1c 88
85 db 0f 84 90 02 0
0
<0>Kernel panic - not syncing: Fatal exception in interrupt
----------log end---------------


------- Additional Comment #1 From Vivek Goyal 2005-07-11 04:14 -------

This seems to be the typical case of driver hardening required for kdump
cases.
MPT driver does not seem to be equipped to receive interrupts from
device while
driver is initializing. This works fine if driver is built with
MPT_DEBUG_IRQ
defined.


Note: CCing to fastboot as this is related to kdump.

Regards,
Bharata.



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/