Re: Core scsi layer crashes in 2.6.8.1
From: Anton Blanchard
Date: Tue Oct 05 2004 - 06:56:11 EST
Hi James,
> These state transition warnings are currently expected in this code
> (they're basically verbose warnings).
>
> What was the oops?
>
> I have a theory that we should be taking a device reference before
> waking up the error handler, otherwise host removal can race with error
> handling.
Did this get sorted out? Here is an oops from a few-week-old BK tree.
FYI I just noticed I have disabled host reset in the sym2 driver (it
was locking up at the time and I never went back to work out why).
However, even with a host reset this could happen, right?
Below we get a WARN_ON and then an oops (the bit starting with NIP). The
address we tried to access was 0x100510.
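
One possible reading of that address, offered as a guess rather than a
diagnosis: 0x100510 sits just above LIST_POISON1 (0x00100100 in
include/linux/list.h), which is what list_del() writes into an entry's
next pointer. The sketch below only redoes the arithmetic on values copied
from the register dump. It assumes GPR3 still holds the sdev pointer passed
to scsi_device_get(); the offsets are inferred from the dump, not taken from
the struct scsi_device layout, and container_of() here is just an
illustrative helper.

```python
# LIST_POISON1 as defined in include/linux/list.h in 2.6-era kernels.
LIST_POISON1 = 0x00100100

# Values copied from the register dump in this oops.
DAR = 0x0000000000100510   # faulting data address
GPR3 = 0x00000000001000F0  # first-argument register on ppc64
GPR9 = 0x0000000000100100  # matches LIST_POISON1 exactly


def container_of(member_addr, member_offset):
    """Illustrative helper: recover a struct address from a member address."""
    return member_addr - member_offset


# Assumption: GPR3 is the bogus sdev pointer handed to scsi_device_get().
# If so, the list member used for iteration would sit at this offset:
inferred_list_offset = LIST_POISON1 - GPR3

# ...and the faulting load touched a field at this offset into the "struct":
inferred_fault_offset = DAR - GPR3

print("GPR9 is LIST_POISON1:", GPR9 == LIST_POISON1)
print("inferred list member offset:", hex(inferred_list_offset))
print("inferred faulting field offset:", hex(inferred_fault_offset))
print("container_of(LIST_POISON1):",
      hex(container_of(LIST_POISON1, inferred_list_offset)))
```

If that reading holds, the iterator followed the next pointer of a device
that had already been unlinked from the host's device list, which would fit
the theory about host removal racing with error handling.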
Anton
sym0: <1010-66> rev 0x1 at pci 0004:03:01.0 irq 87
sym.0004:03:01.0: No NVRAM, ID 7, Fast-80, LVD, parity checking
xics_enable_irq 47 buid 4 gqirm 255
sym.0004:03:01.0: SCSI BUS has been reset.
scsi0 : sym-2.1.18j
Using anticipatory io scheduler
sym.0004:03:01.0:10: FAST-40 WIDE SCSI 80.0 MB/s ST (25.0 ns, offset 31)
Vendor: IBM Model: IC35L036UCDY10-0 Rev: S25M
Type: Direct-Access ANSI SCSI revision: 03
sym.0004:03:01.0:10:0: tagged command queuing enabled, command queue depth 16.
scsi(0:0:10:0): Beginning Domain Validation
sym.0004:03:01.0:10: asynchronous.
sym.0004:03:01.0:10: wide asynchronous.
sym.0004:03:01.0:10: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
scsi(0:0:10:0): Ending Domain Validation
sym.0004:03:01.0:11:0:phase change 2-7 6@01050368 resid=5.
sym.0004:03:01.0:11:0:phase change 2-3 6@01050368 resid=5.
sym.0004:03:01.0:11: FAST-40 WIDE SCSI 80.0 MB/s ST (25.0 ns, offset 31)
sym.0004:03:01.0:11:control msgout: c.
sym.0004:03:01.0: TARGET 11 has been reset.
sym.0004:03:01.0:11:0: ABORT operation started.
sym.0004:03:01.0:11:0: ABORT operation complete.
sym.0004:03:01.0:11:0: DEVICE RESET operation started.
sym.0004:03:01.0:11:0: DEVICE RESET operation complete.
sym.0004:03:01.0:11:control msgout: c.
sym.0004:03:01.0: TARGET 11 has been reset.
sym.0004:03:01.0:11:0: ABORT operation started.
sym.0004:03:01.0:11:0: ABORT operation complete.
sym.0004:03:01.0:11:0: BUS RESET operation started.
sym.0004:03:01.0:11:0: BUS RESET operation complete.
sym.0004:03:01.0: SCSI BUS reset detected.
sym.0004:03:01.0: SCSI BUS has been reset.
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 11 lun 0
Badness in kref_get at lib/kref.c:32
Call Trace:
[c0000025fe1b3bd0] [c0000030262bf4b0] 0xc0000030262bf4b0 (unreliable)
[c0000025fe1b3c50] [c00000000021f5b8] .get_device+0x20/0x3c
[c0000025fe1b3cc0] [c000000000294c60] .scsi_device_get+0x38/0xe4
[c0000025fe1b3d40] [c000000000294e30] .__scsi_iterate_devices+0x60/0xfc
[c0000025fe1b3de0] [c000000000299bf8] .scsi_run_host_queues+0x34/0x58
[c0000025fe1b3e60] [c0000000002989f8] .scsi_error_handler+0x268/0xaa0
[c0000025fe1b3f90] [c000000000017aac] .kernel_thread+0x4c/0x68
sym.0004:03:01.0:11:control msgout: c.
NIP: C000000000294C48 XER: 0000000020000000 LR: C000000000294E30
REGS: c0000025fe1b3a40 TRAP: 0300 Not tainted (2.6.9-rc2-bml)
MSR: 9000000000001032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: 0000000000100510, DSISR: 0000000040000000
TASK: c000001dfe1b72c0[1467] 'scsi_eh_0' THREAD: c0000025fe1b0000 CPU: 3
GPR00: FFFFFFFFFFFFFFFA C0000025FE1B3CC0 C0000000007297B8 00000000001000F0
GPR04: C000000FFE185000 0000000000000001 0000000000000000 0000000000000000
GPR08: 0000000000000000 0000000000100100 C000000000B96228 9000000000009032
GPR12: 0000000024FFFF22 C000000000542700 0000000000000000 0000000000000000
GPR16: 0000000000000000 C00000000040D188 C000000000587058 C0000025FE1B3ED0
GPR20: 00000000000000FC C00000000040D188 C000000000587058 C0000025FE1B3F00
GPR24: C0000025FE1B3EF0 0000040100000000 C000002FFE128D30 C000000FFE185000
GPR28: 9000000000009032 C0000007FFFCF800 00000000001002D8 00000000001000F0
NIP [c000000000294c48] .scsi_device_get+0x20/0xe4
LR [c000000000294e30] .__scsi_iterate_devices+0x60/0xfc
Call Trace:
[c0000025fe1b3cc0] [c000000000294da8] .scsi_device_put+0x9c/0xc4 (unreliable)
[c0000025fe1b3d40] [c000000000294e30] .__scsi_iterate_devices+0x60/0xfc
[c0000025fe1b3de0] [c000000000299bf8] .scsi_run_host_queues+0x34/0x58
[c0000025fe1b3e60] [c0000000002989f8] .scsi_error_handler+0x268/0xaa0
[c0000025fe1b3f90] [c000000000017aac] .kernel_thread+0x4c/0x68