Bug in SCSI driver in 2.0.35 (possibly AHA-1542?)

Evan Harris (eharris@puremagic.com)
Sat, 7 Nov 1998 01:21:32 -0600 (CST)


I just had a major failure, with the kernel stuck in a hard loop trying to
reset a device. This is stock 2.0.35 (no third-party patches) with an
Adaptec 1542.

The kernel printed this sequence over and over every few seconds. I
couldn't get it to stop. Luckily my root partition was on another
controller (2940UW). I tried to dynamically remove the offending device,
but it locked up that console. I tried turning the device off, but the
messages and non-responsiveness continued. I tried unplugging the device
from the chain (it was the only device on that controller) but the
messages continued. The only thing that worked was to reboot.

scsi : aborting command due to timeout : pid 3067872, scsi0, channel 0, id 5, lun 0 Test Unit Ready 00 00 00 00 00
SCSI host 0 abort (pid 3067872) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
Sent BUS DEVICE RESET to target 5
Sending DID_RESET for target 5
Sending DID_RESET for target 5
Sending DID_RESET for target 5
Sending DID_RESET for target 5
Sending DID_RESET for target 5
Sending DID_RESET for target 5
Sending DID_RESET for target 5
Sending DID_RESET for target 5
scsi : aborting command due to timeout : pid 3067872, scsi0, channel 0, id 5, lun 0 Test Unit Ready 00 00 00 00 00
SCSI host 0 abort (pid 3067872) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
Sent BUS DEVICE RESET to target 5
Sending DID_RESET for target 5
Sending DID_RESET for target 5
Sending DID_RESET for target 5
Sending DID_RESET for target 5
Sending DID_RESET for target 5
Sending DID_RESET for target 5
Sending DID_RESET for target 5
Sending DID_RESET for target 5

I believe this was triggered by a tape drive I was controlling with mt,
and the drive may well have buggy firmware. Even so, I don't think the
kernel should go into a hard loop. At the very least, once the device
disappeared when I turned it off, the kernel should have given up after
a few tries.

I can't reproduce this, as this is the first time it has happened to me,
but it is probably worth looking into the code to remove that hard loop.
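
To illustrate what I mean by "giving up after a few tries", here is a
minimal user-space sketch in C. It is not the 2.0.35 SCSI code and I have
not checked it against the driver. The names MAX_RESET_ATTEMPTS,
fake_device, try_reset and recover_device are all made up for the example,
and the limit of 3 is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical retry limit; not a real kernel constant. */
#define MAX_RESET_ATTEMPTS 3

enum dev_state { DEV_ONLINE, DEV_OFFLINE };

/* Hypothetical stand-in for a SCSI target; not a kernel structure. */
struct fake_device {
    int id;
    int reset_attempts;
    enum dev_state state;
    bool responds;          /* false models a device that was powered off */
};

/* Pretend to reset the device; succeeds only if the device responds. */
static bool try_reset(struct fake_device *dev)
{
    dev->reset_attempts++;
    printf("reset attempt %d for target %d\n", dev->reset_attempts, dev->id);
    return dev->responds;
}

/*
 * Bounded recovery: retry the reset a limited number of times, then mark
 * the device offline instead of looping forever.
 * Returns true if the device recovered, false if it was taken offline.
 */
static bool recover_device(struct fake_device *dev)
{
    assert(dev != NULL);

    while (dev->reset_attempts < MAX_RESET_ATTEMPTS) {
        if (try_reset(dev))
            return true;
    }

    dev->state = DEV_OFFLINE;
    printf("target %d not responding, taking it offline\n", dev->id);
    return false;
}
```

Calling recover_device() on a fake_device with responds set to false
makes three reset attempts and then leaves the device in DEV_OFFLINE,
rather than repeating the abort/reset sequence indefinitely.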

Evan
