Lost interrupt w/SCSI too (was Re: IDE, DMA, and lost interrupts.)

From: Linda Walsh (law@sgi.com)
Date: Tue Feb 01 2000 - 13:11:35 EST


David Ford wrote:
> As the subject of lost interrupts is brought up again, I feel like pitching
> in again :)
> If DMA is enabled in the kernel compile, the last message will instead be:
>
> hda: lost interrupt
>
> and the laptop will lock up.

---

I don't know if this is related or not...but thought I would report it. System 2xPIII, 1, then 2 SCSI HD's.

I've been running off of one SCSI disk on the 2.2.14 kernel for a while now -- no problem. Yesturday I mounted a 2nd SCSI hard to do builds on. I was remotely logged in. I loaded it up with a large project and did a make. Then I remotely logged in on another window. It just hung there. Tried a third window. It hung too. So of course both of the login attempts would be accessing sda -- the make was accessing sdb. Finally logged in after long wait -- tried a 'vi' on a file on sdb in one window and a 'top' in the other -- both hung. Tried to kill the make -- that wouldn't work -- just ignored me -- finally suspened and issued kill -- that took forever. Went to console and screen was in power save off -- at first unresponsive. Then it turned on the screen to a 'frozen' screen saver (Matrix). Machine was still pingable, but login attempts would time out. Ping times were down in the .2 range, but occasionally there would be a ping time of over 1400ms, followed by a 2nd at 200+ms, then back to .2. Had to power-cycle machine to reboot. Lots of messed up files in /var + /tmp (sda) and on my build disk (sdb). So there were lots of outstanding writes that were needing completion.

Had to manually fsck 2 of the partitions (/var + /tmp) -- when it came back up I looked at the log -- the last thing before the reboot was: -------- Feb 1 08:22:32 ishtar -- MARK -- Feb 1 08:39:51 ishtar in.rlogind[8354]: connect from 192.111.23.234 Feb 1 08:39:52 ishtar pam_rhosts_auth[8354]: allowed to law@nimue.corp.sgi.com as law Feb 1 08:40:09 ishtar su: (to root) law on /dev/pts/3 Feb 1 08:43:17 ishtar su: (to root) law on /dev/pts/3 Feb 1 09:02:32 ishtar -- MARK -- Feb 1 09:04:43 ishtar kernel: scsi : aborting command due to timeout : pid 98419, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 24 1a ff 00 00 80 00 Feb 1 09:04:43 ishtar kernel: scsi : aborting command due to timeout : pid 98420, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 24 1b 7f 00 00 50 00 Feb 1 09:04:44 ishtar kernel: SCSI host 0 abort (pid 98420) timed out - resetting Feb 1 09:04:44 ishtar kernel: SCSI bus is being reset for host 0 channel 0. Feb 1 09:04:44 ishtar kernel: Feb 1 09:04:44 ishtar kernel: wait_on_bh, CPU 0: Feb 1 09:04:44 ishtar kernel: irq: 0 [0 0] Feb 1 09:04:44 ishtar kernel: bh: 1 [0 1] Feb 1 09:04:47 ishtar kernel: <[c010a4b5]> <[c0165e42]> <[c0165eac]> <[c0165ebd]> <[c01746c5]> <[c0157c4f]> <[c013112b]> <[c0131616]> <6>(scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31. Feb 1 09:04:47 ishtar kernel: (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 15. Feb 1 09:05:17 ishtar kernel: scsi : aborting command due to timeout : pid 98431, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 6c 14 87 00 00 18 00 Feb 1 09:05:17 ishtar kernel: scsi : aborting command due to timeout : pid 98435, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 c4 00 57 00 00 28 00 Feb 1 09:05:18 ishtar kernel: SCSI host 0 abort (pid 98431) timed out - resetting Feb 1 09:05:18 ishtar kernel: SCSI bus is being reset for host 0 channel 0. Feb 1 09:05:19 ishtar kernel: Feb 1 09:05:18 ishtar kernel: SCSI bus is being reset for host 0 channel 0. Feb 1 09:05:19 ishtar kernel: Feb 1 09:05:19 ishtar kernel: wait_on_bh, CPU 0: Feb 1 09:05:19 ishtar kernel: irq: 0 [0 0] Feb 1 09:05:19 ishtar kernel: bh: 1 [0 1] Feb 1 09:05:22 ishtar kernel: <[c010a4b5]> <[c018f1a4]> <[c018fd43]> <[c0195188]> <6>(scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31. Feb 1 09:05:22 ishtar kernel: (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 15. Feb 1 09:05:52 ishtar kernel: scsi : aborting command due to timeout : pid 98449, scsi0, channel 0, id 1, lun 0 Write (10) 00 01 9c 13 07 00 00 80 00 Feb 1 09:05:52 ishtar kernel: scsi : aborting command due to timeout : pid 98450, scsi0, channel 0, id 1, lun 0 Write (10) 00 01 9c 13 87 00 00 80 00 Feb 1 09:05:52 ishtar kernel: SCSI host 0 abort (pid 98449) timed out - resetting Feb 1 09:05:52 ishtar kernel: SCSI bus is being reset for host 0 channel 0. Feb 1 09:05:53 ishtar kernel: Feb 1 09:05:53 ishtar kernel: wait_on_bh, CPU 1: Feb 1 09:05:53 ishtar kernel: irq: 0 [0 0] Feb 1 09:05:53 ishtar kernel: bh: 1 [1 0] Feb 1 09:05:56 ishtar kernel: <[c010a4b5]> <[c018f1a4]> <[c018fd43]> <[c0195188]> <6>(scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31. Feb 1 09:05:57 ishtar kernel: (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 15. Feb 1 09:07:02 ishtar kernel: scsi : aborting command due to timeout : pid 98706, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 24 29 17 00 00 50 00 Feb 1 09:07:02 ishtar kernel: scsi : aborting command due to timeout : pid 98707, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 6c 11 3f 00 00 08 00 Feb 1 09:07:02 ishtar kernel: SCSI host 0 abort (pid 98707) timed out - resetting Feb 1 09:07:02 Feb 1 09:07:02 ishtar kernel: SCSI bus is being reset for host 0 channel 0. Feb 1 09:07:03 ishtar kernel: Feb 1 09:07:03 ishtar kernel: wait_on_bh, CPU 0: Feb 1 09:07:03 ishtar kernel: irq: 0 [0 0] Feb 1 09:07:03 ishtar kernel: bh: 1 [0 1] Feb 1 09:07:06 ishtar kernel: <[c010a4b5]> <[c018f1a4]> <[c018fd43]> <[c0195188]> <6>(scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31. Feb 1 09:07:06 ishtar kernel: (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 15. Feb 1 09:07:36 ishtar kernel: scsi : aborting command due to timeout : pid 98721, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 c4 1d 3f 00 00 80 00 Feb 1 09:07:36 ishtar kernel: scsi : aborting command due to timeout : pid 98722, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 c4 1d bf 00 00 80 00 Feb 1 09:07:38 ishtar kernel: SCSI host 0 abort (pid 98722) timed out - resetting Feb 1 09:07:38 ishtar kernel: SCSI bus is being reset for host 0 channel 0. Feb 1 09:07:41 ishtar kernel: (scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31. Feb 1 09:07:41 ishtar kernel: (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 15. Feb 1 09:08:11 ishtar kernel: scsi : aborting command due to timeout : pid 98776, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 6c 1c 1f 00 00 80 00 Feb 1 09:08:11 ishtar kernel: scsi : aborting command due to timeout : pid 98777, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 6c 1c 9f 00 00 80 00 Feb 1 09:08:13 ishtar kernel: SCSI host 0 abort (pid 98776) timed out - resetting Feb 1 09:08:13 ishtar kernel: SCSI bus is being reset for host 0 channel 0. Feb 1 09:08:13 ishtar kernel: Feb 1 09:08:13 ishtar kernel: wait_on_bh, CPU 0: Feb 1 09:08:13 ishtar kernel: irq: 0 [0 0] Feb 1 09:08:13 ishtar kernel: bh: 1 [0 1] Feb 1 09:08:13 ishtar kernel: irq: 0 [0 0] Feb 1 09:08:13 ishtar kernel: bh: 1 [0 1] Feb 1 09:08:16 ishtar kernel: <[c010a4b5]> <[c0165e42]> <[c0165eac]> <[c0165ebd]> <[c01746c5]> <[c0157c4f]> <[c013112b]> <[c0131616]> <6>(scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31. Feb 1 09:08:16 ishtar kernel: (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 15. Feb 1 09:08:46 ishtar kernel: scsi : aborting command due to timeout : pid 98800, scsi0, channel 0, id 1, lun 0 Write (10) 00 01 44 19 5f 00 00 80 00 Feb 1 09:08:46 ishtar kernel: scsi : aborting command due to timeout : pid 98801, scsi0, channel 0, id 1<...then the log was corrupted by a mail message appearing in the middle of the log. >

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Feb 07 2000 - 21:00:06 EST