I have had two incidents in which my system has hung under 2.2.14
using an Adaptec 1542CF that I just installed, replacing the old
Adaptec 1520A I had in there before.
First incident:
- I was in 'screen' ssh'd to this headless box, so I was able to
look about a bit even though the disk was giving me IO errors
every time I tried to access something on the disk
- The last thing logged to syslog (cached, but not written, as
I later found out upon a reboot) was a SCSI bus reset
- After my investigation, I switched over a monitor & kbd to this
box so I could see anything logged to the console .. unfortunately
it had scrolled and so entirely consisted of ext2 complaining that
it couldn't find certain inodes and had remounted the single
filesystem on this box (/ on /dev/sda1) read-only
- I saw no kernel oops during the first incident, but that's not
saying it didn't happen, as the console messages had scrolled
- Upon reboot, the e2fsck ran clean
Second incident (this morning):
- This time I immediately switched the monitor & kbd to the box
to see if I could spot anything in the console messages
- I saw the tail end of a kernel oops ... but with only partial
data and no way to capture it, I couldn't do anything with it
- I had no way of getting at syslog to see if there was a SCSI
reset
- Upon reboot, there were some e2fsck errors that needed manual
intervention, but nothing terribly damaging
- I saw no evidence of a SCSI bus reset this time, but that's not
saying it didn't happen ... with no way to access syslog before
the reboot, it could very well have been the same as in the first
incident (i.e. the bus resets, it gets logged, but not written
to disk before the system is rebooted)
Combing the list archives for the past little while, I did notice
a patch introduced in 2.2.14 (which I verified was applied in this
version) that alleges to fix a kernel oops when the SCSI bus resets.
I wonder if that patch solves the whole problem.
I'd give more info if I could. Unfortunately, the nature of my
problem (the box has a single ext2 filesystem which is immediately
remounted read-only when the crash occurs, preventing logging of
useful messages) makes it rather difficult to collect data. If
someone has tips as to how I can get some useful information, I'll
be happy to give it a try.
Finally, as to what might be causing the SCSI bus resets, I don't know
... perhaps the cat is playing with my temptingly dangly external SCSI
cable or the flimsy power cord for the scanner. I could try unhooking
the SCSI scanner from the box and setting termination on the card to
eliminate that as a possibility. Nevertheless, a bus reset shouldn't
cause my system to completely hang and require a reboot, should it?
Ben
-- nSLUG http://www.nslug.ns.ca synrg@sanctuary.nslug.ns.ca Debian http://www.debian.org synrg@debian.org Chebucto http://www.chebucto.ns.ca aa458@chebucto.ns.ca [ pgp key fingerprint = 7F DA 09 4B BA 2C 0D E0 1B B1 31 ED C6 A9 39 4F ]- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Tue Feb 15 2000 - 21:00:29 EST