complete lockup (SCSI related) in pre-2.0.31-9

Simon Karpen (skarpen@scan.shodor.org)
Tue, 9 Sep 1997 12:54:42 -0400 (EDT)


I've been trying to kill the furry little kernel, and have succeeded.

System:
mix of SCSI and EIDE drives (SCSI off of BT-930)
HX based motherboard, Intel P54C/133
Tulip Ethernet, Matrox video, 48MB FPM DRAM
not much else should be relevant.
256MB swap space on the SCSI drive

running: 5 instances of bonnie on /home (on the SCSI disk), and a
make -j of the kernel in /usr/src on one of the EIDE disks (/usr is on
SCSI though)

After running fine for ~15 minutes, the system printed what looks like
a truncated oops (I was running with syslog off in case of something
like what happened next):

EIP: 0023:[<08048b58>] ESP: 002b:bfffa9d8 EFLAGS: 00000216
EAX: 00000001 EBX: 40008581 ECX: 40008000 EDX: 0804aee8
ESI: bffffebe EDI: 02e77582 EBP: bffffdd8 DS: 002b ES: 002b FS: 002b GS: 002b

The EIP value isn't anywhere in the System.map for the kernel.
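
For anyone repeating the lookup, here is a minimal sketch of checking an
address against a System.map. It assumes the usual three-column
"address type symbol" layout. The sample entries and the function names
(parse_system_map, lookup) are made up for illustration and are not
taken from this kernel. An address outside the range covered by the map
comes back as None, which is what one would expect if the dump came from
user space rather than from the kernel (the 0023 selector and the
08048b58 address look like user-mode values on i386, though that is only
a guess from the dump).

```python
import bisect

# Hypothetical sample entries, NOT from the kernel in this report.
SAMPLE_MAP = """\
00100000 T _stext
00110000 T do_something
001a0000 B _end
"""


def parse_system_map(text):
    """Parse System.map text into a sorted list of (address, symbol)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def lookup(symbols, addr):
    """Return (symbol, offset) for the nearest symbol at or below addr.

    Returns None when addr lies outside the range covered by the map.
    """
    if not symbols or addr < symbols[0][0] or addr > symbols[-1][0]:
        return None
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    base, name = symbols[i]
    return name, addr - base


symbols = parse_system_map(SAMPLE_MAP)
print(lookup(symbols, 0x08048B58))  # EIP from the dump above
```

With a real map, replace SAMPLE_MAP with the contents of the System.map
that matches the running kernel.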

About 5 minutes later, the system produced a large number of SCSI error
messages similar to the ones below, then locked up completely (not even
VC switching worked).

Note the strange PID number on the SCSI 'aborting command' line:

scsi: aborting command due to timeout: pid 103843, scsi0, channel 0, id 0,
lun 0 Read (10) 00 00 23 69 2e 00 00 20 00
scsi0: Aborting CCB #103879 to Target 0
scsi0: CCB #103879 to Target 0 Aborted
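
The command bytes in the abort message can be decoded to see which
blocks the timed-out read was after. This is a sketch under two
assumptions: that the nine bytes printed after "Read (10)" are the rest
of the 10-byte CDB following the opcode, and that the drive uses
512-byte blocks. The function name decode_read10 is made up for
illustration.

```python
def decode_read10(tail_hex):
    """Decode the nine bytes that follow the opcode of a READ(10) CDB."""
    b = bytes.fromhex(tail_hex)  # fromhex skips the spaces between bytes
    if len(b) != 9:
        raise ValueError("expected 9 bytes after the READ(10) opcode")
    return {
        "flags": b[0],                                     # byte 1 of the CDB
        "lba": int.from_bytes(b[1:5], "big"),              # bytes 2-5, big-endian
        "transfer_blocks": int.from_bytes(b[6:8], "big"),  # bytes 7-8, big-endian
        "control": b[8],                                   # byte 9
    }


cmd = decode_read10("00 00 23 69 2e 00 00 20 00")
print(cmd["lba"], cmd["transfer_blocks"])  # 2320686 32
# Assuming 512-byte blocks, the byte offset of the request on the disk:
print(cmd["lba"] * 512)
```

So the aborted command was a 32-block read starting at logical block
2320686, i.e. 16 KB if the blocks are 512 bytes.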

The system passes an overnight run of memtest86, and has *never* given
any signals while compiling the kernel, so I don't think it has memory
or cache problems.

If anybody needs more information please let me know.

Simon Karpen skarpen@shodor.org
Sysadmin, Shodor Education Foundation

"On two occasions I have been asked [by members of Parliament!], `Pray,
Mr. Babbage, if you put into the machine wrong figures, will the right
answers come out?' I am not able rightly to apprehend the kind of
confusion of ideas that could provoke such a question."
-- Charles Babbage