Re: Any SMP people out there with SCSI CD ROMs?

Rolf Fokkens (rolf@flits102-126.flits.rug.nl)
Wed, 02 Sep 1998 16:17:29 +0000


Since I recently installed a second CPU and have had this problem ever
since, I'm responding to an older message.

> I've gotten lots of reports that SCSI CD's seem to be broken currently,
> and it's almost certainly because the SCSI layer is doing something bad
> wrt the io_request_lock under SMP with either ioctl's or just something
> else in sr.c...

I ran into lots of SCSI trouble after installing a second CPU. At first I
had the impression that it was caused by running xplaycd and vrec (part
of the sndutils) together:

vrec -w -t 260 -s 44100 -S -b 16 | cat > test.wav

Sooner or later, my SCSI subsystem produced messages like these:

Aug 23 15:40:20 home01 kernel: scsi : aborting command due to timeout : pid 51621, scsi0, channel 0, id 6, lun 0 UNKNOWN(0x42) 02 40 01 00 00 00 00 10 00
Aug 23 15:40:20 home01 kernel: SCSI host 0 channel 0 reset (pid 51618) timed out - trying harder
Aug 23 15:40:20 home01 kernel: SCSI bus is being reset for host 0 channel 0.
Aug 23 15:40:20 home01 kernel: SCSI host 0 abort (pid 51619) timed out - resetting
Aug 23 15:40:20 home01 kernel: SCSI bus is being reset for host 0 channel 0.

After trying a lot of things, it's now clear that I can reproduce this
without a music CD and without xplaycd! I only have to start the vrec
command and be patient (it usually happens within the specified 260
seconds).
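For scale, here is a rough estimate of how much data that vrec command
pushes to disk. It assumes `-s 44100` is the sample rate in Hz, `-S`
selects stereo, `-b 16` selects 16-bit samples and `-t 260` is the
duration in seconds; those flag meanings are my assumption and were not
checked against the sndutils documentation. The WAV header is ignored.

```python
# Rough estimate of the raw PCM volume written by the vrec command.
# ASSUMPTION: -s = sample rate (Hz), -S = stereo (2 channels),
# -b = bits per sample, -t = duration (seconds).
SAMPLE_RATE_HZ = 44100
CHANNELS = 2
BITS_PER_SAMPLE = 16
DURATION_S = 260


def pcm_bytes_per_second(rate_hz, channels, bits):
    """Raw PCM data rate in bytes per second (no WAV header)."""
    return rate_hz * channels * bits // 8


rate = pcm_bytes_per_second(SAMPLE_RATE_HZ, CHANNELS, BITS_PER_SAMPLE)
total = rate * DURATION_S

if __name__ == "__main__":
    print(f"{rate} bytes/s, {total} bytes (~{total / 2**20:.1f} MiB) "
          f"in {DURATION_S} s")
```

Under those assumptions this is a steady ~172 KiB/s, or about 44 MB over
the full run: modest bandwidth, but a long uninterrupted stream of writes.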

I could even reproduce the problem while untarring the 2.1.119 sources.
Apparently it is triggered by doing lots of disk writes!
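Because the trigger appears to be sustained disk writing rather than the
CD drive itself, a generic write generator might reproduce it without
vrec or tar. This is a minimal Python sketch and not something used in
this report; `write_stress`, the scratch path and the periodic fsync are
my own assumptions (the fsync is meant to push the data out of the buffer
cache and onto the SCSI disk rather than letting it pile up in memory).

```python
import os
import tempfile


def write_stress(path, total_bytes, chunk_size=64 * 1024, fsync_every=16):
    """Write total_bytes of zeros to path in chunk_size pieces.

    Hypothetical helper: calls os.fsync() every fsync_every chunks so the
    data is forced out to the disk instead of staying only in the buffer
    cache. Returns the number of bytes written.
    """
    chunk = b"\0" * chunk_size
    written = 0
    chunks = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            # The last piece may be shorter than chunk_size.
            piece = chunk[: min(chunk_size, total_bytes - written)]
            f.write(piece)
            written += len(piece)
            chunks += 1
            if chunks % fsync_every == 0:
                f.flush()
                os.fsync(f.fileno())
        # Final flush so everything has been handed to the disk.
        f.flush()
        os.fsync(f.fileno())
    return written


if __name__ == "__main__":
    # ASSUMPTION: the temp directory lives on the SCSI disk under test;
    # point this at a scratch file on that disk otherwise.
    target = os.path.join(tempfile.gettempdir(), "scsi_write_stress.bin")
    # About what 260 s of 44.1 kHz 16-bit stereo PCM amounts to.
    n = write_stress(target, 45_864_000)
    print(f"wrote {n} bytes to {target}")
    os.remove(target)
```

Running it in a loop and watching the kernel log for the SCSI timeout and
reset messages would show whether plain writes are enough to trigger it.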

This problem happens both in 2.0.35 and 2.1.119.

Sometimes the SCSI reset leaves the system stable, but usually I get
some kind of deadlock on the buffer cache. All sync attempts then fail,
although vmstat shows that the system is still able to write to disk. I
get the impression that a block in the buffer cache is locked and stays
that way. Not being able to sync makes it impossible to unmount
filesystems, so the shutdown fails. After the reboot, fsck often can't
repair the damage by itself (I get the impression that some indirect
blocks aren't written to disk), so a manual repair is needed.
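One way to check for the hung-sync state described above without locking
up the shell is to run sync from a watchdog with a timeout. This is a
minimal sketch assuming a modern Unix Python (`os.sync` exists since
Python 3.3), so it illustrates the idea rather than something that ran on
the 2.0.35/2.1.119 systems in this report; `sync_completes` is a
hypothetical helper name.

```python
import os
import threading


def sync_completes(timeout_s=30.0):
    """Run os.sync() in a background thread and wait up to timeout_s.

    Returns True if sync() returned in time, False if it is still
    blocked (for example stuck behind a locked buffer). A stuck sync()
    cannot be cancelled; the daemon thread is simply abandoned.
    """
    done = threading.Event()

    def worker():
        os.sync()
        done.set()

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    # Event.wait() returns True if the flag was set before the timeout.
    return done.wait(timeout_s)


if __name__ == "__main__":
    ok = sync_completes(30.0)
    print("sync completed" if ok else "sync appears to be hung")
```

If sync really is stuck in the kernel, the probing process itself may not
exit cleanly, since the blocked thread cannot be interrupted.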

> However, having no devices except a simple disk on my SCSI setup, I don't
> have much to look at. Does anybody out there feel comfortable about
> spinlocks and have a CD-ROM drive on their SCSI subsystem? I'd appreciate
> a hand with this.. (it's probably trivial to fix once you find the
> offender)

I'm not afraid of spinlocks, but I doubt I know enough about the kernel
to be helpful. But hey, we can try! In any case, I can reproduce the
problem!

> Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.altern.org/andrebalsa/doc/lkml-faq.html