Re: SCSI Timeout, but no bus reset

Chiaki Ishikawa (Chiaki.Ishikawa@personal-media.co.jp)
Fri, 6 Aug 1999 02:10:16 +0900 (JST)


>>>>> "Tim" == Tim Ricketts <tr@oxlug.org> writes:

Tim> On Tue, 3 Aug 1999, Chiaki Ishikawa wrote:

>> When a faulty medium error is encountered at a certain timing on my
>> LINUX PC, the reported "reset forever" state is entered and the system
>> becomes unusable. The log line is essentially the same.

Tim> My problem seems to be that it's _not_ resetting, not that it's
Tim> resetting forever.

Thank you for clarifying.

But are you sure of this?

In my case, the driver/kernel endlessly prints messages
saying that it is aborting the command.
It turned out that it WAS resetting the bus after all (or
at least resetting the devices somehow).
Besides the faulty disk, I had a CD changer on the same SCSI bus.
The CD changer makes a noise every time it is reset (as it does at
power-on), so when this problem appeared I could tell from the
sound that it was being reset repeatedly.

Unless we have a SCSI bus monitor or something similar, it is very
difficult to figure out what goes on at this level, that's for sure.

By the way, I think one reason the SCSI subsystem is not as
robust as it could be is the relative lack of test devices
that fail reliably.
I am not kidding.

When making software robust, we sometimes feed it slightly
erroneous input to see whether it can handle such faulty input
gracefully. (The keyword here is "robustness", not "correctness".)
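For example, a robustness test could pair a device that never recovers
with an error handler and check that the handler eventually gives up
instead of aborting forever. The following is only a user-space sketch
in C under my own assumptions: all names (recovery_step, faulty_dev,
recover(), etc.) are hypothetical, and the abort / device reset / bus
reset / offline escalation is merely modelled loosely on how a SCSI
error handler might escalate. It is not the actual kernel code.

```c
#include <stdbool.h>

/* Hypothetical recovery actions, in escalating order (not kernel names). */
enum recovery_step { STEP_ABORT, STEP_DEVICE_RESET, STEP_BUS_RESET, STEP_OFFLINE };

/* A deliberately faulty device used as test input. */
struct faulty_dev {
    int fails_left; /* recovery attempts that still fail; < 0 means it never recovers */
    int attempts;   /* total recovery actions tried so far */
};

/* Simulated device response to one recovery action. */
static bool try_recover(struct faulty_dev *d, enum recovery_step step)
{
    (void)step;              /* this toy device reacts the same to every action */
    d->attempts++;
    if (d->fails_left < 0)
        return false;        /* hopeless device: never recovers */
    if (d->fails_left == 0)
        return true;         /* device finally responds */
    d->fails_left--;
    return false;
}

/* Try each action at most tries_per_step times, escalating on failure.
 * Returns the step that succeeded, or STEP_OFFLINE once everything has
 * failed, so the loop is bounded instead of aborting forever. */
static enum recovery_step recover(struct faulty_dev *d, int tries_per_step)
{
    for (int s = STEP_ABORT; s < STEP_OFFLINE; s++) {
        for (int i = 0; i < tries_per_step; i++) {
            if (try_recover(d, (enum recovery_step)s))
                return (enum recovery_step)s;
        }
    }
    return STEP_OFFLINE;
}
```

A test then only has to check that recover() returns STEP_OFFLINE after
a bounded number of attempts when the device never responds, which is
exactly the "graceful" behavior we would like to see from the real thing.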

While trying to help with debugging this particular problem,
I came across a file under drivers/scsi called scsi_debug.c.

The first few lines read:

/* $Id: scsi_debug.c,v 1.1 1992/07/24 06:27:38 root Exp root $
* linux/kernel/scsi_debug.c
*
* Copyright (C) 1992 Eric Youngdale
* Simulate a host adapter with 2 disks attached. Do a lot of checking
* to make sure that we are not getting blocks mixed up, and panic if
* anything out of the ordinary is seen.
*/

I wonder if someone in the know could modify this
to run under the 2.2.10 kernel and see
if we can incorporate some random features to simulate
the "slightly longer than standard RESET pause time" and
other assorted `slightly' out-of-spec behavior,
in order to improve the handling of error/abnormal conditions
in the SCSI subsystem.
(Being able to use the above file as a module would certainly be a
plus.)

Obviously, the above file was used to test whether the SCSI subsystem
handles a properly behaving disk without mixing up buffers, etc.
Now I think we can use it to simulate a faulty disk and see whether
the SCSI subsystem behaves gracefully.
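To illustrate the kind of "random features" I have in mind, here is a
small user-space sketch in C of a fault-injection policy that such a
simulated disk could consult for each command. Everything in it
(fault_cfg, fake_command(), fake_reset_ms(), the percentages and delays)
is hypothetical and not part of scsi_debug.c; I also assume a fixed,
non-zero PRNG seed so that a failing run can be reproduced.

```c
#include <stdint.h>

/* Outcome the simulated disk reports for one command (hypothetical). */
enum fake_result { FAKE_OK, FAKE_MEDIUM_ERROR, FAKE_SLOW_RESET };

struct fault_cfg {
    uint32_t seed;           /* PRNG state; must be non-zero for xorshift32 */
    unsigned medium_err_pct; /* chance (0-100) that a command gets a medium error */
    unsigned slow_reset_pct; /* chance (0-100) that the disk needs a slow reset */
    unsigned reset_ms;       /* normal reset recovery time */
    unsigned extra_ms;       /* the "slightly longer than standard" extra pause */
};

/* xorshift32: tiny deterministic PRNG, so runs are reproducible. */
static uint32_t next_rand(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Decide how the simulated disk misbehaves on this command. */
static enum fake_result fake_command(struct fault_cfg *c)
{
    unsigned roll = next_rand(&c->seed) % 100;
    if (roll < c->medium_err_pct)
        return FAKE_MEDIUM_ERROR;
    if (roll < c->medium_err_pct + c->slow_reset_pct)
        return FAKE_SLOW_RESET;
    return FAKE_OK;
}

/* How long the disk stays busy after a reset for the given outcome. */
static unsigned fake_reset_ms(const struct fault_cfg *c, enum fake_result r)
{
    return c->reset_ms + (r == FAKE_SLOW_RESET ? c->extra_ms : 0);
}
```

Keeping the seed fixed matters: a fault that shows up once should show
up again on the next run, which is exactly what a real flaky disk
refuses to do.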

BTW, from what I have read on the ATA/ATAPI standards mailing list,
I suspect that the Linux IDE/ATAPI drivers today DO tolerate MORE abuse
from non-compliant devices than the SCSI drivers do. Why not make SCSI
do the same?

-- 
     Ishikawa, Chiaki        ishikawa@personal-media.co.jp.NoSpam  or         
 (family name, given name) Chiaki.Ishikawa@personal-media.co.jp.NoSpam
    Personal Media Corp.      ** Remove .NoSpam at the end before use **     
  Shinagawa, Tokyo, Japan 142-0051
