On Wed, Sep 04, 2002 at 12:37:37PM +0200, Andries Brouwer wrote:
>
> The scsi error recovery has many bad properties, but one is its slowness.
This does not have to be this way. It is a solvable problem.
> Once it gets triggered on a machine with SCSI disks it is common to
> have a dead system for several minutes.
Yes, well, this too is solvable. It, in fact, reminds me that one of the
things I think needs added to the scsi host settings is a default timeout
value for typical devices. Something like adding a default_timeout value
to each Scsi_Device struct and allowing the scsi driver to modify the
value during the slave_attach() call. Then we can put the default timeout
on non-intelligent controllers to something sane while things like
MegaRAID controllers can keep their sky high timeout values.
> I have not yet met a situation
> in which rebooting was not preferable above scsi error recovery,
> especially since the attempt to recover often fails.
This, too, is solvable. It just requires that the scsi subsystem start
paying attention to *how* things fail and making the error handling code
smart enough to know when to retry things and when to just fail.
-- Doug Ledford <dledford@redhat.com> 919-754-3700 x44233 Red Hat, Inc. 1801 Varsity Dr. Raleigh, NC 27606 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Sat Sep 07 2002 - 22:00:21 EST