SCSI tape - returns ENXIO after an error (aha1542) [PATCH]

Richard Fish (rjf@estinc.com)
Wed, 11 Nov 1998 08:31:27 -0700



Some weeks ago, there was discussion on linux-kernel about a problem
with SCSI tape drives connected to a 1542 being put offline after any
kind of error. Since it still occurs in the 2.1.127 kernel, I'm
assuming there was no fix for this.

My understanding of the problem is this:

On any failure, the 1542 driver calls the SCSI error handling code,
which attempts a series of device, bus, and adapter resets and retries
to try to clear the "failure". If the command still fails, the device is
taken offline, and ENXIO is returned on any subsequent attempt to
access the device. Generally, this is acceptable, because you don't
want a single misbehaving device to hang the whole system.
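
The escalation described above can be modeled in a few lines of
user-space C. This is only a sketch of the behavior as I understand it,
not the kernel's actual code: `toy_device`, `toy_recover`, `toy_access`
and the step list are names made up for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a device as the error handler sees it. */
struct toy_device {
    bool online;
};

/* Recovery steps tried in order, mildest first (illustrative only). */
static const char *const toy_steps[] = {
    "device reset", "bus reset", "adapter reset"
};
#define TOY_NSTEPS (sizeof(toy_steps) / sizeof(toy_steps[0]))

/*
 * retry_ok(step) reports whether the retried command succeeds after
 * the given recovery step. Returns 0 on recovery, -ENXIO if every
 * step was tried without success and the device was taken offline.
 */
static int toy_recover(struct toy_device *dev, bool (*retry_ok)(unsigned step))
{
    for (unsigned step = 0; step < TOY_NSTEPS; step++) {
        printf("trying %s, then retrying the command\n", toy_steps[step]);
        if (retry_ok(step))
            return 0;
    }
    dev->online = false;    /* nothing helped: take the device offline */
    return -ENXIO;
}

/* Every later access to an offline device fails immediately. */
static int toy_access(const struct toy_device *dev)
{
    return dev->online ? 0 : -ENXIO;
}

/* A tape sitting at EOM reports the same condition after every reset. */
static bool always_fails(unsigned step) { (void)step; return false; }

/* A genuinely wedged device that a bus reset (step 1) clears. */
static bool bus_reset_helps(unsigned step) { return step == 1; }
```

Calling `toy_recover(&dev, always_fails)` walks through all three steps
and leaves the device offline, after which `toy_access(&dev)` returns
`-ENXIO` every time. That is the behavior a tape at EOM runs into.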

But with tape drives, there are certain "failures" that should be
considered "normal" - particularly reaching EOM (end of medium) while
writing, or a filemark (FMK) while reading.

Assuming that my understanding of the problem is correct, I have
attached a patch that suppresses the error handling code if the sense
data indicates EOM or FMK. Please let me know if this breaks anything,
or if there is some other reason not to include it in the kernel.
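
For reference, the test the patch makes can be tried outside the
kernel. In fixed-format SCSI sense data, byte 2 carries the Filemark
flag in bit 7 (0x80), the EOM flag in bit 6 (0x40), and the sense key
in the low four bits. The sketch below applies the same two masks to a
plain byte buffer. The macro and function names are mine and do not
exist in the kernel, and it assumes the buffer holds fresh, valid
fixed-format sense data.

```c
#include <stdbool.h>
#include <stddef.h>

/* Flag bits in byte 2 of fixed-format sense data (names are illustrative). */
#define TOY_SENSE_FILEMARK 0x80   /* a filemark was encountered */
#define TOY_SENSE_EOM      0x40   /* end of medium was reached */
#define TOY_SENSE_KEY_MASK 0x0f   /* sense key lives in the low nibble */

/* True if the sense data reports a condition that is normal for tape. */
static bool toy_sense_is_tape_normal(const unsigned char *sense, size_t len)
{
    if (len < 3)
        return false;             /* byte 2 is not there to inspect */
    return (sense[2] & (TOY_SENSE_FILEMARK | TOY_SENSE_EOM)) != 0;
}

/* Extract the sense key, or 0 (NO SENSE) if the buffer is too short. */
static unsigned toy_sense_key(const unsigned char *sense, size_t len)
{
    return len < 3 ? 0 : (unsigned)(sense[2] & TOY_SENSE_KEY_MASK);
}
```

A buffer whose byte 2 is 0x40 or 0x80 is treated as a normal tape
condition, while one with byte 2 equal to 0x03 (MEDIUM ERROR, no flags
set) is not and would still go through the usual error handling.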

-- 
Richard Fish                      Enhanced Software Technologies, Inc.
Software Developer                4014 E Broadway Rd Suite 405
rjf@estinc.com                    Phoenix, AZ  85040 
(602) 470-1115                    http://www.estinc.com

--- aha1542.c.old   Wed Nov 11 08:27:11 1998
+++ aha1542.c   Wed Nov 11 08:28:06 1998
@@ -18,6 +18,9 @@
  *        1-Jan-97
  *  Modified by Bjorn L. Thordarson and Einar Thor Einarsson
  *        Recognize that DMA0 is valid DMA channel -- 13-Jul-98
+ *  Modified by Richard Fish
+ *        Suppress reset/retry error handling at EOM/EOF conditions
+ *        11-Nov-98
  */
 
 #include <linux/module.h>
@@ -242,6 +245,8 @@
     switch (hosterr) {
       case 0x0:
       case 0xa: /* Linked command complete without error and linked normally */
+        break;
+
       case 0xb: /* Linked command complete without error, interrupt generated */
         hosterr = 0;
         break;
@@ -487,12 +492,19 @@
         /* is there mail :-) */
 
         /* more error checking left out here */
-        if (mbistatus != 1)
-            /* This is surely wrong, but I don't know what's right */
-            errstatus = makecode(ccb[mbo].hastat, ccb[mbo].tarstat);
-        else
+        if (mbistatus != 1) {
+            /* RJF - catch for EOM/EOF -- not really an error so we don't want the error
+               handler to run */
+            if (SCtmp->sense_buffer[2] & 0x40 || SCtmp->sense_buffer[2] & 0x80) {
+                errstatus = makecode(DID_PASSTHROUGH, ccb[mbo].tarstat);
+            } else {
+                /* This is surely wrong, but I don't know what's right */
+                errstatus = makecode(ccb[mbo].hastat, ccb[mbo].tarstat);
+            }
+        } else {
             errstatus = 0;
-
+        }
+
 #ifdef DEBUG
         if(errstatus) printk("(aha1542 error:%x %x %x) ",errstatus,
                              ccb[mbo].hastat, ccb[mbo].tarstat);

