Re: Scsi tape - returns ENXIO after an error (aha1542) [PATCH]

Harald Koenig (koenig@tat.physik.uni-tuebingen.de)
Thu, 12 Nov 1998 11:45:20 +0100


On Nov 11, Richard Fish wrote:

> Some weeks ago, there was discussion on linux-kernel about a problem
> with SCSI tape drives connected to a 1542 being put offline after any
> kind of error. Since it still occurs in the 2.1.127 kernel, I'm
> assuming there was no fix for this.

thanks! I just tried your patch with 2.1.126 (since 2.1.127 frequently
locks up for me in the scheduler). it seems to fix only half of the problem,
but the more important half;)

the failure at FMK is gone, but reading at EOM still resets the bus:

I'm using a QIC150 drive with only one tar archive stored on the tape.
the first read using

tar tvRi

works fine, stops at EOF with neither an error nor a bus reset. but when I try
to read again using the same tar command (there might be a 2nd archive
on the tape) I still get a scsi bus reset. here is the debug output
of the st driver:

st0: Block limits 512 - 512 bytes.
st0: Mode sense. Length 11, medium 6, WBS 10, BLL 8
st0: Density 0, tape length: 0, drv buffer: 1
st0: Block size: 512, buffer size: 32768 (64 blocks).
st0: Block limits 512 - 512 bytes.
st0: Mode sense. Length 11, medium 6, WBS 10, BLL 8
st0: Density 0, tape length: 0, drv buffer: 1
st0: Block size: 512, buffer size: 32768 (64 blocks).
st0: Error: 8000002, cmd: 8 1 0 0 40 0 Len: 32768
FMK Current error st09:00: sense key None
Additional sense indicates Filemark detected
st0: Sense: f0 0 80 0 0 0 3c 16
st0: EOF detected (2048 bytes read).
st0: EOF up (1). Left 2048, needed 2048.
st0: EOF/EOM flag up (1). Bytes 0
st0: Block limits 512 - 512 bytes.
st0: Mode sense. Length 11, medium 6, WBS 10, BLL 8
st0: Density 10, tape length: 0, drv buffer: 1
st0: Block size: 512, buffer size: 32768 (64 blocks).
st0: Block limits 512 - 512 bytes.
st0: Mode sense. Length 11, medium 6, WBS 10, BLL 8
st0: Density 10, tape length: 0, drv buffer: 1
st0: Block size: 512, buffer size: 32768 (64 blocks).
st0: Got tape pos. blk 5063 part 0.
st0: Block limits 512 - 512 bytes.
st0: Mode sense. Length 11, medium 6, WBS 10, BLL 8
st0: Density 10, tape length: 0, drv buffer: 1
st0: Block size: 512, buffer size: 32768 (64 blocks).
st0: EOF/EOM flag up (2). Bytes 0
aha1542.c: Trying device reset for target 2
Sent BUS RESET to scsi host 1
st0: Error: 8000002, cmd: 8 1 0 0 40 0 Len: 32768
extra data not valid Current error st09:00: sense key Unit Attention
Additional sense indicates Power on, reset, or bus device reset occurred
st0: Sense: 70 0 6 0 0 0 0 16
st0: Tape error while reading.
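
for anyone reading the "Error: 8000002, cmd: 8 1 0 0 40 0 Len: 32768" lines above, here is a small sketch of how I would decode them. it assumes the usual layout of the SCSI result word (status in the low byte, host byte in bits 16-23, driver byte in bits 24-31, with driver byte 0x08 meaning "sense data available") and the READ(6) CDB format for tape devices. the helper names are made up for this mail and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the 32-bit SCSI result word (layout is an assumption,
 * following the host_byte()/driver_byte() convention in scsi.h). */
static unsigned result_status(uint32_t result) { return result & 0xff; }
static unsigned result_host(uint32_t result)   { return (result >> 16) & 0xff; }
static unsigned result_driver(uint32_t result) { return (result >> 24) & 0xff; }

/* READ(6) on a sequential-access device: opcode 0x08, byte 1 bit 0 is
 * the "fixed block" flag, bytes 2..4 hold the big-endian transfer
 * length (counted in blocks when the fixed flag is set). */
static int cdb_is_fixed_read6(const uint8_t cdb[6])
{
    return cdb[0] == 0x08 && (cdb[1] & 0x01) != 0;
}

static uint32_t cdb_read6_length(const uint8_t cdb[6])
{
    return ((uint32_t)cdb[2] << 16) | ((uint32_t)cdb[3] << 8) | cdb[4];
}

/* Self-check against the values printed in the log above. */
static void check_log_line(void)
{
    static const uint8_t cdb[6] = { 0x08, 0x01, 0x00, 0x00, 0x40, 0x00 };
    const uint32_t result = 0x08000002u;

    assert(result_driver(result) == 0x08);  /* sense data present */
    assert(result_host(result) == 0x00);    /* no host adapter error */
    assert(result_status(result) == 0x02);  /* CHECK CONDITION */

    assert(cdb_is_fixed_read6(cdb));
    /* 0x40 = 64 blocks of 512 bytes, matching "Len: 32768". */
    assert(cdb_read6_length(cdb) * 512u == 32768u);
}
```

so both failing commands are plain 64-block fixed-mode reads that ended with CHECK CONDITION; the difference between them is only in the sense data.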

> My understanding of the problem is this:
>
> On any failure, the 1542 driver calls the scsi error handling code,
> which attempts a series of device, bus, and adapter resets and retries to
> try and clear the "failure". If the command still fails, the device is
> taken offline, resulting in ENXIO being returned in any subsequent
> attempts to access the device. Generally, this is acceptable, because
> you don't want a single mis-behaving device to hang the whole system.
>
> But with tape drives, there are certain "failures" that should be
> considered "normal" - particularly reaching EOM while writing, or a
> filemark (FMK) while reading.
>
> Assuming that my understanding of the problem is correct, I have
> attached a patch that suppresses the error handling code if the sense
> data indicates EOM or FMK. Please let me know if this breaks anything,
> or if there is some other reason not to include it in the kernel.
>
> --
> Richard Fish Enhanced Software Technologies, Inc.
> Software Developer 4014 E Broadway Rd Suite 405
> rjf@estinc.com Phoenix, AZ 85040
> (602) 470-1115 http://www.estinc.com
> --- aha1542.c.old Wed Nov 11 08:27:11 1998
> +++ aha1542.c Wed Nov 11 08:28:06 1998
> @@ -18,6 +18,9 @@
> * 1-Jan-97
> * Modified by Bjorn L. Thordarson and Einar Thor Einarsson
> * Recognize that DMA0 is valid DMA channel -- 13-Jul-98
> + * Modified by Richard Fish
> + * Suppress reset/retry error handling at EOM/EOF conditions
> + * 11-Nov-98
> */
>
> #include <linux/module.h>
> @@ -242,6 +245,8 @@
> switch (hosterr) {
> case 0x0:
> case 0xa: /* Linked command complete without error and linked normally */
> + break;
> +
> case 0xb: /* Linked command complete without error, interrupt generated */
> hosterr = 0;
> break;
> @@ -487,12 +492,19 @@
> /* is there mail :-) */
>
> /* more error checking left out here */
> - if (mbistatus != 1)
> - /* This is surely wrong, but I don't know what's right */
> - errstatus = makecode(ccb[mbo].hastat, ccb[mbo].tarstat);
> - else
> + if (mbistatus != 1) {
> + /* RJF - catch for EOM/EOF -- not really an error so we don't want the error
> + handler to run */
> + if (SCtmp->sense_buffer[2] & 0x40 || SCtmp->sense_buffer[2] & 0x80) {
> + errstatus = makecode(DID_PASSTHROUGH, ccb[mbo].tarstat);
> + } else {
> + /* This is surely wrong, but I don't know what's right */
> + errstatus = makecode(ccb[mbo].hastat, ccb[mbo].tarstat);
> + }
> + } else {
> errstatus = 0;
> -
> + }
> +
> #ifdef DEBUG
> if(errstatus) printk("(aha1542 error:%x %x %x) ",errstatus,
> ccb[mbo].hastat, ccb[mbo].tarstat);
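
to make the test in the patch easier to play with outside the kernel, here is a rough standalone sketch of the same check on sense byte 2 (0x80 = filemark, 0x40 = EOM), run against the two sense dumps from my log. the function names are hypothetical, and the response-code check on byte 0 is my own addition; the patch itself tests byte 2 only.

```c
#include <assert.h>
#include <stdint.h>

#define SENSE_FMK 0x80  /* filemark bit in sense byte 2 */
#define SENSE_EOM 0x40  /* end-of-medium bit in sense byte 2 */

/* Nonzero when fixed-format sense data reports a filemark or EOM. */
static int sense_is_fmk_or_eom(const uint8_t *sense)
{
    /* Extra guard (not in the patch): accept only response codes
     * 0x70 (current) and 0x71 (deferred); bit 7 is the "valid" flag. */
    if ((sense[0] & 0x7e) != 0x70)
        return 0;
    return (sense[2] & (SENSE_FMK | SENSE_EOM)) != 0;
}

/* The sense key lives in the low nibble of byte 2. */
static unsigned sense_key(const uint8_t *sense)
{
    return sense[2] & 0x0f;
}

/* Self-check against the two "st0: Sense:" lines in the log. */
static void check_sense_dumps(void)
{
    static const uint8_t at_filemark[8] = { 0xf0, 0, 0x80, 0, 0, 0, 0x3c, 0x16 };
    static const uint8_t after_reset[8] = { 0x70, 0, 0x06, 0, 0, 0, 0x00, 0x16 };

    assert(sense_is_fmk_or_eom(at_filemark));   /* FMK set */
    assert(sense_key(at_filemark) == 0x0);      /* "sense key None" */

    assert(!sense_is_fmk_or_eom(after_reset));  /* neither FMK nor EOM */
    assert(sense_key(after_reset) == 0x6);      /* Unit Attention */
}
```

the first dump is the case the patch now passes through; the second one is only the Unit Attention that follows the bus reset, so it says nothing about what the drive reported for the read at EOM.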

Harald

--
All SCSI disks will from now on                     ___       _____
be required to send an email notice                0--,|    /OOOOOOO\
24 hours prior to complete hardware failure!      <_/  /  /OOOOOOOOOOO\
                                                    \  \/OOOOOOOOOOOOOOO\
                                                      \ OOOOOOOOOOOOOOOOO|//
Harald Koenig,                                         \/\/\/\/\/\/\/\/\/
Inst.f.Theoret.Astrophysik                              //  /     \\  \
koenig@tat.physik.uni-tuebingen.de                     ^^^^^       ^^^^^
