Re: Scsi tape - returns ENXIO after an error

Itai Nahshon (nahshon@actcom.co.il)
Sat, 10 Oct 1998 22:57:42 +0300


Kai Mäkisara wrote:
>
> On Fri, 9 Oct 1998, Steven N. Hirsch wrote:
>
> >
> >
> > On Fri, 9 Oct 1998, Harald Koenig wrote:
> >
> > > On Oct 09, Itai Nahshon wrote:
> > >
> > > > Hello,
> > > > I got a new DDS2 tape drive (Sony SDT-7000). I'm trying to read
> > > > an old tape (probably not compatible) and I'm getting this error:
> > > >
> > > > nahshon# tar vtf /dev/st0
> > > > tar: Read error on /dev/st0: I/O error
> > > > tar: At beginning of tape, quitting now
> > > >
> > > > Later, I cannot use the tape drive:
> > > >
> > > > nahshon# tar vtf /dev/st0
> > > > tar: Cannot open /dev/st0: No such device or address
> > > >
> > > > The only way to recover is to unload and reload the st module
> > > > (if I use a modularized kernel).
> >
> > > This is the same thing that happens to me using 2.1.119 to 122
> > > with SCSI tapes (DDS1 and QIC150) connected to an AHA1542,
> > > but I don't see it for DDS2 connected to NCR810.
> >
> > Am I imagining things, or isn't this a long-standing problem noted on
> > Alan's "to-do" list? I don't know if Kai Makisara reads this list, so has
> > anyone E-Mailed him directly with a report?
> >
> I do read this list, although this is not the correct list for this
> problem (SCSI problems should be discussed on linux-scsi). I don't know
> what Alan means.
>
> I have done some investigations on the problem:
>
> - ENXIO is returned from some location in st.c, but I suspect that it
> comes from this:
>
> if( !scsi_block_when_processing_errors(scsi_tapes[dev].device) ) {
> return -ENXIO;
> }
>
> Could one of you confirm this with a printk? If this is not the case,
> please forget that I wrote anything below this.
>
> The same piece of code is used also in the other high-level SCSI drivers.
>
> - scsi_block_when_processing_errors(SDpnt) (in scsi_error.c) is essentially
> 'return SDpnt->online'. This flag is set when a SCSI device is detected
> and found to respond. The flag is not set anywhere else, but it is reset in
> some locations in scsi_error.c (these seem to be related to situations where
> a device does not respond "correctly" after a bus or device reset). This
> explains why, if the reset fails, the device is gone until the driver is
> reloaded.

I believe that's what happened.
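
To make the theory concrete, here is a rough user-space model of the
behaviour Kai describes. It is only a sketch, not the actual st.c or
scsi_error.c code: the struct and every model_* name are made up for
illustration, and the fprintf stands in for the printk he asked for.
The only things taken from the discussion are that the check is
essentially "return online", that the flag is set only at detection
time, and that a failed reset clears it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-device state; only the flag
 * discussed in this thread is modelled. */
struct model_device {
    int online;   /* set at detection time, cleared by error handling */
};

/* Device detected and found to respond: the only place the flag is set. */
void model_attach(struct model_device *dev)
{
    assert(dev != NULL);
    dev->online = 1;
}

/* Error recovery after a reset. reset_ok == 0 means the device did not
 * respond "correctly"; a later successful reset does NOT set the flag. */
void model_reset(struct model_device *dev, int reset_ok)
{
    assert(dev != NULL);
    if (!reset_ok)
        dev->online = 0;
}

/* Simplified check: essentially "return dev->online". */
int model_block_when_processing_errors(const struct model_device *dev)
{
    assert(dev != NULL);
    return dev->online;
}

/* Simplified open path: fails with -ENXIO while the device is offline.
 * The message on stderr plays the role of the suggested printk. */
int model_open(const struct model_device *dev)
{
    if (!model_block_when_processing_errors(dev)) {
        fprintf(stderr, "model: device offline, returning -ENXIO\n");
        return -ENXIO;
    }
    return 0;
}
```

In this model, once model_reset(&dev, 0) has run, model_open() keeps
failing until model_attach() is called again, which corresponds to
unloading and reloading the st module.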

>
> If the theory above is correct, it explains why the device "disappears"
> after a SCSI reset is performed by the middle-level driver. This leaves the
> question of why the reset is being attempted. The timeouts the tape driver
> uses by default are very long. However, there are other timeouts in the SCSI
> protocol (on the signal level) and it may be that in some cases certain
> drives don't respond fast enough. One common reason for SCSI timeouts is
> the cables and/or termination. This may explain some resets, but in most
> cases I have heard about, the cables and termination seem to be OK.
>
> Kai

There was no problem with the termination or physical connection.
I just tried to read a tape which is probably in an incompatible
format. Later I could write to that same tape and read back the
new data.
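
For anyone who wants to check which state the drive is in without
going through tar, something like the following could be used. It is a
minimal sketch: open_errno is a name I made up, and it only looks at
the open() step (the first failure above was a read error, the second
one an open failure).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if path can be opened read-only,
 * otherwise the errno value left by open(). */
int open_errno(const char *path)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

A result of ENXIO from open_errno("/dev/st0") would correspond to the
"No such device or address" message shown by tar.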

The tape with the wrong format gave a 'st0: Incorrect block size.'
kernel error message when I tried it in 2.0.36-pre13.

Itai

-- 
Itai Nahshon   nahshon@actcom.co.il
        Also   nahshon@vnet.ibm.com
