Re: Kernel panic: 2.1.121 with SCSI DAT drive

Harald Koenig (koenig@tat.physik.uni-tuebingen.de)
Mon, 14 Sep 1998 13:40:40 +0200


On Sep 14, Kai Mäkisara wrote:

> I am sending this reply just to you because this is a "wild guess".

very good guess (so Cc: to linux-kernel) !

> You may or may not have noticed that 2.1.121 does contain the new
> scatter/gather code in st.c. This may have changed some things but I don't
> want to blame the new code directly. It does not use scsi_free().

haven't noticed this yet because I was stuck with 2.1.119 without your patches
which didn't work for my own PC...

> Looking at aha1542.c there is something suspicious: in several
> places it does the following:
>
> if (SCtmp->host_scribble) scsi_free(SCtmp->host_scribble, 512);
>
> (i.e., frees the buffer if the pointer is non-null but does not set it to
> null afterwards). This may lead to trying to free the same buffer twice
> although I don't understand the code enough to speculate how it does this.
> It might be interesting to modify the code so that it sets the pointer to
> null after freeing the buffer and see if the panic disappears.

absolutely correct, the patch below fixed the kernel panic. thanks!
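
For readers following the thread, here is a minimal user-space sketch of the free-then-clear idiom Kai suggests. It is not the driver code: `struct fake_cmnd`, `fake_scsi_free()`, `release_scribble()` and `free_calls` are hypothetical names made up for illustration, and `malloc`/`free` stand in for the kernel's SCSI buffer allocator. Only the `host_scribble` field name and the 512-byte size come from the quoted driver line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a SCSI command; only the field that matters here. */
struct fake_cmnd {
    unsigned char *host_scribble;   /* per-command scratch buffer, or NULL */
};

/* Counts how many times a buffer was actually released (illustration only). */
static int free_calls;

/* Hypothetical stand-in for scsi_free(buffer, length); uses plain free(). */
static void fake_scsi_free(void *buf, unsigned int len)
{
    (void)len;              /* the length is not needed by free() */
    assert(buf != NULL);    /* callers are expected to check the pointer first */
    free(buf);
    free_calls++;
}

/* Release the scratch buffer at most once, however often this is called. */
static void release_scribble(struct fake_cmnd *cmd)
{
    if (cmd->host_scribble) {
        fake_scsi_free(cmd->host_scribble, 512);
        cmd->host_scribble = NULL;  /* a second call now skips the free */
    }
}
```

If `release_scribble()` runs twice on the same command (say, once from a completion path and once from a reset path), the second call finds the pointer already NULL and does nothing. Without the assignment to NULL, the second call would hand the same stale pointer to the allocator again.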

> Another interesting question is why the driver is trying bus device reset
> but even if it tries, it should not panic.

absolutely correct, too;)

so here again is what happens now with the patched aha1542 module:

# dd if=/dev/nst1 bs=64k count=1 of=T1
aha1542.c: Trying device reset for target 3
Sent BUS RESET to scsi host 1
st1: Error with sense data: extra data not valid Current error st09:01: sense key Unit Attention
Additional sense indicates Power on, reset, or bus device reset occurred
dd: /dev/nst1: I/O error
0+0 records in
0+0 records out
# mtst stat
/dev/nst1: No such device or address
# rmmod st
st: Unloaded.
# insmod st
st: bufsize 32768, wrt 30720, max buffers 4, s/g segs 16.
Detected scsi tape st0 at scsi1, channel 0, id 2, lun 0
Detected scsi tape st1 at scsi1, channel 0, id 3, lun 0
# mtst stat
/dev/nst1: I/O error
# mtst stat
/dev/nst1: I/O error
# mtst rewi
# mtst stat
SCSI 2 tape drive:
File number=0, block number=0, partition=0.
Tape block size 0 bytes. Density code 0x13 (DDS (61000 bpi)).
Soft error count since last status=0
General status bits on (45010000):
BOT WR_PROT ONLINE IM_REP_EN

so I still can't read from DAT tapes with 2.1.121...

additional notes:
- the SCSI reset occurs _immediately_ when trying to read (I can hear
  the SCSI reset because the QIC150 re-initializes)
- at the first reset, the whole system locks up for 1-2 seconds (e.g.
  the gpm or X cursor won't move anymore); at the 2nd reset there is no lockup.

any other pointers or wild guesses ?


Harald

--- linux-2.1.121/drivers/scsi/aha1542.c.old Sat Sep 12 00:48:17 1998
+++ linux-2.1.121/drivers/scsi/aha1542.c Mon Sep 14 13:05:05 1998
@@ -479,7 +479,10 @@
}

my_done = SCtmp->scsi_done;
- if (SCtmp->host_scribble) scsi_free(SCtmp->host_scribble, 512);
+ if (SCtmp->host_scribble) {
+ scsi_free(SCtmp->host_scribble, 512);
+ SCtmp->host_scribble = NULL;
+ }

/* Fetch the sense data, and tuck it away, in the required slot. The
Adaptec automatically fetches it, and there is no guarantee that
@@ -1229,6 +1232,7 @@
if (SCtmp->host_scribble)
{
scsi_free(SCtmp->host_scribble, 512);
+ SCtmp->host_scribble = NULL;
}

HOSTDATA(SCpnt->host)->SCint[i] = NULL;
@@ -1295,6 +1299,7 @@
if (SCtmp->host_scribble)
{
scsi_free(SCtmp->host_scribble, 512);
+ SCtmp->host_scribble = NULL;
}

HOSTDATA(SCpnt->host)->SCint[i] = NULL;
@@ -1367,6 +1372,7 @@
if (SCtmp->host_scribble)
{
scsi_free(SCtmp->host_scribble, 512);
+ SCtmp->host_scribble = NULL;
}

HOSTDATA(SCpnt->host)->SCint[i] = NULL;
@@ -1505,7 +1511,10 @@
Scsi_Cmnd * SCtmp;
SCtmp = HOSTDATA(SCpnt->host)->SCint[i];
SCtmp->result = DID_RESET << 16;
- if (SCtmp->host_scribble) scsi_free(SCtmp->host_scribble, 512);
+ if (SCtmp->host_scribble) {
+ scsi_free(SCtmp->host_scribble, 512);
+ SCtmp->host_scribble = NULL;
+ }
printk("Sending DID_RESET for target %d\n", SCpnt->target);
SCtmp->scsi_done(SCpnt);

@@ -1552,7 +1561,10 @@
Scsi_Cmnd * SCtmp;
SCtmp = HOSTDATA(SCpnt->host)->SCint[i];
SCtmp->result = DID_RESET << 16;
- if (SCtmp->host_scribble) scsi_free(SCtmp->host_scribble, 512);
+ if (SCtmp->host_scribble) {
+ scsi_free(SCtmp->host_scribble, 512);
+ SCtmp->host_scribble = NULL;
+ }
printk("Sending DID_RESET for target %d\n", SCpnt->target);
SCtmp->scsi_done(SCpnt);

--
All SCSI disks will from now on                     ___       _____
be required to send an email notice                0--,|    /OOOOOOO\
24 hours prior to complete hardware failure!      <_/  /  /OOOOOOOOOOO\
                                                    \  \/OOOOOOOOOOOOOOO\
                                                      \ OOOOOOOOOOOOOOOOO|//
Harald Koenig,                                         \/\/\/\/\/\/\/\/\/
Inst.f.Theoret.Astrophysik                              //  /     \\  \
koenig@tat.physik.uni-tuebingen.de                     ^^^^^       ^^^^^
