Re: Panic in scsi.c

Ishikawa (ishikawa@yk.rim.or.jp)
Sat, 05 Sep 1998 22:32:35 +0900


Kurt Garloff wrote:
>
> On Wed, Aug 19, 1998 at 01:52:03AM +0100, Richard Waltham wrote:
> > Hi,
> >
> > I can generate the following panic in scsi.c at will using a CD media
> > changer - Nakamichi MBR-7.
> >
> > Happens with kernel versions 2.0.35 and 2.0.36-pre6. I haven't checked any
> > others.
> >
> > Attempt to allocate device channel 0, target 6, lun x
> > Kernel Panic: No Device found in allocate_device().
> >
> > If I start the following two commands running in different vc's
> >
> > dd if=/dev/scdX of=/dev/null (X = 1, 2 ...)
> >
> > dd if=/dev/scdY of=/dev/null (Y = 0, 1 ...)
> >
> > and the second one started has Y < X I get the panic.
> >
> > eg
> >
> > dd if=/dev/sdc1 of=/dev/null - starting this first
> >
> > dd if=/dev/scd0 of=/dev/null - then starting this
> >
> > generates the panic. Starting scd0 first and then scd1 is OK - but very
> > sloooooow as its spending most of the time changing CDs;)
> >
> > I guess the panic is caused by the call to allocate_device from
> > do_sr_request in sr.c but don't know why.
> >
> > Anyone figure it out?
>
> There have been reports that the code in scsi.c doesn't correctly honour the
> BLIST_SINGLELUN and that this causes problems with Nakamichi MBR-7(2) and
> certain host adapter settings. (Probably other devices will be affected too.)
>
> Adding the NAKAMICHI to the blacklist (BLIST_SINGLELUN) and applying the
> appended patch may help you. Chiaki Ishikawa reported successful operation
> after creating the patch.
>
> If I correctly judge what I see, this was a bug in scsi.c. Alan, is this bug
> still in 2.0.36? (I'm not sure if it's the correct fix, though. There have
> been reports about missing locking in sr.c, too ...)
>

A couple of weeks ago or so, there was a mention of
reproducible system panic when the following
commands were issued agains Nakamichi MBR SCSI CD changer (The above
exchange, that is.):

dd if=/dev/scdx of=/dev/null &
dd if=/dev/scdy of=/dev/null

(x>y)

I have the same (or similar) Nakamichi SCSI
CD changer and I found that I can reproduce the
same problem on my PC. The MBR7 SCSI CD changer is connected to Tekram DC390
SCSI host adaptor card. The DC390 driver version is 1.20s2.

[By the way, in my case, the system got hung even when x< y. It could
be due to the fact my system had this "single_lun" problem patch
applied. But I don't know the real reason.]

I tried to see what is going on and could produce some
printk() messages right before the panic occurs.
This might be helpful in deducing the cause of the bug and so
I am reporting this message.

In the following session log I manually recorded ,
scd2 is the CD at id=6,lun=0, and
scd3 is the CD at id=6,lun=1 if I am not mistaken.

The kernel I tested was 2.0.36pre7.
(I obtained the patches from the site mentioned in Alan's message
quoted in Linux Weekly News site.
Can I simply run "patch -p1 < patch_for_pre7"
instead of running patch_for_pre1, then for pre2, etc. in order to
get the 2.0.36pre7 source tree?
When I tried to apply patches in sequence, I got
the dreaded "reverse patch deteced" message and after looking
at the files, I figured that
each patch can be applied to the base 2.0.35 source tree in one operation to
get to the pre-NNN status. Correct me if I am wrong here.)

In the manually recorded messages below,
we notice the duplication of message lines, etc.
This is due to the fact that
before running the problematic sequence of commands, I ran

tail -f syslog-output &
tail -f klogd-output &

on the console and su'ed to become root.
Obviously some syslog messages got sent to console (where root is) and
we got the output from the two tail commands.
I tried several times before running commands in X environment, but
nothing interesting was shown and the system hung, so I decided the
console. Funny, it didn't work all the time very well. But today, somehow
the panic occured cleanly and the messages were recorded by hand very well.

I don't know if this message
helps people in fixing the bug before 2.0.36 release, but
this bug got to be fixed somehow in 2.0.3x release, I think.

If someone wants to delve into this problem and would like me to
print more info by inserting printk() in the source files,
just let me know.

By the way, is the original reporter of this problem
using DC390 or other SCSI cards?
I don't think DC390 driver is not the cause of the problem, but
just wanted to make sure that the problem occurs with the combination of
Nakamichi SCSI CD changer and other SCSI cards.

Finally, here is the manual recording of the messages shown on the console
after I typed the problematic commands and
when the system paniced:

=== BEGIN QUOTE ===
dd if=/dev/scd2 of=/dev/null &
dd if=/dev/scd3 of=/dev/null
do_ioctl: target=3, allocate_device(NULL,05966f18,1)
do_ioctl: target=2, allocate_device(NULL,05a78198,1)
Attempt to allocate device channel 0, target 6, lun 0
Kernel panic: no device found in allocate_device. SCWait == 0

(comment: there was one blank line on the screen.)
get_sectorsize: i = 3, allocate_device(NULL, 05966f18, 1)
get_sectorsize: i = 3, allocate_device(NULL, 05966f18, 1)
do_sr_request: allocate_device(&CURRENT=0023528, (x=CURRENT->rq_dev=0xb03, y= DEVICE_NR(x)=0x03, scsi_CDs[y].device=05966f18, 0)
<4>do_ioctl: target=2, allocate_device(NULL,05a78198,1)
Sep 5 17:03:32 linux linux: do_ioctl: target=3, allocate_device(NULL,05966f18,1)
Sep 5 17:03:32 linux linux: get_sectorsize: i=3, allocate_device(NULL,05966f18,1)
Sep 5 17:03:32 linux linux: get_sectorsize: i=3, allocate_device(NULL,05966f18,1)
Sep 5 17:03:32 linux linux: do_sr_request: allocate_device(&CURRENT=0023528, (x=CURRENT->rq_dev=0xb03, y= DEVICE_NR(x)=0x03, scsi_CDs[y].device=05966f18, 0)
=== END QUOTE ===

Lucikly, the klogd file when the system paniced recorded the following
at the end. I collectedthis after the next reboot:
(syslog output was useless. The flush of the file buffer
didn't take place obviously.)

=== BEGIN QUOTE ===
... [omission ] ...
<4>Detected scsi CD-ROM sr7 at scsi1, channel 0, id 6, lun 5
<6>DC390: Queue INQUIRY command to dev ID 06 LUN 05
<6>DC390: Queue INQUIRY command to dev ID 06 LUN 00
<4>do_sr_request: allocate_device(&CURRENT=00223528, (x=CURRRENT->rq_dev=0xb05, y= DEVICE_NR(x)=0x05, scsi_CDs[y].device=00fec518) ,0)
<4>do_ioctl: target=7, allocate_device(NULL, 00fec798, 1)
<4>get_sectorsize: i=7, allocate_device(NULL,00fec798, 1)
<4>do_sr_request: allocate_device(&CURRENT=00223528, (x=CURRRENT->rq_dev=0xb07, y= DEVICE_NR(x)=0x07, scsi_CDs[y].device=00fec798) ,0)
(MY COMMENT: up until this, the insertion of the device using
echo "scsi add-single-device ..." after the DC390 driver module was
inserted produced the messages, I think.)
<4>do_ioctl: target=2, allocate_device(NULL, 05a78198, 1)
<4>Attempt to allocate device channel 0, target 6, lun 0
[EOF of klogd message file]
=== END QUOTE ===

By the way, target numbers seem to crop up in an strange order, and I
thought I got the numbers wrong first. But checking the argument to
allocate_device() convinced that my manual recording is correct.

I don't know if the problem is caused by the improper protection of
the various allocate routines in sr*.c files as mentioned in the
case of 2.1.1xx kernel lately.
But my understanding of SCSI subsystem of linux 2.0.35 is not good
enough to make any judgement now, and for that matter, producing
a patch for 2.0.35 based on the recent patch for protecting these functions
based on SMP lock/unlock functions in 2.1.1xx is beyond me now.

The output messages were produced by the insertion of printk in the
following places in the relevant files:

sr.c-673-
sr.c-674- if (flag++ == 0)
sr.c-675- {
sr.c-676- /*CIDEBUG*/
sr.c:677: printk("do_sr_request: allocate_device(&CURRENT=%08x, (x=CURRRENT->rq_dev=0x%02x, y= DEVICE_NR(x)=0x%02x, scsi_CDs[y].device=%08x) ,0) \n",
sr.c-678- &CURRENT,
sr.c-679- CURRENT->rq_dev,
sr.c-680- DEVICE_NR(CURRENT->rq_dev),
sr.c-681- scsi_CDs[DEVICE_NR(CURRENT->rq_dev)].device);
sr.c:682: SCpnt = allocate_device(&CURRENT,
sr.c-683- scsi_CDs[DEVICE_NR(CURRENT->rq_dev)].device, 0);
sr.c-684- }
sr.c-685- else SCpnt = NULL;
sr.c-686- restore_flags(flags);

--
sr.c-1067-    
sr.c-1068-    buffer = (unsigned char *) scsi_malloc(512);
sr.c-1069-
sr.c-1070-    /*CIDEBUG*/
sr.c:1071:    printk("get_sectorsize: i=%d, allocate_device(NULL,%08x, 1)\n",
sr.c-1072-           i, scsi_CDs[i].device);
sr.c-1073-
sr.c:1074:    SCpnt = allocate_device(NULL, scsi_CDs[i].device, 1);
sr.c-1075-    
sr.c-1076-    retries = 3;
sr.c-1077-    do {
sr.c-1078-	cmd[0] = READ_CAPACITY;
--
sr_ioctl.c-43-    Scsi_Cmnd * SCpnt;
sr_ioctl.c-44-    int result;
sr_ioctl.c-45-
sr_ioctl.c-46-    /* CIDEBUG*/
sr_ioctl.c:47:    printk("do_ioctl: target=%d, allocate_device(NULL, %08x, 1)\n", 
sr_ioctl.c-48-           target, scsi_CDs[target].device);
sr_ioctl.c-49-
sr_ioctl.c:50:    SCpnt = allocate_device(NULL, scsi_CDs[target].device, 1);
sr_ioctl.c-51-    {
sr_ioctl.c-52-	struct semaphore sem = MUTEX_LOCKED;
sr_ioctl.c-53-	SCpnt->request.sem = &sem;
sr_ioctl.c-54-	scsi_do_cmd(SCpnt,
--
scsi.c-1073- * request_queueable function also knows the internal allocation scheme
scsi.c-1074- * of the packets for each device 
scsi.c-1075- */
scsi.c-1076-
scsi.c:1077:Scsi_Cmnd * allocate_device (struct request ** reqp, Scsi_Device * device,
scsi.c-1078-			     int wait)
scsi.c-1079-{
scsi.c-1080-    kdev_t dev;
scsi.c-1081-    struct request * req = NULL;
--
scsi.c-1087-    Scsi_Cmnd * SCwait = NULL;
scsi.c-1088-    Scsi_Cmnd * found = NULL;
scsi.c-1089-    
scsi.c-1090-    if (!device)
scsi.c:1091:	panic ("No device passed to allocate_device(): device==NULL.\n");
scsi.c-1092-    
scsi.c-1093-    if (reqp) req = *reqp;
scsi.c-1094-    
scsi.c-1095-    /* See if this request has already been queued by an interrupt routine */
--
scsi.c-1169- 			if (!SCwait) {
scsi.c-1170-	 			printk("Attempt to allocate device channel %d,"
scsi.c-1171-                                       " target %d, lun %d\n", device->channel,
scsi.c-1172-                                       device->id, device->lun);
scsi.c:1173: 				panic("No device found in allocate_device. SCWait == 0\n");
scsi.c-1174-	 		}
scsi.c-1175- 		}
scsi.c-1176-#else	/* ORIGINAL CODE */
scsi.c-1177-		    restore_flags(flags);
--
scsi.c-1179-		    if (!SCwait) {
scsi.c-1180-			printk("Attempt to allocate device channel %d, target"
scsi.c-1181-                               " %d, lun %d\n", device->channel, device->id, 
scsi.c-1182-                               device->lun);
scsi.c:1183:			panic("No device found in allocate_device\n");
scsi.c-1184-		    }
scsi.c-1185-		    SCSI_SLEEP(&device->device_wait,
scsi.c-1186-			       (SCwait->request.rq_status != RQ_INACTIVE));
scsi.c-1187-#endif

Hope this helps.

Happy Hacking

Chiaki Ishikawa

PS: I am sorry that I only read linux-scsi mailing list...

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/faq.html