Re: Panic in scsi.c

Ishikawa (ishikawa@yk.rim.or.jp)
Thu, 20 Aug 1998 14:23:07 +0900


Kurt Garloff wrote:
>
> On Wed, Aug 19, 1998 at 01:52:03AM +0100, Richard Waltham wrote:
> > Hi,
> >
> > I can generate the following panic in scsi.c at will using a CD media
> > changer - Nakamichi MBR-7.
> >
> > Happens with kernel versions 2.0.35 and 2.0.36-pre6. I haven't checked any
> > others.
> >
> > Attempt to allocate device channel 0, target 6, lun x
> > Kernel Panic: No Device found in allocate_device().
> >
> > If I start the following two commands running in different vc's
> >
> > dd if=/dev/scdX of=/dev/null (X = 1, 2 ...)
> >
> > dd if=/dev/scdY of=/dev/null (Y = 0, 1 ...)
> >
> > and the second one started has Y < X I get the panic.
> >
> > eg
> >
> > dd if=/dev/scd1 of=/dev/null - starting this first
> >
> > dd if=/dev/scd0 of=/dev/null - then starting this
> >
> > generates the panic. Starting scd0 first and then scd1 is OK - but very
> > sloooooow as it's spending most of the time changing CDs ;)
> >
> > I guess the panic is caused by the call to allocate_device from
> > do_sr_request in sr.c but don't know why.
> >
> > Anyone figure it out?
>
> There have been reports that the code in scsi.c doesn't correctly honour the
> BLIST_SINGLELUN and that this causes problems with Nakamichi MBR-7(2) and
> certain host adapter settings. (Probably other devices will be affected too.)
>
> Adding the NAKAMICHI to the blacklist (BLIST_SINGLELUN) and applying the
> appended patch may help you. Chiaki Ishikawa reported successful operation
> after creating the patch.
>
> If I correctly judge what I see, this was a bug in scsi.c. Alan, is this bug
> still in 2.0.36? (I'm not sure if it's the correct fix, though. There have
> been reports about missing locking in sr.c, too ...)
>

Hello Garloff-san,

(I am not sure whether I can post to the linux-kernel and linux-scsi
mailing lists, so if this post does not appear there, will you kindly
re-post or forward this e-mail to the relevant list?)

Thank you for your latest e-mail telling me what is going on on the
linux-scsi mailing list.

Just when I thought things had gone back to normal, here is another
"Aiiieee" thing!

That is, even with my patch, the recently reported panic persists. Ugh.

Since I am not a subscriber yet, will you kindly pass this e-mail on to
the linux-scsi mailing list?
(I think I will subscribe shortly, at least until this thing gets sorted out.
I wonder if there is a digest version of the list, though.
I already receive enough e-mail without linux-scsi lately...)

Background information and observation:

(0) VERSION: 2.0.35

I am using 2.0.35 with the patch to scsi.c mentioned above, which
improves the BLIST_SINGLELUN handling in this version.

The Nakamichi changer is connected to a Tekram dc390.
The dc390 driver has recently been improved thanks to Mr. Kurt Garloff.
We have eliminated many bugs, I think.

(1) Reproducing the panic reported in the recent posting.

The following sequence is reported to cause a panic with the Nakamichi MBR-7:

dd if=/dev/scd3 of=/dev/null &
dd if=/dev/scd2 of=/dev/null

In my case /dev/scd2 (-> /dev/sr2) is LUN 0 of the Nakamichi
MBR-7 changer.

ID=6 Nakamichi changer
/dev/scd2 : LUN 0
/dev/scd3 : LUN 1
...
/dev/scd8 : LUN 6

The two dd commands in (1) DID cause a panic, even with the patch applied!

From /klogd.msg (the klogd daemon log file):

<4>Attempt to allocate device channel 0, target 6, lun 0
<0>Kernel panic: No device found in allocate_device
<4>
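
To find these lines again after a reboot, something like the following
sketch may help. The function name find_alloc_panic is invented here, and the
log file path is whatever your klogd or syslogd writes to (mine is
/klogd.msg); the search patterns are simply taken from the two messages above.

```shell
# Hypothetical helper (name invented for this sketch): print, with line
# numbers, the lines of a kernel log file that mention the
# allocate_device failure quoted above.
find_alloc_panic() {
    logfile=$1
    grep -n -E 'Attempt to allocate device|allocate_device' "$logfile"
}
```

For example: `find_alloc_panic /klogd.msg`.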

(2) Things get more interesting.

Please recall that I had patched scsi.c as mentioned before.

According to the recent post, the following command sequence
should work. Note that the smaller LUN gets accessed first in this case.

dd if=/dev/scd2 of=/dev/null &
dd if=/dev/scd3 of=/dev/null

BUT, with the patch applied, my machine gets *locked solid*.
I had to use the reset button. I looked at the syslog and klogd.msg files
to see if there was any indication of what went wrong. No luck this time.

So I think the `corrected' BLIST_SINGLELUN handling triggers
a dormant bug in sr.c(?) in a worse manner.
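
For anyone who wants to try both orderings from (1) and (2), here is a rough
sketch of a wrapper. The function name run_pair and the DRY_RUN variable are
my own inventions for this example. By default it only prints the commands;
with DRY_RUN=0 it really runs them, which on an affected kernel may panic or
lock up the machine, so please be careful.

```shell
# Sketch only: issue two reads in a chosen order, the first one in the
# background. run_pair and DRY_RUN are hypothetical names for this example.
# DRY_RUN=1 (the default) prints the commands instead of running them.
run_pair() {
    first=$1
    second=$2
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "dd if=$first of=/dev/null &"
        echo "dd if=$second of=/dev/null"
    else
        # WARNING: may panic or hang an affected system.
        dd if="$first" of=/dev/null &
        dd if="$second" of=/dev/null
        wait
    fi
}
```

`run_pair /dev/scd3 /dev/scd2` corresponds to case (1) (larger LUN first) and
`run_pair /dev/scd2 /dev/scd3` to case (2) (smaller LUN first) on my setup.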

My Conclusion:

The suggested fix for BLIST_SINGLELUN handling
DID fix a serious bug: without the fix, BLIST_SINGLELUN was not honoured
during operation and the system hung due to SCSI timeouts.
With the patch, I verified that the branches which ought to be taken
based on the single_lun setting are now taken correctly.
(The commented-out printk statement in the patch was there to check this.)
The timeout error disappeared.

But that seems to be a different bug from the one reported here.

It seems there are other bugs which are triggered by
the recently reported command sequence.
These bugs are still present even with the patch, and
the slightly different command sequence that was
reported to work can lock up the system when the patch is applied.

I think there is a bug somewhere in the mid-level SCSI code.
(As I recall, someone reported single_lun handling issues on the
Sparc port, while my controller is a Tekram dc390. So the low-level driver
may have no bearing on this particular problem. But one never knows.)

Please let me know if I can be of any help in tracking down the bug
(for example, by inserting printk statements here and there).

Thank you in advance for your attention.

Chiaki Ishikawa
