Re: 1.3.97

Simon Shapiro (Shimon@i-connect.net)
Thu, 09 May 1996 10:25:25 -0700 (PDT)


Hi "Ulrich Windl"; On 09-May-96 you wrote:

...

> > > I have no problems with my 2940.
>
> Sorry, but your message seemed to be very negative, not to say
> disaster-like.

> > The problem is not with the 2940. It is in the strained relationships
> > between the AHA-2940W and the Yamaha CDR-100. I solved it by building
> > a kernel that does not probe for LUNS. Said combination does NOT like it.

> ... which again is no problem of the Adaptec driver per se, but a
> problem of the CDR-100 (broken multiple LUN support). Maybe you post
> a message again telling about your discoveries. There's a "blacklist"
> in the code that will handle these broken devices.

Context: I previously posted a message about some bad experiences with the
AHA-2940. As suggested, here are my findings:

1. Except as detailed below, the aic7xxx driver is fine.

2. The Yamaha CDR-100, firmware 1.10, is ``slightly'' broken:

a. If you probe for any LUN other than 0 (zero), it hangs.
A bus reset will sometimes clear it. A reboot will clear it most of the
time.
b. If you try to read from it as a normal CD device, it will lock
up. Badly. Only a power cycle will get it out of the coma.

3. I have not tried to use cdwrite on it yet. Stay tuned.

4. Yamaha is shipping me the relevant manuals. I will post patches, notes,
etc., once I know more than I do now (zilch).

5. SCSI problem: I have been beating this horse until it died. Then I beat
the saddle. Unfortunately, the problem did NOT go away, despite a lot of
effort by many very good people. For the curious, here is the typical
scenario:

a. A SCSI command is issued.
b. scsi.c (?) decides the command has timed out. 99.9999% of the time this
is false and nonsense. Exceptions? See Yamaha above. Leonard knows the
HP disk story: broken firmware.
c. command_abort is sent to the HBA.
d. Without fail, the abort will time out.
e. A complete bus reset will be issued, destroying any and all queued
I/O operations.
f. The HBA driver will perform the reset and return to scsi_reset.
g. In one or two cases out of three, the system will freeze at this point.
No error messages, no log output. Nothing.

Several people, myself included, have pored over the code. No ``Oops, here
is the bug...'' was found. The system appears to be broken by design.
IMHO, based on some 20 years of I/O and kernel design, I agree. I am taking
it upon myself to completely redesign the SCSI layer. I have some distinct
ideas, but I eagerly want to hear yours.

Please send me (privately) your comments, ideas, etc. Try to focus on
logical, conceptual and behavioral issues, not on implementation details.
Lines of code are the quickest and cheapest part of any software project.

I really would like to take this thread off-line, as linux-kernel is way
overburdened with traffic as it is.

Thanx for lending a friendly ear. Love you all.

Sincerely Yours, (Sent on 05/09/96, 10:25:25 by XF-Mail)

Simon Shapiro
Director of Technology i-Connect.Net, a Division of iConnect Corp.
Shimon@i-Connect.Net 13455 SW Allen Blvd., Suite 140 Beaverton OR 97008