Re: SCSI problems still there

Stephen Davies (scldad@sdc.com.au)
Wed, 26 Feb 97 19:59:52 +0930


Thanks to the people who replied to my appeal for help.

I must apologise for one aspect of my details: checking back, it turns out
I never actually tried to back up the SCSI disk using 2.0.27 or 2.0.28.

In fact, I am not sure exactly which was the last 2.0.? that I did try:
all I can confirm from the wreckage is that 2.0.15 did not work and
produced many reset errors before hanging the system and destroying my
file systems. (It takes a lot of time and effort to recover from these
failures and it is a production system that I am trying to maintain.)

So somewhere between 2.0.15 and 2.0.29, changes were made that stopped
the reset errors plus chaos and replaced them with silence and chaos,
and somewhere very shortly after 1.2.13, something was changed that
introduced the problem in the first place.

Somebody suggested increasing the reset timing values. Any more votes
for this?

Cheers and thanks again,
Stephen.

> From: Michael Thomas <mike@fasolt.mtcc.com>
> Date: 25 Feb 1997 12:15:10 -0800
>
> > If you look at the changes in 2.0.29, you will see that there have been
> > no changes to the SCSI subsystem nor to the Adaptec 1542 driver that
> > would explain this lockup problem. In fact, 2.0.29 contains very few
> > changes of any kind.
>
> If this is the same problem, it would tend to
> implicate either the generic SCSI code or st.c
> since I'm getting similar unhappiness on the
> BusLogics controller. I've been sending Kai mail with
> as much info as I can dig up about my
> configuration and what I'm actually seeing. If you
> know of anything else which might be suspicious,
> I'd be happy to help track it down.
>
>If 2.0.28 works fine and 2.0.29 fails, and there are no changes to the SCSI
>code that would explain such a failure, then I would not assume that it is a
>bug in either the common SCSI code or SCSI tape code. Logically it must either
>be a latent bug in the SCSI code that's been uncovered by some other change, or
>a bug in some other code that just happens to manifest as a SCSI problem.
>
>I would suggest that the people seeing this problem try to determine which
>kernel version introduces the problem, and then see if it can be tracked down
>to a specific change. There's not much those of us who do not see the problem
>can do to debug it until we have more information to go on. I recently had to
>do just such a binary search over the 2.0.x kernels to find out which one
>introduced the major slowdown with buffer handling in e2fsck.
>
> Leonard
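
For anyone wanting to try the binary search over kernel versions suggested
above, here is a minimal sketch of the idea. It is an illustration only: the
function name, the candidate list and the stand-in test are all hypothetical,
and it assumes that the first candidate is known good, the last is known bad,
and that once the problem appears it stays in every later version.

```python
from typing import Callable, Sequence


def first_bad_version(versions: Sequence[str],
                      is_bad: Callable[[str], bool]) -> str:
    """Return the earliest version for which is_bad() reports a failure.

    Assumes versions[0] is known good and versions[-1] is known bad, so
    neither end is re-tested; only versions strictly between them are tried.
    """
    lo, hi = 0, len(versions) - 1      # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid                   # failure is at mid or earlier
        else:
            lo = mid                   # failure is after mid
    return versions[hi]


# Hypothetical candidate list: one known-good kernel followed by 2.0.x
# releases up to a known-bad one. Adjust to whatever kernels are on hand.
candidates = ["1.2.13"] + [f"2.0.{n}" for n in range(0, 16)]

# Stand-in for the real test. In practice this means booting each kernel
# and attempting the backup. The value below is made up for the demo and
# is NOT a finding about any real kernel.
pretend_first_bad = "2.0.3"


def simulated_test(version: str) -> bool:
    return candidates.index(version) >= candidates.index(pretend_first_bad)


print(first_bad_version(candidates, simulated_test))
```

Each step halves the remaining range, so a list of 17 candidates needs at most
four test boots rather than fifteen. On a production system each test boot is
costly, which is the main reason to bisect rather than step through every
release in order.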

========================================================================
Stephen Davies Consulting P/L scldad@sdc.com.au
Adelaide, South Australia. Voice: 61-8-2728863
Computing & Network solutions. Fax : 61-8-2741015