On Wed, 6 Mar 1996, Simon Shapiro wrote:
> Hi,
>
> I am posting to these two groups as the solution is somewhere in between :-)
> This problem has been going on for a long time. I sort of learned to live
> with it, as no one seemed able or interested in helping me solve it.
>
> 1. Stage: 1.3.71, 64MB RAM, eata_dma and/or BusLogic, etc.
> 2. Action: cd /some_big_filesystem;
>            find . -print | cpio -dmpv /another_big_partition
>            (You can replace this with tar | tar, etc.)
> 3. Result: SCSI bus reset in a loop due to a timeout on one
>            disk or another (typically the same disk).
> 4. Premature Conclusion: Bad disk.
> 5. Action: Replace disk == no difference.
>            Run rs on ``bad disk'' == no errors.
>            dd if=/dev/bad_disk of=/dev/null bs=64k == no errors.
> 6. Action: find /bad_fs -print | cpio -C 65536 -O /dev/rmt0
> 7. Result: No errors.
> 8. Action: Reduce blocking size == disk is still ``bad''.
> 9. Desperate Action: Boot 1.3.35.
> 10. Result: NO FAILURES!!!
> 11. Conclusion: 1.3.71 is broken.
>
> The same failures also happen with 1.3.68, 1.3.64, and a few others.
>
> I lied a bit. I know how to crash 1.3.35 the same way:
>
> a. cpio to tape with blocking of 1MB.
> b. do the cpio -dmp from one RAID-[01] DPT partition to another on a P5-90.
>
> These are NOT bugs in the eata_dma driver or the BusLogic driver (unless
> they are both broken in exactly the same way, which is hard to swallow).
> The bus reset comes from a layer above the HBA. Different HBAs react
> differently, but the result is the same:
>
> FAST SCSI I/O ON LINUX IS IMPOSSIBLE WITHOUT CRASHES.
> LARGE BLOCK I/O ON LINUX IS IMPOSSIBLE WITHOUT CRASHES.
>
> We have had this problem for a long time. I have posted about it several
> times. I can reproduce it at any time.
>
> The problem cannot be reproduced by random seeking, dd, or any other trivial
> (but useless) method. It can only be reproduced by doing the type of fast
> copy I am describing above. A clue can probably be found in the fact that
> backup to tape never crashes, rs (which does random reads) never crashes,
> and dd to /dev/null never crashes. It always crashes on the WRITE side,
> seemingly to the same drive.
>
> The error is always an infinite loop of
> ``SCSI: resetting host scsi[01] due to target n''
>
> It is always the result of a ``timeout''. There is no way to kill it:
> sync never returns, umount never returns, df never returns. Therefore
> shutdown never completes.
>
> These symptoms are the same for ANY SCSI error (the endless loop, the death
> of sync, etc.), even when the hardware has a real problem, mainly with disks.
>
> SCSI tape failures typically just leave the process hung and abort.
> At times, the process will keep a disk file open and refuse to die,
> but the I/O subsystem is still alive and not a death trap.
>
> I think it would be nice if we could fix it somehow. I do not have the time
> to see my family, but I will try to help as much as I can. I just do not
> know the SCSI code nearly well enough.
>
>
> Sincerely Yours,
> (Sent on 03/06/96, 00:01:23)
> Simon Shapiro i-Connect.Net, a Division of iConnect Corp.
> Shimon@i-Connect.Net 13455 SW Allen Blvd., Suite 140 Beaverton OR 97008
>
>